OpenAI Agents SDK Flagship

Evidence notes document the bounded local or source-based checks behind an Effloow article. They are not product endorsements, legal advice, or benchmark claims.

Proof Studio Evidence Note: OpenAI Agents SDK Tool-Failure Recovery

  • Report: /articles/openai-agents-sdk-tool-failure-recovery-proof-2026
  • Claim: An OpenAI Agents SDK agent on gpt-5.5-2026-04-23 completes a three-step tool workflow and recovers from an injected transient tool failure without human intervention.
  • Claim scope: agent-reliability (bound to the exact model above — no inference to other models)
  • Test date: 2026-06-13 (UTC)
  • Environment: openai-agents 0.17.5, openai 2.41.1, Python 3.12, max_turns=10, default tool-error handling
  • Evidence level: sandbox-executed
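
As a rough illustration of what an injected transient tool failure and its recovery look like, the sketch below uses a save tool that fails once and then succeeds. It is a deterministic stand-in written in plain Python, not the harness used for these runs: it makes no SDK or model calls, and `TransientToolError`, `make_flaky_save`, `call_until_saved`, and `max_attempts` are hypothetical names introduced only for this example. In the recorded runs the model, not a loop, chose to call the save tool again.

```python
# Illustrative stand-in only: no SDK or model calls are made here.
GROUND_TRUTH = {"valid_count": 5, "invalid_count": 3, "total_amount": 994.49}


class TransientToolError(RuntimeError):
    """Hypothetical error type for a retryable tool failure."""


def make_flaky_save(fail_times=1):
    """Return a save tool that fails `fail_times` times, then succeeds."""
    state = {"calls": 0, "saved": None}

    def save_summary(valid_count, invalid_count, total_amount):
        state["calls"] += 1
        if state["calls"] <= fail_times:
            # Injected failure: nothing is saved on this call.
            raise TransientToolError("temporary storage error, please retry")
        state["saved"] = {
            "valid_count": valid_count,
            "invalid_count": invalid_count,
            "total_amount": total_amount,
        }
        return "saved"

    return save_summary, state


def call_until_saved(save_tool, args, max_attempts=3):
    """Retry the save call after a transient error, recording the tool sequence."""
    sequence = ["read"]  # the read step is assumed to succeed
    for _ in range(max_attempts):
        sequence.append("save")
        try:
            save_tool(**args)
            return sequence
        except TransientToolError:
            continue  # stand-in for the model seeing the error and retrying
    raise RuntimeError("save never succeeded")


save_tool, state = make_flaky_save(fail_times=1)
sequence = call_until_saved(save_tool, GROUND_TRUTH)
print(" -> ".join(sequence))           # read -> save -> save
print(state["saved"] == GROUND_TRUTH)  # True
```

With `fail_times=0` the same code produces the two-call sequence of the clean scenario.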

Run record

8 runs, 2 scenarios. Every run saved exactly the ground-truth values (valid_count: 5, invalid_count: 3, total_amount: 994.49).

Run  Scenario           Tool sequence        Correct          Latency  Tokens
001  clean              read → save          yes              7.80s    1,076
002  clean              read → save          yes              7.11s    1,076
003  clean              read → save          yes              5.83s    1,076
004  clean              read → save          yes              4.39s    1,077
005  transient-failure  read → save → save   yes (recovered)  5.83s    1,632
006  transient-failure  read → save → save   yes (recovered)  5.96s    1,636
007  transient-failure  read → save → save   yes (recovered)  6.12s    1,632
008  transient-failure  read → save → save   yes (recovered)  5.82s    1,633

Totals: 10,837 tokens (10,156 input / 681 output). Transient-failure runs used a mean of 1,633 tokens versus 1,076 for clean runs, a 52% overhead per recovered tool failure.
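
The per-scenario means and the overhead figure can be rechecked from the per-run token counts in the run record. The short sketch below does that arithmetic; the variable names are chosen for this example only.

```python
from statistics import mean

# Per-run token counts copied from the run record.
clean_tokens = [1076, 1076, 1076, 1077]    # runs 001-004
failure_tokens = [1632, 1636, 1632, 1633]  # runs 005-008

clean_mean = mean(clean_tokens)
failure_mean = mean(failure_tokens)

# Relative extra cost of a run that includes one recovered tool failure.
overhead = failure_mean / clean_mean - 1

print(f"clean mean:   {clean_mean:.0f}")    # 1076
print(f"failure mean: {failure_mean:.0f}")  # 1633
print(f"overhead:     {overhead:.0%}")      # 52%
```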

Artifact manifest (SHA-256)

Each run is an append-only JSON artifact recorded at execution time. Hashes below are from the manifest ledger; failed runs would remain in this record.

Artifact      SHA-256 (first 16)   Recorded at
run-001.json  abf0a4ab72107501…    2026-06-13T00:01:48Z
run-002.json  a19bed90a2418b3e…    2026-06-13T00:01:55Z
run-003.json  c381e559db6b62cf…    2026-06-13T00:02:01Z
run-004.json  222b8d1b2890adee…    2026-06-13T00:02:05Z
run-005.json  23bcf3fa7ebc27eb…    2026-06-13T00:02:11Z
run-006.json  a3379cf8a275744a…    2026-06-13T00:02:17Z
run-007.json  4451ca7ace244811…    2026-06-13T00:02:23Z
run-008.json  c99b6cf83593e636…    2026-06-13T00:02:29Z

Manifest integrity warnings at time of publication: none.
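
A reader holding copies of the artifacts could compare them against the hash prefixes listed in the manifest. The following is a minimal sketch of such a check, assuming the artifacts sit in a local directory as plain files; the directory layout and the helper names `sha256_prefix` and `verify` are assumptions for this example, not part of the Effloow tooling.

```python
import hashlib
from pathlib import Path


def sha256_prefix(path: Path, length: int = 16) -> str:
    """Return the first `length` hex characters of the file's SHA-256 digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large artifacts do not need to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()[:length]


def verify(artifact_dir: Path, expected: dict[str, str]) -> dict[str, bool]:
    """Map each artifact name to True if its hash prefix matches the manifest."""
    results = {}
    for name, prefix in expected.items():
        path = artifact_dir / name
        results[name] = path.is_file() and sha256_prefix(path, len(prefix)) == prefix
    return results


# Example usage with two prefixes from the manifest (directory is hypothetical):
# verify(Path("artifacts"), {
#     "run-001.json": "abf0a4ab72107501",
#     "run-002.json": "a19bed90a2418b3e",
# })
```

A prefix match is weaker than a full-digest match, so a check like this only confirms consistency with the truncated values shown in the manifest.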

Limitations

N=8 on a simple two-tool task with a single, cleanly retryable failure mode, one model snapshot, and default SDK error handling. The results are counts, not rates; see the full report for the complete limitations list.