LangGraph 1.2 Fault Tolerance Sandbox PoC
Evidence notes document the bounded local or source-based checks behind an Effloow article. They are not product endorsements, legal advice, or benchmark claims.
Goal
Verify the current LangGraph fault-tolerance APIs that matter for production agent workflows:
- langgraph==1.2.0 installability.
- TimeoutPolicy and NodeTimeoutError import path.
- Per-node timeout recovery through error_handler=.
- Documented limitation that sync nodes cannot safely use timeout=.
No production credentials, model API keys, database credentials, or external LLM calls were used.
Commands Run
rm -rf /tmp/effloow-langgraph-timeout-poc
mkdir -p /tmp/effloow-langgraph-timeout-poc
python3 -m venv /tmp/effloow-langgraph-timeout-poc/.venv
/tmp/effloow-langgraph-timeout-poc/.venv/bin/python -m pip install --upgrade pip
/tmp/effloow-langgraph-timeout-poc/.venv/bin/python -m pip install 'langgraph==1.2.0'
Relevant install output:
Successfully installed ... langchain-core-1.4.0 langgraph-1.2.0 langgraph-checkpoint-4.1.0 langgraph-prebuilt-1.1.0 langgraph-sdk-0.3.14 ...
Version/import check:
/tmp/effloow-langgraph-timeout-poc/.venv/bin/python - <<'PY'
import sys, importlib.metadata
print('python', sys.version.split()[0])
print('langgraph', importlib.metadata.version('langgraph'))
print('langgraph-checkpoint', importlib.metadata.version('langgraph-checkpoint'))
print('langchain-core', importlib.metadata.version('langchain-core'))
from langgraph.types import RetryPolicy, TimeoutPolicy, Command
from langgraph.errors import NodeTimeoutError, NodeError, GraphDrained
from langgraph.graph import StateGraph, START, END
print('imports ok', RetryPolicy.__name__, TimeoutPolicy.__name__, Command.__name__, NodeTimeoutError.__name__, NodeError.__name__, GraphDrained.__name__)
PY
Output:
python 3.12.8
langgraph 1.2.0
langgraph-checkpoint 4.1.0
langchain-core 1.4.0
imports ok RetryPolicy TimeoutPolicy Command NodeTimeoutError NodeError GraphDrained
Timeout Recovery PoC
The graph below defines an async node that sleeps for 0.2 s, longer than its 0.05 s run_timeout. The node has a retry policy limited to a single attempt and an error handler. The handler receives a NodeError, writes a recovery status, and routes to finalize.
import asyncio
import time

from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.types import RetryPolicy, TimeoutPolicy, Command
from langgraph.errors import NodeError, NodeTimeoutError


class State(TypedDict, total=False):
    attempts: int
    status: str
    error: str
    elapsed: float


async def slow_vendor_api(state: State) -> State:
    await asyncio.sleep(0.2)
    return {"status": "vendor_finished"}


async def finalize(state: State) -> State:
    return state


def timeout_handler(state: State, error: NodeError) -> Command:
    return Command(
        update={
            "status": "recovered_by_error_handler",
            "error": type(error.error).__name__,
            "attempts": state.get("attempts", 0) + 1,
        },
        goto="finalize",
    )


builder = StateGraph(State)
builder.add_node(
    "slow_vendor_api",
    slow_vendor_api,
    timeout=TimeoutPolicy(run_timeout=0.05),
    retry_policy=RetryPolicy(max_attempts=1, retry_on=NodeTimeoutError),
    error_handler=timeout_handler,
)
builder.add_node("finalize", finalize)
builder.add_edge(START, "slow_vendor_api")
builder.add_edge("finalize", END)
graph = builder.compile()


async def main():
    started = time.perf_counter()
    result = await graph.ainvoke({"attempts": 0})
    result["elapsed"] = round(time.perf_counter() - started, 3)
    print(result)


asyncio.run(main())
Output:
{'attempts': 1, 'status': 'recovered_by_error_handler', 'error': 'NodeTimeoutError', 'elapsed': 0.055}
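For readers without the package installed, the control flow of the PoC above can be approximated with the standard library alone. This is a minimal sketch and was not part of the recorded run: StepTimeout, run_step, and on_timeout are hypothetical names, and asyncio.wait_for stands in for whatever mechanism LangGraph uses internally, which these notes did not inspect.

```python
import asyncio


class StepTimeout(Exception):
    """Hypothetical stand-in for a per-node timeout error."""


async def slow_step(state: dict) -> dict:
    # Simulates an external call that stalls longer than the timeout.
    await asyncio.sleep(0.2)
    return {**state, "status": "vendor_finished"}


def on_timeout(state: dict, error: Exception) -> dict:
    # Plays the role of the recovery handler: record the failure and move on.
    return {
        **state,
        "status": "recovered_by_error_handler",
        "error": type(error).__name__,
        "attempts": state.get("attempts", 0) + 1,
    }


async def run_step(step, state: dict, run_timeout: float, max_attempts: int, handler) -> dict:
    error: Exception = StepTimeout("step was never attempted")
    for _ in range(max_attempts):
        try:
            # wait_for cancels the awaited coroutine once the timeout elapses.
            return await asyncio.wait_for(step(state), timeout=run_timeout)
        except asyncio.TimeoutError:
            error = StepTimeout(f"step exceeded {run_timeout}s")
    # Attempts exhausted: hand the last error to the handler.
    return handler(state, error)


result = asyncio.run(run_step(slow_step, {"attempts": 0}, 0.05, 1, on_timeout))
print(result)
```

With a single attempt and a 0.05 s limit, the sketch should end in the handler branch, giving a state whose status is recovered_by_error_handler and whose error is StepTimeout.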
Sync Node Limitation Check
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.types import TimeoutPolicy


class State(TypedDict, total=False):
    status: str


def sync_node(state: State) -> State:
    return {"status": "sync"}


builder = StateGraph(State)
builder.add_node("sync_node", sync_node, timeout=TimeoutPolicy(run_timeout=0.05))
builder.add_edge(START, "sync_node")
builder.add_edge("sync_node", END)
builder.compile()
Output:
ValueError: Node timeouts are only supported for async nodes because sync Python execution cannot be safely cancelled in-process. Node 'sync_node' is sync.
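The error message says sync execution cannot be safely cancelled in-process. The general asyncio behaviour behind that kind of restriction can be illustrated with the standard library. This is a sketch under stated assumptions, not a description of LangGraph internals: blocking_step, cooperative_step, and timed are hypothetical names, and the blocking call is simulated with time.sleep.

```python
import asyncio
import time


async def blocking_step() -> str:
    # time.sleep never yields to the event loop, so nothing can interrupt it.
    time.sleep(0.2)
    return "sync_finished"


async def cooperative_step() -> str:
    # asyncio.sleep yields, giving the loop a point at which to cancel.
    await asyncio.sleep(0.2)
    return "async_finished"


async def timed(step):
    # Run one step under a 0.05 s limit and report what happened and how long it took.
    started = time.perf_counter()
    try:
        outcome = await asyncio.wait_for(step(), timeout=0.05)
    except asyncio.TimeoutError:
        outcome = "timed_out"
    return outcome, time.perf_counter() - started


async def main():
    return await timed(cooperative_step), await timed(blocking_step)


(coop_outcome, coop_elapsed), (block_outcome, block_elapsed) = asyncio.run(main())
print(coop_outcome, block_outcome)
```

The cooperative step is cancelled at the timeout, while the blocking step runs for its full 0.2 s and returns normally because the event loop never regains control in time to enforce the limit. Rejecting a timeout on sync nodes up front avoids promising a deadline that cannot be enforced.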
What Worked
- langgraph==1.2.0 installed successfully in a clean temporary virtual environment.
- TimeoutPolicy, RetryPolicy, Command, NodeError, NodeTimeoutError, and GraphDrained were available from the documented import paths.
- A timed-out async node produced NodeTimeoutError and reached the configured recovery handler.
- error_handler= returned a Command that updated state and routed the graph to a final node.
- A sync node with timeout= failed at compile time with a clear ValueError, matching the documentation's async-only limitation.
What Failed or Was Not Tested
- No LLM provider call was made. The slow node used asyncio.sleep() to simulate an external API stall.
- Graceful shutdown was import-checked but not fully exercised with an OS-level SIGTERM process supervisor.
- DeltaChannel was researched from official LangChain release/blog material but not benchmarked locally.
- Type-safe streaming v3 was researched from the GitHub release notes but not reproduced in the sandbox.
Sources Checked
- GitHub releases: https://github.com/langchain-ai/langgraph/releases
- PyPI package page: https://pypi.org/project/langgraph/
- Fault tolerance docs: https://docs.langchain.com/oss/python/langgraph/fault-tolerance
- Durable execution docs: https://docs.langchain.com/oss/python/langgraph/durable-execution
- Delta Channels blog: https://www.langchain.com/blog/delta-channels-evolving-agent-runtime
- LangChain/LangGraph 1.0 milestone blog: https://www.langchain.com/blog/langchain-langgraph-1dot0
This note supports the public article and records what was actually checked.