Multi-Agent LLM Topology Diagnostics with Successor Representation
When you wire up a multi-agent LLM pipeline, you make topology choices that ripple through every inference run. Chain agents for sequential refinement? Star with a central orchestrator? Fully connected mesh for peer review? The typical approach is to try each layout and benchmark empirically — which costs inference time and money before you learn anything.
A paper posted to arXiv on May 12, 2026 asks a sharper question: can you predict which topology will fail — and how — purely from the graph's structure, before you run a single LLM call?
The answer, according to Ethan Parks and Dalal Alharthi at the University of Arizona, is yes. Their approach borrows a tool from reinforcement learning theory — the successor representation — and applies it to the communication graph of a multi-agent system. The result is three spectral numbers that serve as cheap pre-inference diagnostics. Effloow Lab reproduced the core calculation in pure NumPy to verify what the numbers actually tell you.
Why Topology Matters More Than You Think
Before getting into the math, it helps to understand the concrete stakes. Research on multi-agent LLM systems shows performance can vary by up to 10% across different fixed topologies on standard benchmarks like MMLU and GSM8K — with no single topology winning across all tasks.
The failure modes are structural:
Chain topology (agent 1 → agent 2 → agent 3 → ...) suffers from monotone drift. Each agent's small bias compounds forward without any averaging. By the end of the chain, errors introduced early have propagated through every subsequent step without correction.
Star topology (hub agent ↔ all spoke agents) creates a bottleneck. The hub paraphrases, summarizes, or reformulates messages passing through it. Over multiple rounds, spoke agents increasingly receive the hub's interpretation of the original content — not the original content itself.
Mesh topology (all-to-all) sounds ideal but introduces a different failure mode: consensus inertia. When all agents can see each other's outputs, conformity pressure can suppress minority views and collapse diverse reasoning into premature agreement.
Each failure mode maps to a different structural property of the communication graph. That's the insight arXiv 2605.11453 formalizes.
The Successor Representation Framework
The successor representation was introduced by Dayan (1993) as a tool for generalization in reinforcement learning. For a given policy, it answers: "Starting from state i, how often do I expect to visit state j in the future?" Formally, it caches expected discounted future state occupancy.
Parks and Alharthi adapt this idea to multi-agent communication graphs. Instead of a state transition matrix, they use the row-stochastic communication operator P — the matrix that describes how agents weight messages from their neighbors. Each row of P sums to 1, representing the fractional attention each agent gives to each other agent's output.
The successor representation is then:
M = (I - γP)^{-1}
where γ ∈ (0, 1) is a discount factor (the paper uses γ = 0.9) and I is the identity matrix. The entry M[i,j] represents the discounted cumulative "influence" that agent j's messages will have on agent i across all future communication rounds.
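The "across all future communication rounds" reading comes from the series expansion M = I + γP + γ²P² + ..., which converges because γ < 1 and P is row-stochastic. The sketch below checks that numerically on a small hypothetical 3-agent matrix (the matrix values are made up for illustration and are not from the paper).

```python
import numpy as np

gamma = 0.9

# Hypothetical 3-agent communication matrix; each row sums to 1.
P = np.array([
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
])

# Closed form: M = (I - gamma * P)^{-1}
M_closed = np.linalg.inv(np.eye(3) - gamma * P)

# Truncated series: sum over k of gamma^k * P^k
M_series = np.zeros((3, 3))
term = np.eye(3)  # the k = 0 term
for _ in range(400):
    M_series += term
    term = gamma * (term @ P)  # advance to the next power

print(np.allclose(M_closed, M_series))
print(M_closed.sum(axis=1))  # total discounted influence received by each agent
```

Each row of M sums to 1/(1 - γ), since every power of a row-stochastic matrix is itself row-stochastic. What differs between topologies is how that total is distributed across agents.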
This is computable in milliseconds for any communication graph. What makes it useful is that three spectral quantities extracted from M predict three distinct failure modes.
Three Spectral Quantities, Three Failure Modes
Condition number κ(M) is the ratio of the largest to smallest singular value of M. A high condition number means the system amplifies small perturbations in inputs dramatically. The paper's central claim: κ(M) is a perfect rank-order predictor of empirical perturbation robustness — the topology with the highest κ breaks first when agent outputs contain noise or errors.
Spectral gap Δ(M) is the difference between the two largest eigenvalues of M. A large spectral gap means the matrix has one dominant direction and the remaining directions damp out quickly — which translates to fast convergence toward consensus across the agent network. Small spectral gap means slow, sluggish consensus.
Spectral radius ρ(M) is the magnitude of the largest eigenvalue. The paper connects this to cumulative error amplification — the maximum factor by which errors can grow as they propagate through the network across many communication rounds.
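One caveat about the spectral radius under the construction used in this article: every row-stochastic P has eigenvalue 1 and no eigenvalue of larger magnitude, so ρ(M) = 1/(1 - γ) whatever the topology. The sketch below checks this on randomly generated row-stochastic matrices (the random matrices are an assumption for illustration, not something taken from the paper).

```python
import numpy as np

def successor_matrix(P, gamma=0.9):
    """M = (I - gamma * P)^{-1} for a row-stochastic P."""
    return np.linalg.inv(np.eye(len(P)) - gamma * P)

rng = np.random.default_rng(0)
gamma = 0.9

radii = []
for n in (3, 5, 8):
    W = rng.random((n, n))                # arbitrary nonnegative weights
    P = W / W.sum(axis=1, keepdims=True)  # normalize each row to sum to 1
    M = successor_matrix(P, gamma)
    radii.append(np.max(np.abs(np.linalg.eigvals(M))))

print(radii)  # each entry should be close to 1 / (1 - gamma)
```

This is consistent with the identical 10.00 entries in the ρ(M) column of the results table later in the article. With this definition of P, the spectral radius reflects the choice of γ rather than the wiring, so κ(M) and Δ(M) are the quantities that actually separate topologies.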
Effloow Lab PoC: Computing the Spectral Diagnostics
Effloow Lab ran a minimal reproduction in Python to verify the numerical predictions. No LLM calls — just NumPy and the closed-form construction of each topology's communication matrix.
Building the Communication Matrices
import numpy as np

gamma = 0.9
n = 6  # 6-agent network

# Chain: bidirectional sequential (boundary agents have 1 neighbor, interior have 2)
def make_chain(n):
    P = np.zeros((n, n))
    for i in range(n):
        nbrs = [j for j in [i-1, i+1] if 0 <= j < n]
        for j in nbrs:
            P[i, j] = 1.0 / len(nbrs)
    return P

# Star: hub (agent 0) distributes to spokes, spokes report only to hub
def make_star(n):
    P = np.zeros((n, n))
    for j in range(1, n):
        P[0, j] = 1.0 / (n - 1)  # hub reads all spokes equally
    for i in range(1, n):
        P[i, 0] = 1.0  # spokes read only the hub
    return P

# Mesh: fully connected uniform (including self-weight)
def make_mesh(n):
    return np.ones((n, n)) / n
Computing M and Extracting Spectral Metrics
def compute_metrics(P, gamma=0.9):
    I = np.eye(len(P))
    M = np.linalg.inv(I - gamma * P)

    # Condition number: ratio of largest to smallest singular value
    sv = np.linalg.svd(M, compute_uv=False)  # singular values, descending
    kappa = sv[0] / sv[-1]

    # Eigenvalues of M, computed once and reused for both metrics below
    eigvals = np.linalg.eigvals(M)

    # Spectral gap: difference between the two largest eigenvalues (real parts)
    eigs = np.sort(np.real(eigvals))[::-1]
    gap = eigs[0] - eigs[1]

    # Spectral radius: largest eigenvalue magnitude
    rho = np.max(np.abs(eigvals))

    return {"condition_number": kappa, "spectral_gap": gap, "spectral_radius": rho}
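Results for n=5 Agents
Running this with n = 5 produces the following diagnostic numbers. The listing above sets n = 6, which changes the star and chain condition numbers; the n = 6 values appear in the perturbation table further down.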
| Topology | κ(M) Condition Number | Δ(M) Spectral Gap | ρ(M) Spectral Radius | Robustness Prediction |
|---|---|---|---|---|
| Mesh | 10.00 | 9.00 | 10.00 | Most robust — lowest κ |
| Chain | 20.21 | 7.25 | 10.00 | Moderate — slow consensus |
| Star | 28.61 | 9.00 | 10.00 | Least robust — highest κ |
Several things stand out from these numbers.
Mesh's condition number is exactly 1/(1-γ) = 10. This is not a coincidence. For the uniform mesh, P is symmetric with eigenvalue 1 once and eigenvalue 0 repeated n-1 times, so M is symmetric with eigenvalue 1/(1-γ) = 10 once and eigenvalue 1 repeated n-1 times. For a symmetric positive-definite matrix the singular values equal the eigenvalues, which gives κ(M) = 10/1 = 10 for any n. Of the three topologies compared here, that is the smallest spread between singular values.
Star's condition number is nearly 3x higher than Mesh. The hub creates a structural bottleneck: all influence flows through a single node, creating extreme singular value spread. Even small perturbations at the hub get amplified throughout the network.
Chain's spectral gap is notably smaller. At 7.25 versus 9.0 for star and mesh, the chain's two largest eigenvalues are closer together — predicting slower convergence to consensus. This matches the intuition that sequential chains propagate information one hop at a time.
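The chain's gap can also be cross-checked against a closed form. The sketch below assumes the standard result that a simple random walk on a path of n nodes has eigenvalues cos(kπ/(n-1)) for k = 0..n-1; that formula is an addition for illustration and is not taken from the paper. Since each eigenvalue λ of P maps to 1/(1 - γλ) in M, the second-largest eigenvalue of M comes from cos(π/(n-1)).

```python
import numpy as np

def make_chain(n):
    """Bidirectional chain: each agent averages over its 1 or 2 neighbors."""
    P = np.zeros((n, n))
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        P[i, nbrs] = 1.0 / len(nbrs)
    return P

def spectral_gap(P, gamma=0.9):
    """Difference between the two largest eigenvalues of M = (I - gamma*P)^{-1}."""
    M = np.linalg.inv(np.eye(len(P)) - gamma * P)
    eigs = np.sort(np.real(np.linalg.eigvals(M)))[::-1]
    return eigs[0] - eigs[1]

def chain_gap_closed_form(n, gamma=0.9):
    """Gap predicted from the assumed path-graph eigenvalues cos(k*pi/(n-1))."""
    second = np.cos(np.pi / (n - 1))
    return 1.0 / (1.0 - gamma) - 1.0 / (1.0 - gamma * second)

for n in (5, 6, 10, 20):
    numeric = spectral_gap(make_chain(n))
    predicted = chain_gap_closed_form(n)
    print(f"n={n}: numeric gap={numeric:.2f}, closed form={predicted:.2f}")
```

At n = 5 both values agree with the 7.25 in the table. Because cos(π/(n-1)) approaches 1 as n grows, the chain's gap shrinks for longer chains, which fits the "slow consensus" reading.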
What the PoC Confirmed and Where It Diverged
Effloow Lab also ran an input perturbation test: in each of 500 trials, draw a random unit-norm input vector x, add small Gaussian noise δ, and measure how much M(x + δ) differs from Mx relative to the size of Mx. The reported figure is the mean over all trials.
def perturbation_test(P, epsilon=0.05, n_trials=500, gamma=0.9):
    I = np.eye(len(P))
    M = np.linalg.inv(I - gamma * P)
    np.random.seed(42)
    errors = []
    for _ in range(n_trials):
        x = np.random.randn(len(P))
        x /= np.linalg.norm(x)
        delta = np.random.randn(len(P)) * epsilon
        relative_error = (
            np.linalg.norm(M @ (x + delta) - M @ x) /
            np.linalg.norm(M @ x)
        )
        errors.append(relative_error)
    return np.mean(errors)
Results with n=6 agents, ε=0.05:
| Topology | κ(M) | Mean Relative Error |
|---|---|---|
| Star | 32.66 | 0.209 (highest) |
| Mesh | 10.00 | 0.193 |
| Chain | 19.99 | 0.165 (lowest) |
The star topology correctly ranks as least robust — highest condition number, highest error. The mesh correctly ranks as best conditioned. But chain shows lower error than mesh despite having a higher condition number, a rank inversion the paper doesn't predict.
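This divergence is informative rather than a failure. The condition number bounds worst-case relative error amplification, which is reached only for particular input and noise directions. The test above averages over random directions, so its result depends on the whole singular-value spectrum of M rather than only its extremes. It is also a linear-algebraic surrogate: it perturbs the input vector x as a stand-in for message content. The paper validates against actual LLM outputs from Qwen2.5-7B-Instruct over 100 independent trials, where "perturbation" means the natural stochasticity of a language model's generation. One possible reading is that the chain's bidirectional averaging at each hop provides some implicit noise reduction for typical inputs, an effect that κ(M) alone does not capture.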
The takeaway for practitioners: the spectral framework is a reliable first-pass diagnostic, not an oracle. High κ(M) is a strong warning sign. Low κ(M) is not a guarantee.
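To see why a mean over random inputs can disagree with κ(M), it helps to construct the worst case explicitly. The sketch below is an illustration under stated assumptions, not the paper's procedure: it takes the 6-agent star, picks the input direction that M shrinks the most and the noise direction that M stretches the most, and compares the resulting relative error with κ(M)·ε. Here ε is the norm of the noise vector, which differs from the per-component scaling in perturbation_test.

```python
import numpy as np

def make_star(n):
    """Hub (agent 0) reads all spokes equally; each spoke reads only the hub."""
    P = np.zeros((n, n))
    P[0, 1:] = 1.0 / (n - 1)
    P[1:, 0] = 1.0
    return P

gamma, eps = 0.9, 0.05
M = np.linalg.inv(np.eye(6) - gamma * make_star(6))

# Rows of Vt are right singular vectors, ordered by descending singular value.
U, s, Vt = np.linalg.svd(M)
kappa = s[0] / s[-1]

x = Vt[-1]           # unit-norm input that M shrinks the most
delta = eps * Vt[0]  # noise of norm eps that M stretches the most

worst = np.linalg.norm(M @ delta) / np.linalg.norm(M @ x)
print(f"worst-case relative error = {worst:.3f}, kappa * eps = {kappa * eps:.3f}")
```

The two printed numbers coincide: κ(M) describes this adversarial pairing of directions. Random inputs rarely line up with it, which is why averaged errors sit far below the bound and can reorder topologies whose condition numbers are not far apart.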
Practical Implications for Agent Architects
When to Use Each Topology
The spectral numbers suggest a decision framework:
Use chain when task decomposition is strictly sequential and each step's output is the definitive input for the next. Accept the slow consensus and watch for drift accumulation at long chain lengths. Mitigate by inserting a verification agent every 3-4 steps.
Use star when you need fast convergence (high spectral gap) and can tolerate lower robustness. Star is the right call when the hub is highly reliable — a powerful orchestrator model with temperature=0. The high κ is dangerous only when the hub can fail or hallucinate.
Use mesh when robustness to individual agent errors is the priority. The low condition number means the system degrades gracefully when any single agent produces poor output. The cost is communication overhead (O(n²) message passing) and potential conformity drift at larger n.
Applying This Before Running Inference
The practical workflow is:
# Before running your multi-agent pipeline:
topologies_to_evaluate = {
    "chain": make_chain(n),
    "star": make_star(n),
    "mesh": make_mesh(n),
}

for name, P in topologies_to_evaluate.items():
    metrics = compute_metrics(P, gamma=0.9)
    print(f"{name}: κ={metrics['condition_number']:.1f}, gap={metrics['spectral_gap']:.2f}")

    if metrics['condition_number'] > 25:
        print("  WARNING: high κ, vulnerable to agent output variability")
    if metrics['spectral_gap'] < 5:
        print("  WARNING: low spectral gap, slow consensus, expect more communication rounds")
This runs in under a millisecond. You learn the robustness profile of your topology before committing to inference costs.
Extending to Custom Topologies
The framework generalizes beyond the three canonical topologies. Any communication graph becomes a row-stochastic matrix by normalizing each row. A hierarchical topology (two-level star), a ring with cross-links, or a dynamic topology that changes per round — all can be analyzed with the same M = (I - γP)^{-1} formula.
For heterogeneous agent networks where different agents have different reliability, you can adjust the weights in P accordingly: in each row, reduce the weight placed on a less reliable agent (that agent's column), renormalize the row so it sums to 1, and see how the spectral metrics shift.
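As a rough sketch of that workflow, the code below builds a 6-agent ring, adds one cross-link, and then down-weights a single agent before renormalizing. The helper names (row_stochastic, condition_number), the choice of cross-link, and the trust values are all hypothetical and chosen for illustration; nothing here comes from the paper.

```python
import numpy as np

def row_stochastic(W):
    """Normalize nonnegative weights so that each row sums to 1."""
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every agent must read from at least one agent")
    return W / row_sums

def condition_number(P, gamma=0.9):
    """kappa(M) for M = (I - gamma * P)^{-1}."""
    M = np.linalg.inv(np.eye(len(P)) - gamma * P)
    s = np.linalg.svd(M, compute_uv=False)
    return s[0] / s[-1]

n = 6

# Ring: each agent reads its two neighbors.
ring = np.zeros((n, n))
for i in range(n):
    ring[i, (i - 1) % n] = 1.0
    ring[i, (i + 1) % n] = 1.0

# Ring with one cross-link between agents 0 and 3.
cross = ring.copy()
cross[0, 3] = cross[3, 0] = 1.0

# Heterogeneous variant: agent 5 is treated as less reliable, so every
# reader's weight on agent 5 (column 5) is scaled down before normalizing.
trust = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 0.25])
weighted = cross * trust  # broadcasting scales column j by trust[j]

variants = {"ring": ring, "ring + cross-link": cross, "down-weighted": weighted}
for name, W in variants.items():
    P = row_stochastic(W)
    print(f"{name}: kappa = {condition_number(P):.2f}")
```

Normalizing after scaling keeps every row a valid attention distribution, so the comparison between variants stays on the same footing as the chain, star, and mesh numbers.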
Common Mistakes When Designing Multi-Agent Topologies
Assuming chain is "safe" because it's simple. Simplicity in wiring does not mean simplicity in error propagation. The chain's small spectral gap means errors introduced early compound over the entire sequence length with no averaging correction.
Using mesh for everything. The fully connected topology's low κ is appealing, but quadratic communication cost and conformity pressure make it unsuitable for large n or tasks requiring diverse viewpoints. Reserve mesh for small verification pools (n=3 or n=4).
Choosing topology without considering γ. The discount factor γ represents how much agents weight long-range influence. At γ=0.9, the effective horizon of the successor representation is roughly 1/(1-γ) = 10 communication rounds. For short pipelines (2-3 hops), a lower γ like 0.5 (an effective horizon of about 2 rounds) changes the spectral profile meaningfully. Match γ to your actual pipeline depth.
Not varying topology by task phase. A hybrid approach — star for initial decomposition, parallel chain for independent subtasks, mesh for final synthesis — often outperforms any single static topology, and the spectral framework helps you reason about each phase separately.
FAQ
Q: Does this require any specific LLM or framework?
No. The spectral analysis is pure linear algebra — it operates on the communication graph structure, not the model. It works equally for LangGraph pipelines, CrewAI, AutoGen, or any custom agent framework. You need only to express your agent communication pattern as a row-stochastic matrix.
Q: How do I choose γ?
The paper uses γ = 0.9, which is a common default in successor representation literature. Practically, γ should reflect how many communication rounds you expect in your pipeline. For a 3-step pipeline, γ ≈ 0.5 is reasonable. For a 10+ step pipeline, γ ≈ 0.9 gives the long-range view the paper uses. Sensitivity analysis — running compute_metrics across γ ∈ {0.5, 0.7, 0.9} — takes milliseconds and shows how robust your topology ranking is to this choice.
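The sensitivity analysis mentioned above might look like the following sketch. It redefines the three topology builders so that it runs on its own, and it reports 1/(1 - γ) as a rough effective horizon; that horizon heuristic is a common rule of thumb assumed here, not a claim from the paper.

```python
import numpy as np

def make_chain(n):
    P = np.zeros((n, n))
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        P[i, nbrs] = 1.0 / len(nbrs)
    return P

def make_star(n):
    P = np.zeros((n, n))
    P[0, 1:] = 1.0 / (n - 1)  # hub reads all spokes equally
    P[1:, 0] = 1.0            # spokes read only the hub
    return P

def make_mesh(n):
    return np.ones((n, n)) / n

def condition_number(P, gamma):
    M = np.linalg.inv(np.eye(len(P)) - gamma * P)
    s = np.linalg.svd(M, compute_uv=False)
    return s[0] / s[-1]

builders = {"chain": make_chain, "star": make_star, "mesh": make_mesh}
n = 6

report = {}
for gamma in (0.5, 0.7, 0.9):
    kappas = {name: condition_number(build(n), gamma) for name, build in builders.items()}
    report[gamma] = kappas
    ranking = sorted(kappas, key=kappas.get)  # lowest kappa first
    horizon = 1.0 / (1.0 - gamma)
    print(f"gamma={gamma}: horizon ~{horizon:.1f} rounds, ranking={ranking}")
```

If the ranking is the same at every γ you try, the topology choice is insensitive to the discount factor. If it flips, γ deserves the same care as the topology itself.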
Q: Are these metrics stable as n (number of agents) grows?
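For mesh, κ(M) = 1/(1-γ) regardless of n, so it is always exactly 10 at γ=0.9. For star, the condition number grows with n as the hub bottleneck becomes more pronounced: 28.61 at n=5 and 32.66 at n=6 in the tables above. Chain is flatter over the sizes tested here (20.21 at n=5, 19.99 at n=6), so its behavior at larger n should be checked rather than assumed. On these numbers, the relative advantage of mesh over star widens as you add agents.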
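A quick way to check scaling for your own agent counts is to sweep n directly. The sketch below reuses the article's three constructions at γ = 0.9; the particular sizes are arbitrary choices for illustration.

```python
import numpy as np

def make_chain(n):
    P = np.zeros((n, n))
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        P[i, nbrs] = 1.0 / len(nbrs)
    return P

def make_star(n):
    P = np.zeros((n, n))
    P[0, 1:] = 1.0 / (n - 1)  # hub reads all spokes equally
    P[1:, 0] = 1.0            # spokes read only the hub
    return P

def make_mesh(n):
    return np.ones((n, n)) / n

def condition_number(P, gamma=0.9):
    M = np.linalg.inv(np.eye(len(P)) - gamma * P)
    s = np.linalg.svd(M, compute_uv=False)
    return s[0] / s[-1]

builders = {"mesh": make_mesh, "chain": make_chain, "star": make_star}
sizes = (5, 6, 10, 20)

# kappa(M) for every topology at every network size
kappas = {
    name: [condition_number(build(n)) for n in sizes]
    for name, build in builders.items()
}

for name, values in kappas.items():
    print(name, [round(v, 2) for v in values])
```

The n = 5 and n = 6 columns should match the two tables in this article, and the larger sizes show how each topology's conditioning moves as the network grows.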
Q: What if my topology is dynamic — changing each round?
The paper analyzes static topologies, so this is an open area. For dynamic topologies, you can compute the spectral metrics for each candidate configuration and pick the one with the best diagnostic profile for the current task phase. The computational cost remains negligible.
Key Takeaways
The successor representation framework from arXiv 2605.11453 provides a principled, near-zero-cost way to characterize multi-agent LLM topologies before running inference. Three numbers (condition number κ(M), spectral gap Δ(M), and spectral radius ρ(M)) map onto three failure modes: perturbation fragility, slow consensus, and cumulative error amplification.
Effloow Lab's NumPy reproduction confirms the core numerical predictions: mesh topology is the most well-conditioned (κ=10), star is the most fragile (κ≈29-33), and chain shows the slowest consensus convergence (smallest spectral gap). The "perfect rank-order predictor" claim for condition number holds for the extreme cases (star as worst, mesh as best) but can invert for middle-ranked topologies depending on perturbation model — a nuance worth keeping in mind when applying the framework.
For developers architecting multi-agent systems, this framework adds a useful pre-flight check to the toolkit: before your first inference call, spend a millisecond computing κ(M) and ask whether your topology's robustness profile matches your reliability requirements.
The successor representation diagnostic is a legitimate pre-inference tool for multi-agent LLM architecture decisions. The spectral analysis takes under a millisecond, requires only NumPy, and correctly identifies the star topology as structurally fragile and mesh as the best conditioned. It is not a substitute for empirical benchmarking, but it is a cheap first filter that prevents obviously poor topology choices before they waste inference budget.