LLM Agent Security Human Interaction Approval Fatigue PoC 2026

Date: 2026-06-01
Track: paper-poc
Slug: llm-agent-security-human-interaction-approval-fatigue-poc-2026
Environment: macOS 15.6, Python 3.12.8, stdlib only

Paper

arXiv:2605.24309 — "Reframing LLM Agent Security as an Agent-Human Interaction Problem"
Authors: Peiran Wang et al.
Submitted: May 2026

Evidence Gathered

Source Verification

Paper confirmed at arxiv.org/abs/2605.24309 via WebSearch
Full paper HTML at arxiv.org/html/2605.24309 confirmed accessible
Systematic analysis scope confirmed: 59 academic papers, 21 production agent systems, 26 security plugins, as of April 2026
Three production mechanisms confirmed (policy specification, runtime approval, scope configuration) each adopted by at least 14 of 21 systems (confirmed counts: policy spec 14, runtime approval 15, scope config 16)
Zero production deployment of intent anchoring and trust labeling confirmed from paper findings
Approval fatigue problem confirmed from paper analysis section

PoC Run

Ran approval-fatigue-poc.py locally using Python 3.12.8 stdlib only. No API calls, no external deps.

Model: a 20-tool session with 4 risk levels (low/medium/high/critical). A cognitive load model reduces attention by 4% for each prior dialog (floored at 25%), and approval probability is pushed toward 1.0 in proportion to fatigue.
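The fatigue model described above can be sketched as follows. This is a minimal reconstruction from the description, not the contents of approval-fatigue-poc.py. The function names, the linear reading of "4% per prior dialog", and the interpolation toward 1.0 are all assumptions.

```python
# Hypothetical reconstruction of the cognitive load model; names are illustrative.
ATTENTION_DECAY = 0.04  # attention lost per prior dialog (assumed linear)
ATTENTION_FLOOR = 0.25  # attention never drops below this level


def attention(prior_dialogs: int) -> float:
    """Remaining attention after the user has already seen `prior_dialogs` prompts."""
    return max(ATTENTION_FLOOR, 1.0 - ATTENTION_DECAY * prior_dialogs)


def approval_probability(baseline: float, prior_dialogs: int) -> float:
    """Push the no-fatigue baseline toward 1.0 in proportion to fatigue.

    Assumption: fatigue = 1 - attention, and the result is a linear
    interpolation between the baseline and certain approval.
    """
    fatigue = 1.0 - attention(prior_dialogs)
    return baseline + (1.0 - baseline) * fatigue
```

Under these assumptions, a high/critical tool with a 0.50 baseline is approved with probability 0.50 on the first dialog and 0.70 after ten prior dialogs, and the floor caps the probability at 0.875. The actual script may weight fatigue differently, so these values are not expected to match the reported averages.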

Scenario A — No policy spec (every tool prompts):

Dialogs shown: 20 / 20
Avg approval rate for high/critical tools: 0.77 (vs 0.50 baseline, no fatigue)
High/critical approval events exceeding 70% threshold: 6

Scenario B — Policy specification (pre-approve low/medium):

Dialogs shown: 8 / 20 (60% reduction)
Avg approval rate for high/critical tools: 0.60 (vs 0.50 baseline)
High/critical approval events exceeding 70% threshold: 1

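The simulation is consistent with the paper's directional finding: concentrating dialogs on genuinely risky operations preserves decision quality. In this run, pre-approving low/medium tools lowered the average high/critical approval rate from 0.77 to 0.60 and the number of events above the 70% threshold from 6 to 1. The model is illustrative, not validated against the paper's user study data.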
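The mechanism behind the two scenarios can be sketched as a filter over the session. The example below is a hedged illustration, not the PoC script. The risk mix (12 low/medium, 8 high/critical) is chosen to match the dialog counts reported above, while the tool ordering and all names are assumptions.

```python
# Hypothetical 20-tool session. Only the 12/8 risk split is taken from the
# reported dialog counts; the ordering is an assumption.
SESSION = ["low", "medium", "high", "low", "critical"] * 4
PRE_APPROVED = frozenset({"low", "medium"})  # Scenario B policy specification
RISKY = ("high", "critical")


def dialogs_shown(session, pre_approved=frozenset()):
    """Return the risk levels of the tools that still trigger a prompt."""
    return [risk for risk in session if risk not in pre_approved]


def prior_dialogs_before_risky(session, pre_approved=frozenset()):
    """For each high/critical prompt, count the dialogs the user already saw."""
    seen = 0
    counts = []
    for risk in session:
        if risk in pre_approved:
            continue  # pre-approved tools run silently and add no fatigue
        if risk in RISKY:
            counts.append(seen)
        seen += 1
    return counts


def dialog_reduction(session, pre_approved):
    """Fraction of dialogs removed by the policy."""
    return 1.0 - len(dialogs_shown(session, pre_approved)) / len(session)
```

With this ordering, every risky prompt in Scenario A arrives after at least two earlier dialogs and the last one after 19, whereas in Scenario B the risky prompts arrive after 0 to 7 earlier dialogs. Fewer prior dialogs means less accumulated fatigue at the moment a risky decision is made.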

Claims Verified

Research-practice gap: intent anchoring / trust labeling studied heavily in academia → 0 production deployment ✓
Policy specification widely adopted: 14/21 production systems (runtime approval 15/21, scope configuration 16/21) ✓
Approval fatigue mechanism: repeated dialogs cause indiscriminate approval ✓
Static scope limitation: scope configured at session start, doesn't adapt ✓
Policy language accessibility gap confirmed ✓

Not Tested

No user study conducted
85.1% approval prediction accuracy figure not independently verified (cited from paper but not reproduced)
No live agent system tested
No production plugin evaluated

Evidence Quality

Source-based paper analysis plus local simulation. The PoC is directionally consistent with the paper's findings. No quantitative reproduction of user study data.