LLM Agent Security Human Interaction Approval Fatigue PoC 2026
Date: 2026-06-01
Track: paper-poc
Slug: llm-agent-security-human-interaction-approval-fatigue-poc-2026
Environment: macOS 15.6, Python 3.12.8, stdlib only
Paper
arXiv:2605.24309 — "Reframing LLM Agent Security as an Agent-Human Interaction Problem"
Authors: Peiran Wang et al.
Submitted: May 2026
Evidence Gathered
Source Verification
- Paper confirmed at arxiv.org/abs/2605.24309 via WebSearch
- Full paper HTML at arxiv.org/html/2605.24309 confirmed accessible
- Systematic analysis scope confirmed: 59 academic papers, 21 production agent systems, 26 security plugins, as of April 2026
- Three production mechanisms confirmed (policy specification, runtime approval, scope configuration) each adopted by at least 14 of 21 systems (confirmed counts: policy spec 14, runtime approval 15, scope config 16)
- Zero production deployment of intent anchoring and trust labeling confirmed from paper findings
- Approval fatigue problem confirmed from paper analysis section
PoC Run
Ran approval-fatigue-poc.py locally using Python 3.12.8 stdlib only. No API calls, no external deps.
Model: a 20-tool session with 4 risk levels (low/medium/high/critical) and a cognitive load model in which each prior dialog reduces attention by 4% (floored at 25%). Approval probability is pushed toward 1.0 in proportion to fatigue.
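The fatigue model described above can be sketched as follows. This is a minimal illustration, not the contents of approval-fatigue-poc.py: the function names are hypothetical, the 4% decay is assumed to be linear (not compounding), and "pushed toward 1.0" is assumed to mean linear interpolation between the 0.50 baseline and 1.0.

```python
# Constants taken from the note; the functional forms below are assumptions.
BASELINE_APPROVAL = 0.50  # no-fatigue approval probability for high/critical tools
ATTENTION_DECAY = 0.04    # attention lost per prior dialog
ATTENTION_FLOOR = 0.25    # attention never drops below this


def attention(prior_dialogs: int) -> float:
    """Remaining attention after `prior_dialogs` earlier prompts (linear decay, assumed)."""
    return max(ATTENTION_FLOOR, 1.0 - ATTENTION_DECAY * prior_dialogs)


def approval_probability(prior_dialogs: int, baseline: float = BASELINE_APPROVAL) -> float:
    """Move from `baseline` toward 1.0 in proportion to fatigue (assumed interpolation)."""
    fatigue = 1.0 - attention(prior_dialogs)
    return baseline + (1.0 - baseline) * fatigue


if __name__ == "__main__":
    for n in (0, 5, 10, 19):
        print(f"prior dialogs={n:2d}  attention={attention(n):.2f}  "
              f"p(approve)={approval_probability(n):.3f}")
```

Under these assumptions the attention floor is reached after 19 prior dialogs, so in a 20-dialog session only the last prompt is answered at the floor.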
Scenario A — No policy spec (every tool prompts):
- Dialogs shown: 20 / 20
- Avg approval rate for high/critical tools: 0.77 (vs 0.50 baseline, no fatigue)
- High/critical approval events exceeding 70% threshold: 6
Scenario B — Policy specification (pre-approve low/medium):
- Dialogs shown: 8 / 20 (60% reduction)
- Avg approval rate for high/critical tools: 0.60 (vs 0.50 baseline)
- High/critical approval events exceeding 70% threshold: 1
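The simulation is consistent with the directional finding: concentrating dialogs on genuinely risky operations preserves decision quality (average high/critical approval rate 0.60 with policy specification vs 0.77 without, against a 0.50 no-fatigue baseline). The model is illustrative, not validated against the paper's user study data.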
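The two scenarios can be compared with a small deterministic sketch. It is not the PoC script and is not expected to reproduce the 0.77 / 0.60 figures: the session ordering, the `run_session` helper, and the linear fatigue form are all assumptions made for illustration. Only the 20-tool length, the 8 high/critical calls, and the 4% / 25% / 0.50 parameters come from the note.

```python
from statistics import mean

RISKY = {"high", "critical"}


def approval_probability(prior_dialogs, baseline=0.50, decay=0.04, floor=0.25):
    """Assumed form: linear attention decay, approval interpolated toward 1.0."""
    attention = max(floor, 1.0 - decay * prior_dialogs)
    return baseline + (1.0 - baseline) * (1.0 - attention)


def run_session(risks, pre_approved=frozenset()):
    """Return (dialogs shown, approval probabilities for high/critical prompts)."""
    shown = 0
    risky_probs = []
    for risk in risks:
        if risk in pre_approved:
            continue  # policy spec: no dialog is shown, so no fatigue accrues
        p = approval_probability(prior_dialogs=shown)
        if risk in RISKY:
            risky_probs.append(p)
        shown += 1
    return shown, risky_probs


# Hypothetical ordering: 12 low/medium and 8 high/critical calls, interleaved.
SESSION = ["low", "medium", "high", "low", "critical"] * 4

shown_a, probs_a = run_session(SESSION)                                   # Scenario A
shown_b, probs_b = run_session(SESSION, pre_approved={"low", "medium"})   # Scenario B

for label, shown, probs in (("A", shown_a, probs_a), ("B", shown_b, probs_b)):
    over = sum(p > 0.70 for p in probs)
    print(f"Scenario {label}: dialogs {shown} / {len(SESSION)}, "
          f"avg high/critical approval {mean(probs):.2f}, events over 0.70: {over}")
```

Because pre-approved calls never raise the dialog counter, every high/critical prompt in Scenario B is answered with less accumulated fatigue than the same prompt in Scenario A, which is the mechanism the note's directional result relies on.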
Claims Verified
- Research-practice gap: intent anchoring / trust labeling studied heavily in academia → 0 production deployment ✓
- Policy specification widely adopted: 14/21 production systems (runtime approval 15/21, scope configuration 16/21) ✓
- Approval fatigue mechanism: repeated dialogs cause indiscriminate approval ✓
- Static scope limitation: scope configured at session start, doesn't adapt ✓
- Policy language accessibility gap confirmed ✓
Not Tested
- No user study conducted
- 85.1% approval prediction accuracy figure not independently verified (cited from paper but not reproduced)
- No live agent system tested
- No production plugin evaluated
Evidence Quality
Source-based paper analysis plus a local simulation. The PoC is directionally consistent with the paper's findings; no user study data was quantitatively reproduced.
Read the article
This note supports the public article and records what was actually checked.