ARTICLES · 2026-06-01 · BY EFFLOOW CONTENT FACTORY

LLM Agent Security Is a Human Problem: 59 Papers, 21 Systems

arXiv:2605.24309 analyzed 59 papers and 21 production systems: the mechanisms academics study most have zero production deployment.

There's a conversation happening in AI security research, and a different conversation happening in the teams actually deploying agents. A paper from May 2026 put numbers on exactly how far apart those conversations are.

arXiv:2605.24309, "Reframing LLM Agent Security as an Agent-Human Interaction Problem," analyzed 59 academic papers, 21 production agent systems, and 26 security plugins as of April 2026. The core finding: three human-centric security mechanisms dominate real-world deployments, while the categories most heavily studied in academic literature see zero production deployment.

That's not a small gap. It means teams researching how to make agents safer are spending their time on things practitioners aren't using — and practitioners are deploying things that researchers haven't studied closely enough.

What the Analysis Found

The paper looked at how agent security is actually implemented across production systems — tools like Claude, Copilot, Cursor, and similar agentic platforms — and mapped that against what academic literature has focused on over the past few years.

Three mechanisms dominated production:

  • Policy specification: pre-defined rules that determine what actions an agent may or may not take. Adopted by 14 of 21 production systems.
  • Runtime approval: human confirms or rejects individual agent actions as they occur. 15 of 21 systems.
  • Scope configuration: setting boundaries on what resources, services, or data an agent can touch. 16 of 21 systems.
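As a rough illustration of how the three mechanisms differ, the sketch below chains them into a single authorization check. Every name here (`Action`, `authorize`, the tool and resource sets) is hypothetical and chosen for illustration; none of it comes from the paper or from any of the surveyed systems.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    tool: str       # hypothetical tool name, e.g. "send_email"
    resource: str   # hypothetical resource class, e.g. "email" or "files"


# Policy specification: pre-defined rules, evaluated without asking the user.
DENIED_TOOLS = {"delete_file"}
PRE_APPROVED_TOOLS = {"read_file", "list_directory"}

# Scope configuration: the resources the agent may touch at all.
ALLOWED_RESOURCES = {"files", "web"}


def authorize(action: Action, ask_user: Callable[[Action], bool]) -> bool:
    """Return True if the action may run under scope, policy, and approval."""
    if action.resource not in ALLOWED_RESOURCES:
        return False              # outside the configured scope
    if action.tool in DENIED_TOOLS:
        return False              # policy forbids it outright
    if action.tool in PRE_APPROVED_TOOLS:
        return True               # policy allows it, no dialog shown
    return ask_user(action)       # runtime approval: the human decides
```

In this arrangement the user is only interrupted for actions that neither the scope nor the policy can settle, which is the property the rest of the article depends on.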

The academic literature has a different focus. The two most heavily studied categories — intent anchoring (mechanisms to keep agents aligned with original user intent through multi-step tasks) and trust labeling (marking agent outputs or sources with provenance signals) — see zero deployment in the systems analyzed.

That's worth sitting with. Intent anchoring is a reasonable research priority: keeping agents on track through complex tasks matters. But if no production team is deploying it, either the research hasn't matured into usable patterns, or it's addressing a risk that deployment experience suggests is lower than expected.

The Approval Fatigue Problem

The paper's most actionable finding is about what goes wrong with runtime approval, one of the most widely deployed mechanisms (15 of 21 systems).

Runtime approval sounds like a strong control: the human sees what the agent is about to do and says yes or no. The problem is interaction design, not intent. Users confronted with repeated confirmation dialogs become fatigued, and fatigue degrades decision quality in a specific way: approval rates drift upward regardless of action risk.

The mechanism is cognitive. Each dialog requires attention, evaluation, and a decision. Over a session with many tool calls, users have already processed dozens of similar requests. The path of least resistance is to approve and move on. Low-risk actions (reading files, searching the web) get approved correctly and also train users to click approve. When a genuinely risky action appears later in the session, the user's approval threshold has already shifted.

This isn't a bug in user behavior. It's the expected outcome when a mechanism designed for deliberate decision-making is used at high frequency.

PoC: Modeling Approval Fatigue

Effloow Lab ran a simulation, using only the Python standard library, to model this effect across a session of 20 tool calls. The full evidence note is at data/lab-runs/llm-agent-security-human-interaction-approval-fatigue-poc-2026.md. It involves no API calls and no user study: this is an illustrative model, not a reproduction of the paper's experiments.

The model treats approval probability as a function of action risk and prior dialog count. Each previous dialog reduces attention by 4 percentage points, with a floor at 25%. As attention falls, the approval probability for any action shifts proportionally toward 1.0, so the lower an action's baseline approval probability (that is, the riskier the action), the more room fatigue has to distort the decision.

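# Assumed probability that a fully attentive user approves an action, by risk level.
RISK_BASE_APPROVE = {"low": 0.99, "medium": 0.85, "high": 0.60, "critical": 0.35}

def fatigue_factor(prior_dialogs: int) -> float:
    """Remaining attention: drops 0.04 per prior dialog, floored at 0.25."""
    return max(0.25, 1.0 - prior_dialogs * 0.04)

def approve_with_fatigue(risk: str, dialog_index: int) -> float:
    """Approval probability for the dialog at 0-based position dialog_index."""
    base = RISK_BASE_APPROVE[risk]
    attention = fatigue_factor(dialog_index)  # dialog_index equals the number of prior dialogs
    # Lost attention pushes the approval probability toward 1.0.
    return base + (1.0 - base) * (1.0 - attention)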

Session: 20 tool calls across four risk levels (low/medium/high/critical), mixed to reflect a realistic agentic task.

Scenario A — Runtime approval for every call:

[ 1] read_file            low      base=0.99  fatigued=0.99
[ 2] web_search           medium   base=0.85  fatigued=0.86
[ 3] list_directory       low      base=0.99  fatigued=0.99
[ 4] read_file            low      base=0.99  fatigued=0.99
[ 5] write_file           high     base=0.60  fatigued=0.66
[ 6] web_search           medium   base=0.85  fatigued=0.88
[ 7] read_file            low      base=0.99  fatigued=0.99
⚠[ 8] execute_command     high     base=0.60  fatigued=0.71
[ 9] web_search           medium   base=0.85  fatigued=0.90
[10] list_directory       low      base=0.99  fatigued=0.99
⚠[11] send_email          critical base=0.35  fatigued=0.61
[12] read_file            low      base=0.99  fatigued=0.99
⚠[13] database_query      high     base=0.60  fatigued=0.79
[14] web_search           medium   base=0.85  fatigued=0.93
⚠[15] write_file          high     base=0.60  fatigued=0.82
[16] list_directory       low      base=0.99  fatigued=1.00
⚠[17] delete_file         high     base=0.60  fatigued=0.86
⚠[18] send_email          critical base=0.35  fatigued=0.79
[19] read_file            low      base=0.99  fatigued=1.00
⚠[20] execute_command     high     base=0.60  fatigued=0.90

Dialogs shown: 20 / 20
Avg approval rate (high/critical): 0.77  (baseline: 0.50)
High/critical approvals exceeding 70% threshold: 6 events

Scenario B — Policy specification (pre-approve low/medium):

     read_file            low      [PRE-APPROVED — no dialog]
     web_search           medium   [PRE-APPROVED — no dialog]
     list_directory       low      [PRE-APPROVED — no dialog]
     read_file            low      [PRE-APPROVED — no dialog]
[ 1] write_file           high     base=0.60  fatigued=0.60
     web_search           medium   [PRE-APPROVED — no dialog]
     read_file            low      [PRE-APPROVED — no dialog]
[ 2] execute_command      high     base=0.60  fatigued=0.62
     web_search           medium   [PRE-APPROVED — no dialog]
     list_directory       low      [PRE-APPROVED — no dialog]
[ 3] send_email           critical base=0.35  fatigued=0.40
     read_file            low      [PRE-APPROVED — no dialog]
[ 4] database_query       high     base=0.60  fatigued=0.65
     web_search           medium   [PRE-APPROVED — no dialog]
[ 5] write_file           high     base=0.60  fatigued=0.66
     list_directory       low      [PRE-APPROVED — no dialog]
[ 6] delete_file          high     base=0.60  fatigued=0.68
[ 7] send_email           critical base=0.35  fatigued=0.51
     read_file            low      [PRE-APPROVED — no dialog]
⚠[ 8] execute_command     high     base=0.60  fatigued=0.71

Dialogs shown: 8 / 20 (60% reduction)
Avg approval rate (high/critical): 0.60  (baseline: 0.50)
High/critical approvals exceeding 70% threshold: 1 event

The simulation does not replicate any experiment from the paper. It demonstrates the structural logic: concentrating approval dialogs on genuinely risky actions preserves the user's deliberate attention for those decisions. Spreading them across low-risk operations trains users to click approve.
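For readers who want to rerun the comparison, here is a minimal driver sketch. The `SESSION` list is the risk sequence read off the 20-call log above; the `run` helper, its `pre_approved` parameter, and the 0.70 default threshold are assumptions made for this sketch and may not match how the lab script is actually organized.

```python
RISK_BASE_APPROVE = {"low": 0.99, "medium": 0.85, "high": 0.60, "critical": 0.35}

# Risk level of each of the 20 tool calls, in session order.
SESSION = [
    "low", "medium", "low", "low", "high",
    "medium", "low", "high", "medium", "low",
    "critical", "low", "high", "medium", "high",
    "low", "high", "critical", "low", "high",
]


def fatigue_factor(prior_dialogs: int) -> float:
    """Remaining attention after a number of prior dialogs, floored at 0.25."""
    return max(0.25, 1.0 - prior_dialogs * 0.04)


def approve_with_fatigue(risk: str, prior_dialogs: int) -> float:
    """Approval probability once lost attention pushes it toward 1.0."""
    base = RISK_BASE_APPROVE[risk]
    return base + (1.0 - base) * (1.0 - fatigue_factor(prior_dialogs))


def run(session, pre_approved=frozenset(), threshold=0.70):
    """Return (dialogs shown, mean high/critical approval, count above threshold)."""
    dialogs = 0
    risky = []
    for risk in session:
        if risk in pre_approved:
            continue  # handled by policy: no dialog, so no added fatigue
        p = approve_with_fatigue(risk, dialogs)
        dialogs += 1
        if risk in ("high", "critical"):
            risky.append(p)
    mean = sum(risky) / len(risky)
    return dialogs, mean, sum(p > threshold for p in risky)


scenario_a = run(SESSION)
scenario_b = run(SESSION, pre_approved={"low", "medium"})
print("A: dialogs=%d mean=%.2f over_threshold=%d" % scenario_a)
print("B: dialogs=%d mean=%.2f over_threshold=%d" % scenario_b)
```

Under these assumptions the driver reports 20 dialogs, a mean of 0.77, and 6 events over the threshold for Scenario A, and 8 dialogs, a mean of 0.60, and 1 event for Scenario B, in line with the logged summaries.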

The Static Scope Problem

The second failure mode the paper identifies concerns scope configuration. Most systems let users configure scope at session start: "this agent can access my email and calendar but not my files." That configuration is static. It doesn't adapt as the task evolves.

A user who starts a session asking an agent to "summarize my inbox" might, mid-session, want the agent to draft a reply. If drafting wasn't in the initial scope, either the agent refuses (interrupting the task) or the user has to reconfigure scope (a step most systems make cumbersome). In practice, users tend to configure scope broadly at the start to avoid these interruptions, which defeats the purpose of scope control.

The paper notes that scope boundaries "configured once at session start fail to adapt to evolving tasks." The ideal is dynamic scope that expands and contracts as the task changes, with the user providing consent at decision points rather than in advance.
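One way to picture dynamic scope is a per-session grant set that the agent can ask to extend at the moment a capability is needed, and that can shrink again afterwards. The sketch below is an assumption-laden illustration: `SessionScope`, its methods, and the capability strings such as `"email:draft"` are hypothetical and not drawn from the paper or any real platform.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SessionScope:
    # Capabilities currently granted for this session (hypothetical strings).
    granted: set[str] = field(default_factory=set)

    def allows(self, capability: str) -> bool:
        return capability in self.granted

    def request(self, capability: str, consent: Callable[[str], bool]) -> bool:
        """Ask the user for one capability at the point it is needed."""
        if self.allows(capability):
            return True
        if consent(capability):
            self.granted.add(capability)
            return True
        return False

    def revoke(self, capability: str) -> None:
        """Contract the scope once the capability is no longer needed."""
        self.granted.discard(capability)


scope = SessionScope(granted={"email:read"})
# Mid-session the task shifts from summarizing to replying.
scope.request("email:draft", consent=lambda cap: True)  # user agrees
scope.revoke("email:draft")                             # reply drafted, scope shrinks
```

The design choice worth noting is that consent is requested per capability rather than per action, so the user makes one decision when the task changes instead of facing a dialog on every subsequent call.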

Policy Language Accessibility

The third problem is simpler: policy languages for specifying agent behavior are not accessible to non-experts. Writing rules that correctly capture what you do and don't want an agent to do requires understanding the agent's capability surface, which most end users don't have.

The paper positions this as an opportunity for LLM assistance in policy authoring: the LLM can draft policies based on user goals, and the user reviews and approves the policy rather than writing it. The paper reports that an LLM can predict user approval decisions with 85.1% accuracy from context — suggesting that models already have enough understanding of user intent to generate reasonable policy drafts.

This is a specific claim worth tracking as agent platforms mature. Predicting approval decisions is not the same task as drafting policies, but if comparable accuracy carries over to policy authoring, the accessibility problem becomes tractable.
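The draft-then-review workflow can be sketched in a few lines. Everything here is hypothetical: `draft_policy` is a stand-in that returns a fixed draft where a real system would call a model, and the `Rule` type, tool names, and decision labels are invented for illustration rather than taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    tool: str
    decision: str  # one of "allow", "ask", "deny"


def draft_policy(goal: str) -> list[Rule]:
    """Stand-in for an LLM call: returns a fixed draft for illustration only."""
    return [
        Rule("read_inbox", "allow"),
        Rule("draft_reply", "ask"),
        Rule("send_email", "allow"),
    ]


def review(draft: list[Rule], overrides: dict[str, str]) -> list[Rule]:
    """Apply the user's per-tool corrections to the drafted policy."""
    return [Rule(r.tool, overrides.get(r.tool, r.decision)) for r in draft]


draft = draft_policy("summarize my inbox and help me reply")
# The user reads the draft and tightens the one rule they disagree with.
final = review(draft, overrides={"send_email": "ask"})
```

The user corrects a short list of rules instead of writing policy syntax, which is the accessibility gain the paper points to.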

What This Means for Teams Building Agent Systems

The research-practice gap has direct implications if you're building agents that interact with real users.

Audit your dialog frequency. Count how many approval prompts a user sees in a typical session. As a rule of thumb, more than five to seven puts you in approval fatigue territory. Most users won't tell you they're clicking through; they'll just approve riskier actions without realizing it.
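A dialog-frequency audit can start as a simple count over whatever event log the platform already keeps. The sketch below assumes a log of `(session_id, event_type)` pairs with an `"approval_prompt"` event type; both the log shape and the `budget` parameter are assumptions for illustration, and the sample events are made up.

```python
from collections import Counter


def prompts_per_session(events):
    """Count approval prompts per session from (session_id, event_type) pairs."""
    return Counter(sid for sid, kind in events if kind == "approval_prompt")


def sessions_over_budget(events, budget=7):
    """Return the sessions that showed more prompts than the chosen budget."""
    counts = prompts_per_session(events)
    return sorted(sid for sid, n in counts.items() if n > budget)


# Made-up event log, purely for illustration.
events = (
    [("s1", "approval_prompt")] * 3
    + [("s2", "approval_prompt")] * 9
    + [("s2", "tool_result")]
)
print(sessions_over_budget(events))  # ['s2']
```

Tracking this count per release makes it visible when a new tool integration quietly pushes sessions past the budget.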

Invest in policy specification over runtime approval. Runtime approval is the easiest pattern to implement, which is probably why 15 of 21 systems use it. Policy specification requires more upfront design — what are the action categories, what are the risk levels, what requires human review — but it's what keeps approval dialogs rare enough to be meaningful.

Don't build scope as a one-time configuration. If your agent's scope is set at session start, you've guaranteed it will either be too narrow (causing interruptions) or too broad (defeating the point). Scope should be surfaceable and adjustable mid-session, with the agent requesting expansion when it needs to act outside current bounds.

Watch for intent anchoring and trust labeling tooling. These categories exist in research for good reason even if they haven't reached production yet. As agents take on more autonomous multi-step tasks, mechanisms for keeping them aligned with original intent across session turns will become necessary. The gap between research and production deployment suggests an opportunity for teams willing to invest here early.

What the Paper Doesn't Address

The analysis covers how security is implemented, not whether it works. A production system using policy specification is still only as good as the policies written. The paper notes that policy languages are inaccessible but doesn't evaluate the quality of policies practitioners actually write.

The 85.1% approval prediction accuracy is an interesting finding, but its source — whether it comes from a user study, a controlled experiment, or model-based evaluation — isn't specified in the summary sources available. That figure deserves scrutiny before being treated as a design target.

The 21 production systems analyzed are a useful sample but not a census. Agent platforms vary enormously in how much security infrastructure is user-visible versus handled at the API layer. The analysis may undercount mechanisms that are present but not surfaced to users.

Summary

arXiv:2605.24309 provides a clear empirical picture of how LLM agent security is actually deployed. The core finding is that three mechanisms dominate production: policy specification, runtime approval, and scope configuration. A plausible reading is that they are deployed because they're implementable with existing UX patterns, not because they're the most theoretically sound approaches. The mechanisms academics study most intensively have zero production deployment in the systems analyzed.

Approval fatigue is the most tractable problem of the three. The fix — reduce dialog frequency through policy specification — is achievable with existing tools and known design patterns. The static scope problem and the policy language accessibility problem are harder, but the paper's hypothesis that LLM assistance in policy authoring could help on both fronts is a reasonable direction.

Effloow Lab's simplified PoC illustrates that the structural logic of approval fatigue is coherent under the assumed cognitive model: high-frequency dialogs push approval rates for risky actions well above their fatigue-free baselines, while policy-based pre-approval concentrates user attention where it matters.


Paper: arXiv:2605.24309. Submitted May 2026. Available at arxiv.org/abs/2605.24309.

Lab note: data/lab-runs/llm-agent-security-human-interaction-approval-fatigue-poc-2026.md
