Computer-Use Agents in 2026: Local Browser-Control PoC

Goal

Validate the smallest safe version of a computer-use agent loop before writing the article:

Observe a real UI.
Choose a UI action.
Execute the action through browser automation.
Verify the resulting state.
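The four steps above can be sketched as a loop. This is a minimal illustration, not the PoC's agent-loop.mjs. The names runAgentLoop, scriptedPolicy, and makeFakeEnv are hypothetical, and the fake environment's tagging behavior is assumed. The policy is a fixed script standing in for the model that would choose actions in a real computer-use agent.

```javascript
// Hypothetical observe-act-verify loop. `env` stands in for the browser and
// `policy` for the action chooser (a fixed script here, a model in a real agent).
function runAgentLoop(env, policy, isDone, maxSteps = 10) {
  const trace = [];
  for (let step = 0; step < maxSteps; step++) {
    const observation = env.observe();           // observe the current UI state
    if (isDone(observation)) {                   // verify against the success condition
      return { status: "passed", trace, observation };
    }
    const action = policy(observation);          // choose the next action
    if (action === undefined) break;             // the policy has nothing left to try
    env.execute(action);                         // execute it, then loop back to observe
    trace.push(action.type);
  }
  // Verify once more so an action taken on the last allowed step still counts.
  const finalObservation = env.observe();
  return { status: isDone(finalObservation) ? "passed" : "failed", trace, observation: finalObservation };
}

// Scripted policy: replays a fixed list of actions and ignores the observation.
function scriptedPolicy(actions) {
  const queue = [...actions];
  return () => queue.shift(); // undefined once the script is exhausted
}

// In-memory stand-in for the Invoice tagger page (hypothetical behavior).
function makeFakeEnv() {
  const state = { vendor: "", output: "" };
  return {
    observe: () => state.output || `Vendor: ${state.vendor}`,
    execute: (action) => {
      if (action.type === "fill") state.vendor = action.value;
      if (action.type === "click") {
        state.output = "tag:" + state.vendor.trim().toLowerCase().replace(/[^a-z0-9]+/g, "-");
      }
    },
  };
}

const run = runAgentLoop(
  makeFakeEnv(),
  scriptedPolicy([{ type: "fill", value: "Acme Cloud Services" }, { type: "click" }]),
  (obs) => obs === "tag:acme-cloud-services",
);
console.log(run.status, run.trace); // passed [ 'fill', 'click' ]
```

Swapping scriptedPolicy for a function that sends the observation to a model is the point where this loop would stop being a harness test and start testing model reasoning.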

This was not a live LLM run and did not use API keys, external accounts, production credentials, or remote websites.

Environment

Date: 2026-05-23
Machine: local macOS workstation
Working directory: /tmp/effloow-computer-use-poc
Node.js: v25.9.0
npm: 11.12.1
Browser: /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
Package installed in the temporary sandbox: playwright, via npm install --ignore-scripts

Files Created in the Temporary Sandbox

/tmp/effloow-computer-use-poc/task.html
/tmp/effloow-computer-use-poc/package.json
/tmp/effloow-computer-use-poc/agent-loop.mjs

The HTML page contained a small "Invoice tagger" form with:

a labeled Vendor input
a Create tag button
an aria-live output region
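These notes do not record how task.html derives the tag from the vendor name. One rule consistent with the observed result is a lowercase, hyphenated slug with a tag: prefix. The function name vendorTag and the input "Acme Cloud Services" are assumptions for illustration, not the page's actual script.

```javascript
// Hypothetical tagging rule: NOT taken from task.html, only consistent with
// the observed result "tag:acme-cloud-services".
function vendorTag(vendorName) {
  const slug = vendorName
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of spaces/punctuation into one hyphen
    .replace(/^-+|-+$/g, "");    // drop leading/trailing hyphens
  return slug ? `tag:${slug}` : null; // null signals "nothing to tag"
}

console.log(vendorTag("Acme Cloud Services")); // tag:acme-cloud-services
```

A rule this simple is what makes the success condition deterministic: the same input always yields the same output string to check against.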

The Node script, agent-loop.mjs, launched the local Chrome install headlessly through Playwright, read the page text, filled the Vendor input by its label, clicked the Create tag button by its role, and verified the text in the output region.
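The four actions described for the Node script map onto Playwright's Page and Locator calls roughly as below. This is a sketch, not the original agent-loop.mjs. runInvoiceTask and makeFakePage are hypothetical, and the selectors and vendor value are guesses from the form description. The page is passed in so the logic can run against a small test double without launching a browser.

```javascript
// Sketch of the four PoC actions written against Playwright's Page/Locator API.
// `page` is injected: a real Playwright Page, or the test double below.
// Assumptions: label text "Vendor", button name "Create tag", the "[aria-live]"
// selector for the output region, and the default vendor value.
async function runInvoiceTask(page, vendor = "Acme Cloud Services", expected = "tag:acme-cloud-services") {
  const actions = [];
  const observations = [];

  observations.push(await page.locator("body").innerText());             // read the page text
  actions.push("observe_text");

  await page.getByLabel("Vendor").fill(vendor);                           // fill the input by its label
  actions.push("fill_by_label");

  await page.getByRole("button", { name: "Create tag" }).click();         // click the button by role
  actions.push("click_by_role");

  const result = (await page.locator("[aria-live]").innerText()).trim();  // read the live region
  observations.push(result);
  actions.push("verify_output");

  return { status: result === expected ? "passed" : "failed", actions, result, observations };
}

// Test double exposing only the calls used above; it is not Playwright.
function makeFakePage() {
  let vendor = "";
  let output = "";
  const bodyText = "Invoice tagger\n\nEnter a vendor name and submit the form.\n\nVendor\nCreate tag";
  return {
    locator: (selector) => ({
      innerText: async () => (selector === "body" ? bodyText : output),
    }),
    getByLabel: () => ({ fill: async (value) => { vendor = value; } }),
    getByRole: () => ({
      click: async () => { output = "tag:" + vendor.toLowerCase().split(" ").join("-"); },
    }),
  };
}

runInvoiceTask(makeFakePage()).then((report) => console.log(JSON.stringify(report, null, 2)));
```

Against the real library, one way to obtain the page would be chromium.launch with its executablePath option pointed at the Chrome binary listed under Environment, then browser.newPage() and page.goto() with a file:// URL to task.html. If the page updates the live region asynchronously, a real run may also need an explicit wait before reading it.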

Commands

rm -rf /tmp/effloow-computer-use-poc && mkdir -p /tmp/effloow-computer-use-poc
cd /tmp/effloow-computer-use-poc
npm install --ignore-scripts
node agent-loop.mjs

Relevant Output

added 3 packages, and audited 4 packages in 1s

found 0 vulnerabilities

{
  "status": "passed",
  "actions": [
    "observe_text",
    "fill_by_label",
    "click_by_role",
    "verify_output"
  ],
  "result": "tag:acme-cloud-services",
  "observations": [
    "Invoice tagger\n\nEnter a vendor name and submit the form.\n\nVendor\nCreate tag",
    "tag:acme-cloud-services"
  ]
}
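Because the output is plain JSON with a fixed success condition, a follow-up check can re-validate it mechanically. The sketch below is hypothetical (checkReport is not part of the PoC) and assumes the expected action order is the four-step sequence in the output.

```javascript
// Hypothetical post-run check for a report shaped like the JSON above.
const EXPECTED_ACTIONS = ["observe_text", "fill_by_label", "click_by_role", "verify_output"];

function checkReport(report, expectedTag) {
  const problems = [];
  if (report.status !== "passed") problems.push(`status is ${report.status}`);
  if (report.result !== expectedTag) problems.push(`result is ${report.result}, expected ${expectedTag}`);
  if (JSON.stringify(report.actions) !== JSON.stringify(EXPECTED_ACTIONS)) {
    problems.push(`unexpected action sequence: ${report.actions}`);
  }
  const observations = report.observations ?? [];
  if (observations[observations.length - 1] !== report.result) {
    problems.push("final observation does not match the reported result");
  }
  return problems; // empty array: the run checks out
}

// The report from the run, transcribed as a JavaScript object.
const sampleReport = {
  status: "passed",
  actions: ["observe_text", "fill_by_label", "click_by_role", "verify_output"],
  result: "tag:acme-cloud-services",
  observations: [
    "Invoice tagger\n\nEnter a vendor name and submit the form.\n\nVendor\nCreate tag",
    "tag:acme-cloud-services",
  ],
};

console.log(checkReport(sampleReport, "tag:acme-cloud-services")); // []
```

Returning a list of problems rather than a single boolean keeps the reason for a failed run visible in logs.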

What Worked

Playwright drove a real (headless) Chrome instance from a temporary sandbox.
Label-based and role-based selectors were enough for this simple task.
The loop reached a deterministic success condition: the output region read tag:acme-cloud-services.
The implementation exercised the same high-level observe-act-verify shape used by computer-use agent harnesses, without granting an LLM access to a real browser session.
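One way to keep the harness safe once a model does choose actions is to vet each proposed action before it reaches the browser. This is a hypothetical guard, not something the PoC implemented: vetAction and the navigate action type are illustrative, and the rule that only file: URLs may be opened mirrors the local-only setup described in these notes.

```javascript
// Hypothetical allowlist sitting between an action chooser and the browser.
// The four PoC action types pass; "navigate" is an illustrative extra that may
// only open local file: URLs. Anything else is rejected.
const ALLOWED_TYPES = new Set([
  "observe_text",
  "fill_by_label",
  "click_by_role",
  "verify_output",
  "navigate",
]);

function vetAction(action) {
  if (!ALLOWED_TYPES.has(action.type)) {
    return { ok: false, reason: `action type not allowed: ${action.type}` };
  }
  if (action.type === "navigate") {
    let url;
    try {
      url = new URL(action.url); // throws on a missing or malformed URL
    } catch {
      return { ok: false, reason: "missing or malformed URL" };
    }
    if (url.protocol !== "file:") {
      return { ok: false, reason: `non-local URL blocked: ${url.href}` };
    }
  }
  return { ok: true };
}

console.log(vetAction({ type: "navigate", url: "https://example.com/" }).ok); // false
```

A rejected action would be reported back to the chooser instead of executed, so the allowlist, not the model, decides what the browser can do.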

What Failed

No live LLM was connected, so the PoC did not test model reasoning, visual grounding, prompt injection resistance, or recovery from ambiguous pages.
No remote website was used. That avoids credentials and policy risk, but it also means this was not a real-world navigation test.
The run used a static local page, so it did not measure robustness against layout changes, dynamic content, auth redirects, CAPTCHAs, or third-party scripts.

Limitations

This PoC supports saying "Effloow Lab ran a local sandbox PoC of a browser-control loop."
It does not support saying "Effloow tested OpenAI Computer Use," "Effloow tested Anthropic computer use," "Effloow benchmarked browser-use," or "this approach is production-ready."
The article should frame Playwright as a safe harness for prototyping and verification, not as a full replacement for model-level computer-use products.

Sources Checked

OpenAI Computer Use guide: https://developers.openai.com/api/docs/guides/tools-computer-use
Anthropic computer use tool docs: https://platform.claude.com/docs/en/agents-and-tools/tool-use/computer-use-tool
Browser-use GitHub repository: https://github.com/browser-use/browser-use
Skyvern developer docs: https://www.skyvern.com/docs/developers/getting-started/introduction
Playwright ARIA snapshot docs: https://playwright.dev/docs/aria-snapshots
OpenAI Operator System Card: https://openai.com/index/operator-system-card/
Web automation agent social-engineering risk paper: https://arxiv.org/abs/2601.07263