Computer-Use Agents in 2026: Local Browser-Control PoC
Evidence notes document the bounded local or source-based checks behind an Effloow article. They are not product endorsements, legal advice, or benchmark claims.
Goal
Validate the smallest safe version of a computer-use agent loop before writing the article:
- Observe a real UI.
- Choose a UI action.
- Execute the action through browser automation.
- Verify the resulting state.
This was not a live LLM run and did not use API keys, external accounts, production credentials, or remote websites.
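The four steps above can be sketched as a small loop with injected callbacks. This is a minimal illustration, not the PoC's code: the function name observeActVerify, the callback names, and the maxSteps cap are all hypothetical. In the PoC the "choose" step was scripted; a model-driven agent would supply that callback instead.

```javascript
// Hypothetical helper: all names and shapes are illustrative, not from the PoC.
// observe(): returns the current UI state
// choose(observation): returns the next action, or null when nothing is left
// execute(action): performs the action (for example through browser automation)
// verify(observation): returns true once the success condition holds
async function observeActVerify({ observe, choose, execute, verify, maxSteps = 5 }) {
  const trace = [];
  let observation = await observe();
  let done = await verify(observation);

  // Stop on success, when the policy gives up, or when the step budget runs out.
  while (!done && trace.length < maxSteps) {
    const action = await choose(observation);
    if (action == null) break;
    await execute(action);
    trace.push(action);
    observation = await observe();
    done = await verify(observation);
  }

  return { status: done ? 'passed' : 'failed', trace, observation };
}
```

Capping the number of steps keeps a misbehaving policy from acting indefinitely, which matters more once the chooser is a model rather than a fixed script.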
Environment
- Date: 2026-05-23
- Machine: local macOS workstation
- Working directory: /tmp/effloow-computer-use-poc
- Node.js: v25.9.0
- npm: 11.12.1
- Browser: /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
- Package installed in temp sandbox: playwright with npm install --ignore-scripts
Files Created in the Temporary Sandbox
- /tmp/effloow-computer-use-poc/task.html
- /tmp/effloow-computer-use-poc/package.json
- /tmp/effloow-computer-use-poc/agent-loop.mjs
The HTML page contained a small "Invoice tagger" form with:
- a labeled Vendor input
- a Create tag button
- an aria-live output
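The page's own script is not reproduced in this note. As a rough sketch of how such a form could turn a vendor name into a tag, assuming the page lowercases the name and joins words with hyphens (the recorded result tag:acme-cloud-services is consistent with that, but the rule itself is a guess), and with hypothetical element ids vendor, create-tag, and result:

```javascript
// Assumption: the tag is "tag:" plus a lowercase, hyphen-joined form of the name.
// The actual rule used by task.html is not recorded in this note.
function toTag(vendor) {
  const slug = vendor
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse spaces and punctuation into hyphens
    .replace(/^-+|-+$/g, '');    // drop leading and trailing hyphens
  return `tag:${slug}`;
}

// Hypothetical wiring: the element ids are illustrative, not taken from task.html.
// Writing into an aria-live element lets assistive tech (and automation) see the update.
function wireForm(doc) {
  const input = doc.getElementById('vendor');
  const output = doc.getElementById('result');
  doc.getElementById('create-tag').addEventListener('click', () => {
    output.textContent = toTag(input.value);
  });
}
```

Under this assumed rule, the input "Acme Cloud Services" would produce tag:acme-cloud-services.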
The Node script launched Chrome headlessly with Playwright, read the page text, filled the input by label, clicked the button by role, and verified the output.
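The original agent-loop.mjs is not included in this note. The following is a minimal sketch of a script with the same shape, using Playwright calls that exist in current releases (getByLabel, getByRole, locator, innerText). The function names, the '[aria-live]' selector, and the vendor value "Acme Cloud Services" are assumptions chosen to match the recorded output, not details taken from the original file.

```javascript
// Sketch only: a reconstruction of the described flow, not the original script.
// `page` is a Playwright Page (or any object exposing the same methods).
async function runAgentLoop(page, vendor, expectedResult) {
  const actions = [];
  const observations = [];

  // Observe: read the visible page text.
  observations.push((await page.locator('body').innerText()).trim());
  actions.push('observe_text');

  // Act: fill the input by its label, then click the button by its role.
  await page.getByLabel('Vendor').fill(vendor);
  actions.push('fill_by_label');
  await page.getByRole('button', { name: 'Create tag' }).click();
  actions.push('click_by_role');

  // Verify: read the aria-live region (selector is an assumption) and compare.
  const result = (await page.locator('[aria-live]').innerText()).trim();
  observations.push(result);
  actions.push('verify_output');

  const status = result === expectedResult ? 'passed' : 'failed';
  return { status, actions, result, observations };
}

// Launches the locally installed Chrome headlessly and runs the loop once.
async function main() {
  const { chromium } = await import('playwright');
  const browser = await chromium.launch({
    executablePath: '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
    headless: true,
  });
  try {
    const page = await browser.newPage();
    await page.goto('file:///tmp/effloow-computer-use-poc/task.html');
    const report = await runAgentLoop(
      page,
      'Acme Cloud Services',      // assumed input value
      'tag:acme-cloud-services',
    );
    console.log(JSON.stringify(report, null, 2));
  } finally {
    await browser.close();
  }
}
```

Calling main() from the sandbox directory would run the loop against the local page. Keeping runAgentLoop separate from the browser launch means the same steps can be exercised against a stub page object without starting Chrome.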
Commands
rm -rf /tmp/effloow-computer-use-poc && mkdir -p /tmp/effloow-computer-use-poc
cd /tmp/effloow-computer-use-poc
npm install --ignore-scripts
node agent-loop.mjs
Relevant Output
added 3 packages, and audited 4 packages in 1s
found 0 vulnerabilities
{
  "status": "passed",
  "actions": [
    "observe_text",
    "fill_by_label",
    "click_by_role",
    "verify_output"
  ],
  "result": "tag:acme-cloud-services",
  "observations": [
    "Invoice tagger\n\nEnter a vendor name and submit the form.\n\nVendor\nCreate tag",
    "tag:acme-cloud-services"
  ]
}
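A report of this shape can be checked mechanically rather than by eye. The helper below is a hypothetical sketch and was not part of the PoC: it returns a list of problems, so an empty list means the report matches the expected status, action order, and result.

```javascript
// Hypothetical checker for the JSON report printed by the loop; not part of the PoC.
const EXPECTED_ACTIONS = ['observe_text', 'fill_by_label', 'click_by_role', 'verify_output'];

function checkReport(report, expectedResult) {
  const problems = [];

  if (report.status !== 'passed') {
    problems.push(`status is "${report.status}", expected "passed"`);
  }
  // Comparing serialized arrays checks both the contents and the order.
  if (JSON.stringify(report.actions) !== JSON.stringify(EXPECTED_ACTIONS)) {
    problems.push('actions do not match the expected sequence');
  }
  if (report.result !== expectedResult) {
    problems.push(`result is "${report.result}", expected "${expectedResult}"`);
  }
  // The final observation should be the verified output itself.
  const observations = Array.isArray(report.observations) ? report.observations : [];
  if (observations[observations.length - 1] !== report.result) {
    problems.push('last observation does not match result');
  }

  return problems;
}
```

For the report shown here, checkReport(report, 'tag:acme-cloud-services') would return an empty array.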
What Worked
- Playwright could drive a real Chrome UI from a temporary sandbox.
- Label-based and role-based selectors were enough for this simple task.
- The loop produced a deterministic success condition: tag:acme-cloud-services.
- The implementation exercised the same high-level observe-act-verify shape used by computer-use agent harnesses, without granting an LLM access to a real browser session.
What Failed
- No live LLM was connected, so the PoC did not test model reasoning, visual grounding, prompt injection resistance, or recovery from ambiguous pages.
- No remote website was used. That avoids credentials and policy risk, but it also means this was not a real-world navigation test.
- The run used a static local page, so it did not measure robustness against layout changes, dynamic content, auth redirects, CAPTCHAs, or third-party scripts.
Limitations
- This PoC supports saying "Effloow Lab ran a local sandbox PoC of a browser-control loop."
- It does not support saying "Effloow tested OpenAI Computer Use," "Effloow tested Anthropic computer use," "Effloow benchmarked browser-use," or "this approach is production-ready."
- The article should frame Playwright as a safe harness for prototyping and verification, not as a full replacement for model-level computer-use products.
Sources Checked
- OpenAI Computer Use guide: https://developers.openai.com/api/docs/guides/tools-computer-use
- Anthropic computer use tool docs: https://platform.claude.com/docs/en/agents-and-tools/tool-use/computer-use-tool
- Browser-use GitHub repository: https://github.com/browser-use/browser-use
- Skyvern developer docs: https://www.skyvern.com/docs/developers/getting-started/introduction
- Playwright ARIA snapshot docs: https://playwright.dev/docs/aria-snapshots
- OpenAI Operator System Card: https://openai.com/index/operator-system-card/
- Web automation agent social-engineering risk paper: https://arxiv.org/abs/2601.07263
This note supports the public article and records what was actually checked.