EFFLOOW LAB · LAB-RUN · 2026-05-23

Computer-Use Agents in 2026: Local Browser-Control PoC

Evidence notes document the bounded local or source-based checks behind an Effloow article. They are not product endorsements, legal advice, or benchmark claims.

Goal

Validate the smallest safe version of a computer-use agent loop before writing the article:

  1. Observe a real UI.
  2. Choose a UI action.
  3. Execute the action through browser automation.
  4. Verify the resulting state.

This was not a live LLM run and did not use API keys, external accounts, production credentials, or remote websites.
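
The four steps above can be sketched without a browser or a model. The example below is a minimal illustration, not the PoC script: `createFakeUi`, `scriptedPolicy`, and `runLoop` are hypothetical names, the in-memory UI stands in for a real page, and the vendor string and slug rule are assumptions chosen so the result matches the tag reported later in this note.

```javascript
// Hypothetical in-memory stand-in for the UI; the real PoC used a browser page.
function createFakeUi() {
  const state = { vendor: "", output: "" };
  return {
    observe: () => ({ ...state }),
    act(action) {
      if (action.type === "fill") {
        state.vendor = action.value;
      } else if (action.type === "click") {
        // Assumed tagging rule: lowercase, runs of non-alphanumerics become one hyphen.
        const slug = state.vendor
          .trim()
          .toLowerCase()
          .replace(/[^a-z0-9]+/g, "-")
          .replace(/^-|-$/g, "");
        state.output = `tag:${slug}`;
      } else {
        throw new Error(`unknown action: ${action.type}`);
      }
    },
  };
}

// Scripted policy standing in for a model: picks the next action from the observation.
function scriptedPolicy(observation, goal) {
  if (observation.vendor !== goal.vendor) return { type: "fill", value: goal.vendor };
  if (observation.output === "") return { type: "click" };
  return null; // nothing left to do
}

function runLoop(ui, goal, maxSteps = 5) {
  const actions = [];
  for (let step = 0; step < maxSteps; step++) {
    const observation = ui.observe();                  // 1. observe
    const action = scriptedPolicy(observation, goal);  // 2. choose
    if (action === null) break;
    ui.act(action);                                    // 3. execute
    actions.push(action.type);
  }
  const result = ui.observe().output;                  // 4. verify
  return { status: result === goal.expected ? "passed" : "failed", actions, result };
}

const loopReport = runLoop(createFakeUi(), {
  vendor: "Acme Cloud Services", // assumed input, not stated in this note
  expected: "tag:acme-cloud-services",
});
console.log(JSON.stringify(loopReport, null, 2));
```

Keeping the policy as a separate function marks the seam where a model would eventually plug in; everything else in the loop stays deterministic and checkable.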

Environment

  • Date: 2026-05-23
  • Machine: local macOS workstation
  • Working directory: /tmp/effloow-computer-use-poc
  • Node.js: v25.9.0
  • npm: 11.12.1
  • Browser: /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
  • Package installed in temp sandbox: playwright, via npm install --ignore-scripts

Files Created in the Temporary Sandbox

  • /tmp/effloow-computer-use-poc/task.html
  • /tmp/effloow-computer-use-poc/package.json
  • /tmp/effloow-computer-use-poc/agent-loop.mjs

The HTML page contained a small "Invoice tagger" form with:

  • a labeled Vendor input
  • a Create tag button
  • an aria-live output

The Node script launched Chrome headlessly with Playwright, read the page text, filled the input by label, clicked the button by role, and verified the output.
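
A script of that shape might look roughly like the sketch below. It is a reconstruction under assumptions, not the actual agent-loop.mjs: the `[aria-live]` selector, the vendor string, the `channel: "chrome"` launch option, and the `RUN_BROWSER` guard are all illustrative choices. The Playwright calls used (`chromium.launch`, `page.getByLabel`, `page.getByRole`, `locator.innerText`) are part of its public API.

```javascript
// Drives a Playwright-style page through observe, fill, click, verify.
async function runAgentLoop(page, vendor, expected) {
  const actions = [];
  const observations = [];

  observations.push(await page.locator("body").innerText());
  actions.push("observe_text");

  await page.getByLabel("Vendor").fill(vendor);
  actions.push("fill_by_label");

  await page.getByRole("button", { name: "Create tag" }).click();
  actions.push("click_by_role");

  // Assumes the page updates the aria-live region synchronously on click.
  const result = (await page.locator("[aria-live]").innerText()).trim();
  observations.push(result);
  actions.push("verify_output");

  const status = result === expected ? "passed" : "failed";
  return { status, actions, result, observations };
}

async function main() {
  // Loaded lazily so the loop above can be exercised without Playwright installed.
  const { chromium } = await import("playwright");
  // channel: "chrome" asks Playwright to use the locally installed Chrome.
  const browser = await chromium.launch({ channel: "chrome", headless: true });
  try {
    const page = await browser.newPage();
    await page.goto("file:///tmp/effloow-computer-use-poc/task.html");
    const report = await runAgentLoop(
      page,
      "Acme Cloud Services", // assumed input, not stated in this note
      "tag:acme-cloud-services"
    );
    console.log(JSON.stringify(report, null, 2));
    process.exitCode = report.status === "passed" ? 0 : 1;
  } finally {
    await browser.close();
  }
}

// Hypothetical opt-in switch so nothing launches a browser by accident.
if (process.env.RUN_BROWSER === "1") {
  main().catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
```

Selecting by label and by role, rather than by CSS class or position, keeps the script tied to what a user would perceive on the page.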

Commands

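# Start from an empty sandbox directory
rm -rf /tmp/effloow-computer-use-poc && mkdir -p /tmp/effloow-computer-use-poc
cd /tmp/effloow-computer-use-poc
# task.html, package.json, and agent-loop.mjs must exist in the sandbox before this step
npm install --ignore-scripts
node agent-loop.mjs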

Relevant Output

added 3 packages, and audited 4 packages in 1s

found 0 vulnerabilities
{
  "status": "passed",
  "actions": [
    "observe_text",
    "fill_by_label",
    "click_by_role",
    "verify_output"
  ],
  "result": "tag:acme-cloud-services",
  "observations": [
    "Invoice tagger\n\nEnter a vendor name and submit the form.\n\nVendor\nCreate tag",
    "tag:acme-cloud-services"
  ]
}
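
A report like the one above can be checked mechanically. The sketch below is illustrative and was not part of the run: `checkReport` and `EXPECTED_ACTIONS` are hypothetical names, and the sample object copies the fields from the output shown here.

```javascript
const EXPECTED_ACTIONS = ["observe_text", "fill_by_label", "click_by_role", "verify_output"];

// Returns a list of problems; an empty list means the report is consistent.
function checkReport(report, expectedResult) {
  const problems = [];
  if (report.status !== "passed") {
    problems.push(`status is ${report.status}`);
  }
  if (JSON.stringify(report.actions) !== JSON.stringify(EXPECTED_ACTIONS)) {
    problems.push("actions are missing or out of order");
  }
  if (report.result !== expectedResult) {
    problems.push(`result is ${report.result}`);
  }
  const lastObservation = report.observations[report.observations.length - 1];
  if (lastObservation !== report.result) {
    problems.push("final observation does not match result");
  }
  return problems;
}

const sampleReport = {
  status: "passed",
  actions: ["observe_text", "fill_by_label", "click_by_role", "verify_output"],
  result: "tag:acme-cloud-services",
  observations: [
    "Invoice tagger\n\nEnter a vendor name and submit the form.\n\nVendor\nCreate tag",
    "tag:acme-cloud-services",
  ],
};

console.log(checkReport(sampleReport, "tag:acme-cloud-services"));
```

Comparing the final observation with `result` guards against a report that claims success without having re-read the page.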

What Worked

  • Playwright could drive a real Chrome UI from a temporary sandbox.
  • Label-based and role-based selectors were enough for this simple task.
  • The loop ended on a deterministic, checkable success condition: the output read tag:acme-cloud-services.
  • The implementation exercised the same high-level observe-act-verify shape used by computer-use agent harnesses, without granting an LLM access to a real browser session.

What Failed

  • No live LLM was connected, so the PoC did not test model reasoning, visual grounding, prompt injection resistance, or recovery from ambiguous pages.
  • No remote website was used. That avoids credentials and policy risk, but it also means this was not a real-world navigation test.
  • The run used a static local page, so it did not measure robustness against layout changes, dynamic content, auth redirects, CAPTCHAs, or third-party scripts.
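
The local-only constraint described above could also be enforced in code rather than by convention. The sketch below is an assumption about how that might look and was not part of the PoC: `isLocalSandboxUrl` and `installGuard` are hypothetical helpers, and whether `file:` navigations pass through a `page.route` handler was not verified here.

```javascript
const SANDBOX_PREFIX = "/tmp/effloow-computer-use-poc/";

// True only for file: URLs that resolve to a path inside the sandbox directory.
function isLocalSandboxUrl(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  // The URL parser normalizes ".." segments before this prefix check runs.
  return url.protocol === "file:" && url.pathname.startsWith(SANDBOX_PREFIX);
}

// Optional wiring for a Playwright page: aborts any intercepted request
// whose URL is outside the sandbox. Not called in this sketch.
async function installGuard(page) {
  await page.route("**/*", (route) =>
    isLocalSandboxUrl(route.request().url()) ? route.continue() : route.abort()
  );
}

console.log(isLocalSandboxUrl("file:///tmp/effloow-computer-use-poc/task.html"));
console.log(isLocalSandboxUrl("https://example.com/"));
```

A guard like this narrows what a later model-driven version could reach, but it does not replace the robustness tests listed above.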

Limitations

  • This PoC supports saying "Effloow Lab ran a local sandbox PoC of a browser-control loop."
  • It does not support saying "Effloow tested OpenAI Computer Use," "Effloow tested Anthropic computer use," "Effloow benchmarked browser-use," or "this approach is production-ready."
  • The article should frame Playwright as a safe harness for prototyping and verification, not as a full replacement for model-level computer-use products.

This note supports the public article and records what was actually checked.
