Agentic Engineering Beyond Vibe Coding Methodology 2026

Purpose

This PoC tested the smallest useful version of an "agentic engineering" control loop: a proposed code change is accepted only after it passes deterministic verification, and the verification trace is saved as an auditable artifact.

The sandbox did not call an AI model. It simulated a good baseline and a bad agent patch so the article can discuss the engineering boundary without pretending that Effloow ran a production agent workflow.

Commands

rm -rf /tmp/effloow-agentic-engineering-poc
mkdir -p /tmp/effloow-agentic-engineering-poc
cd /tmp/effloow-agentic-engineering-poc

Created:

package.json
calculator.mjs
test-runner.mjs
gate.mjs
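
The file contents are not reproduced in this note. As a rough sketch only, the calculator and its test cases might look like the following. The function names and the test inputs are assumptions, chosen so that they reproduce the expected and actual values reported in the trace further down; only the Math.round(...) expression and the "cents - percent" replacement come from the patch commands in this note.

```javascript
// Hypothetical reconstruction of calculator.mjs; the function name is assumed.
// Only the Math.round(...) expression is taken from the patch commands.
function applyDiscount(cents, percent) {
  return Math.round(cents - (cents * percent / 100));
}

// What the deliberately bad patch turns the expression into.
function applyDiscountBad(cents, percent) {
  return cents - percent;
}

// Test inputs are assumptions inferred from the reported trace values.
const cases = [
  { name: "zero percent", cents: 1000, percent: 0, expected: 1000 },
  { name: "ten percent", cents: 1000, percent: 10, expected: 900 },
  { name: "round half up", cents: 999, percent: 12.5, expected: 874 },
];

// Returns one "PASS name" or "FAIL name: expected X, got Y" line per case.
function runCases(fn) {
  return cases.map(({ name, cents, percent, expected }) => {
    const got = fn(cents, percent);
    return got === expected
      ? `PASS ${name}`
      : `FAIL ${name}: expected ${expected}, got ${got}`;
  });
}

console.log(runCases(applyDiscount).join("\n"));
console.log(runCases(applyDiscountBad).join("\n"));
```

Under these assumed inputs the bad expression still passes the zero-percent case, which is why a single happy-path test would not have caught it.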

Baseline verification:

npm run gate

Output:

> gate
> node gate.mjs

GATE_PASS deterministic verification accepted release

Injected a deliberately bad patch:

perl -0pi -e 's/Math\.round\(cents - \(cents \* percent \/ 100\)\)/cents - percent/' calculator.mjs
npm run gate || true

Output:

> gate
> node gate.mjs

GATE_FAIL deterministic verification blocked release

Trace captured by gate.mjs:

[
  {
    "label": "unit-test-gate",
    "cmd": "node test-runner.mjs",
    "status": 1,
    "stdout": "PASS zero percent",
    "stderr": "FAIL ten percent: expected 900, got 990\nFAIL round half up: expected 874, got 986.5"
  }
]
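
The source of gate.mjs is not shown here. A minimal sketch of a gate that produces a trace of this shape, assuming Node's built-in child_process and fs modules, could look like the following. The runGate name, the step object layout, and the temp-directory trace path are illustrative assumptions, and the step runs an inline stand-in command rather than the real test-runner.mjs so that the sketch is runnable on its own.

```javascript
import { spawnSync } from "node:child_process";
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical gate: runs each step, records a trace entry per step,
// stops at the first failure, and writes the trace as JSON.
function runGate(steps, tracePath) {
  const trace = [];
  for (const { label, cmd, args } of steps) {
    const result = spawnSync(cmd, args, { encoding: "utf8" });
    trace.push({
      label,
      cmd: [cmd, ...args].join(" "),
      status: result.status, // null if the process could not be spawned
      stdout: (result.stdout ?? "").trim(),
      stderr: (result.stderr ?? "").trim(),
    });
    if (result.status !== 0) break;
  }
  writeFileSync(tracePath, JSON.stringify(trace, null, 2));
  const passed =
    trace.length === steps.length && trace.every((entry) => entry.status === 0);
  return { passed, trace };
}

// Stand-in for "node test-runner.mjs": prints one pass, one failure, exits 1.
const failingStep = {
  label: "unit-test-gate",
  cmd: process.execPath,
  args: [
    "-e",
    "console.log('PASS zero percent'); console.error('FAIL ten percent'); process.exitCode = 1;",
  ],
};

const dir = mkdtempSync(join(tmpdir(), "gate-sketch-"));
const tracePath = join(dir, "trace.json");
const failing = runGate([failingStep], tracePath);
console.log(failing.passed ? "GATE_PASS" : "GATE_FAIL");
console.log(readFileSync(tracePath, "utf8"));
```

A real gate script would presumably also set a non-zero exit code on failure so that npm and any later release step stop; that part is left out so the sketch itself exits cleanly.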

Restored the correct implementation:

perl -0pi -e 's/cents - percent/Math.round(cents - (cents * percent \/ 100))/' calculator.mjs
npm run gate

Output:

> gate
> node gate.mjs

GATE_PASS deterministic verification accepted release

What Worked

A minimal deterministic gate separated a plausible-looking patch from an acceptable patch.
The gate produced a machine-readable trace.json artifact containing command, status, stdout, and stderr.
The failing patch was blocked before any release step.

What Failed

Each gate run overwrote trace.json; runs were not appended as separate records with immutable run IDs.
The sandbox used one unit-test gate only. It did not include linting, static analysis, security checks, snapshot tests, or human review.
The bad patch was manually injected rather than generated by a real coding agent.
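
One possible way to close the first gap is an append-only log that stores each run as its own record. The sketch below is not part of the PoC. The appendRun and readRuns names, the JSON Lines layout, and the record fields are assumptions; it relies only on Node's built-in fs and crypto modules.

```javascript
import { appendFileSync, mkdtempSync, readFileSync } from "node:fs";
import { randomUUID } from "node:crypto";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Appends one gate run as a single JSON line. Existing lines are never
// rewritten by this function, so earlier runs stay in the log.
function appendRun(logPath, trace) {
  const record = {
    runId: randomUUID(), // hypothetical field name
    recordedAt: new Date().toISOString(),
    trace,
  };
  appendFileSync(logPath, JSON.stringify(record) + "\n");
  return record.runId;
}

// Reads every recorded run back, oldest first.
function readRuns(logPath) {
  return readFileSync(logPath, "utf8")
    .split("\n")
    .filter(Boolean)
    .map((line) => JSON.parse(line));
}

const logPath = join(mkdtempSync(join(tmpdir(), "trace-log-")), "trace.jsonl");
const firstId = appendRun(logPath, [{ label: "unit-test-gate", status: 0 }]);
const secondId = appendRun(logPath, [{ label: "unit-test-gate", status: 1 }]);
const runs = readRuns(logPath);
console.log(runs.length, firstId !== secondId);
```

Appending keeps earlier runs in the file but does not make them tamper-proof by itself; stronger guarantees would need something outside this sketch, such as write-once storage.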

Limitations

This PoC demonstrates a workflow shape; it is not a product benchmark. It does not measure agent coding quality, speed, cost, reliability, or defect rate, and it does not prove that any specific commercial coding agent follows this process. The article should claim only that Effloow Lab ran a local sandbox gate demonstrating why professional agentic workflows need deterministic verification and trace capture.
