OpenAI Codex GPT-5.5: Autonomous Coding Agent Guide 2026
OpenAI Codex has changed a lot since its early API days. The 2026 version is a full autonomous coding agent — powered by GPT-5.5, OpenAI's strongest agentic model — that runs across a CLI, IDE extensions, a macOS app, and a cloud backend that connects directly to your GitHub repositories.
This guide covers what Codex actually is in 2026, how to set it up, how to configure it for your projects, and where it fits into a real development workflow. No fluff about "the future of AI" — just the parts a developer needs to know.
Why GPT-5.5 Changes the Agent Calculus
Most coding agents before 2026 were wrappers around chat models that happened to write code. The agent loop was fragile: models lost context, forgot earlier decisions, and required you to babysit each step.
GPT-5.5 is different in kind, not just degree. According to OpenAI's announcement, it achieves 82.7% on Terminal-Bench 2.0 and 58.6% on SWE-Bench Pro, benchmarks that measure whether a model can complete real engineering tasks end-to-end rather than just predict the next token in a code snippet. The gains are strongest in agentic coding, computer use, and long-horizon workflows where the model must reason across context and act over time.
That performance jump is why Codex can now do things like schedule its own wakeup to continue a task across days, operate multiple macOS apps in parallel, and remember project-level conventions without being told again each session.
What Codex Actually Is in 2026
Codex is OpenAI's autonomous coding agent product, distinct from the old Codex API (now deprecated). It runs in four surfaces:
CLI — A terminal tool (npm i -g @openai/codex) that works in your existing project directory. Start with codex in your terminal; it runs in Agent mode by default, meaning it can read files, execute commands, and write changes without you approving each step.
IDE Extensions — Available for VS Code, Cursor, and Windsurf. Codex operates as an embedded agent within your editor rather than a simple autocomplete.
macOS / Windows App — A desktop application with access to computer use. On macOS, multiple Codex agents can operate other apps (Figma, Xcode, Slack, browsers) in parallel with your own work.
Codex Cloud — Background task execution against GitHub repositories. Accessible at chatgpt.com/codex. Codex clones your repo into a sandboxed environment, runs your test suite, and opens pull requests — you don't touch the keyboard between prompt and PR.
| Surface | Best For | Computer Use | GitHub Integration |
|---|---|---|---|
| CLI | Local development, scripts | No | Via git CLI |
| IDE Extension | Inline editing, refactors | No | Via IDE |
| macOS / Windows App | Multi-app workflows | Yes (macOS only) | Via app |
| Codex Cloud | Background tasks, PR automation | No | Native (@codex in comments) |
Installing and Configuring the CLI
The CLI is the lowest-friction way to start. It ships as a thin npm wrapper around a platform-specific binary (Apache-2.0 license, Node.js 16+ required).
npm install -g @openai/codex
On first run, codex prompts you to sign in with your ChatGPT account or provide an API key. After authentication:
codex # Enter Agent mode in current directory
codex "fix the failing tests" # Single-shot task, then exit
codex --model gpt-5.4-mini # Use the faster/cheaper model
Model selection matters for your workflow:
- gpt-5.5: Default for most tasks. Strongest reasoning, slowest, most expensive.
- gpt-5.4-mini: Faster and cheaper for lighter coding tasks or when Codex spawns subagents for parallelism.
- gpt-5.3-codex-spark: Research preview with experimental features, available to ChatGPT Pro subscribers only.
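The trade-off above can be written down as a small routing helper. This is a minimal sketch, not part of Codex: the `Task` fields, the two-file threshold, and the `pick_model` and `build_command` names are all hypothetical. Passing `--model` together with a prompt in one invocation is also an assumption; the guide shows each only on its own.

```python
from dataclasses import dataclass
from typing import List

# Model names come from this guide; the routing thresholds are assumptions.
DEFAULT_MODEL = "gpt-5.5"
LIGHT_MODEL = "gpt-5.4-mini"


@dataclass
class Task:
    description: str
    files_touched: int                    # rough estimate of how many files the task spans
    needs_design_decisions: bool = False  # architectural or otherwise open-ended work


def pick_model(task: Task, max_light_files: int = 2) -> str:
    """Return the lighter model for small, well-defined tasks, else the default."""
    if task.needs_design_decisions or task.files_touched > max_light_files:
        return DEFAULT_MODEL
    return LIGHT_MODEL


def build_command(task: Task) -> List[str]:
    """Build an argv list. Combining --model with a prompt is an assumption."""
    return ["codex", "--model", pick_model(task), task.description]
```

A one-file rename would route to gpt-5.4-mini, while a twelve-file refactor or anything flagged as needing design decisions would stay on gpt-5.5.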
You can also point the CLI at any provider that supports the Chat Completions or Responses API — useful for running against Azure OpenAI or for self-hosted setups in regulated environments.
Configuring Projects with AGENTS.md
AGENTS.md is the configuration file Codex reads before touching your code. It tells the agent how your project works: which commands to run, what to avoid, how tests are structured, which conventions to follow.
Without an AGENTS.md, Codex makes reasonable guesses. With one, you stop wasting tokens on corrections every session.
File discovery hierarchy:
- Global: ~/.codex/AGENTS.md (your personal defaults across all projects)
- Project root: .codex/AGENTS.md or AGENTS.md in the repository root
- Working directory: override files deeper in the tree
Codex combines the applicable files in order from global to most specific, so instructions closer to your working directory take precedence over broader ones. If an AGENTS.override.md exists at any level, it takes precedence over AGENTS.md at that same level. The combined size limit is 32 KiB: large enough for thorough instructions, small enough to stay in context.
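To make the lookup rules concrete, here is one way to model them in Python. It is an illustrative sketch and not Codex's actual implementation. It assumes applicable files are combined from least to most specific, as the combined size limit suggests, and it ignores the .codex/AGENTS.md variant at the project root. The function names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional

MAX_BYTES = 32 * 1024  # combined limit quoted in this guide
NAMES = ("AGENTS.override.md", "AGENTS.md")  # the override wins at the same level


def pick_file(directory: Path) -> Optional[Path]:
    """Return the instruction file for one directory, preferring the override."""
    for name in NAMES:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def instruction_chain(global_dir: Path, project_root: Path, cwd: Path) -> List[Path]:
    """List instruction files from least to most specific.

    Raises ValueError if cwd is not inside project_root.
    """
    levels = [global_dir, project_root]
    current = project_root
    # Walk from the project root down to the working directory.
    for part in cwd.relative_to(project_root).parts:
        current = current / part
        levels.append(current)
    found = (pick_file(d) for d in levels)
    return [f for f in found if f is not None]


def combined_instructions(files: List[Path]) -> str:
    """Join files in order, truncating once the size limit is reached."""
    chunks, used = [], 0
    for f in files:
        remaining = MAX_BYTES - used
        if remaining <= 0:
            break
        data = f.read_bytes()[:remaining]
        chunks.append(data.decode("utf-8", errors="ignore"))
        used += len(data)
    return "\n\n".join(chunks)
```

Because later entries in the chain sit closer to the working directory, a reader of the combined text sees the most specific guidance last.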
A minimal project-level AGENTS.md looks like this:
# Project: my-api
## Stack
- Node.js 22, TypeScript 5.8, Fastify 5
- PostgreSQL 16 via Drizzle ORM
- Tests: Vitest (unit), Playwright (e2e)
## Commands
- Run tests: `npm test`
- Lint: `npm run lint`
- Build: `npm run build`
## Conventions
- Use kebab-case for file names
- All database queries go through `src/db/queries/` — never inline SQL
- Never modify `src/generated/` — these files are auto-generated
## What to Avoid
- Do not run migrations automatically
- Do not touch .env files
Worth noting: as of 2026, the AGENTS.md format has been adopted as an open standard by the Agentic AI Foundation (a Linux Foundation directed fund). It is now supported across Cursor, Gemini CLI, Windsurf, GitHub Copilot, and dozens of other tools — which means the time you invest in writing your AGENTS.md pays off across the entire agent ecosystem, not just Codex.
Memory: Persistent Context Across Sessions
One of the more practically useful additions in 2026 is Codex's memory system. After you enable it, Codex remembers stable context between sessions: personal preferences, recurring workflows, tech stacks, project conventions, and known pitfalls.
This means you stop re-explaining your setup. A session where you corrected Codex on your import alias convention can inform every session afterward — without you copying that instruction into every prompt.
Memory has some design guardrails worth knowing:
- Secrets are redacted. Codex scans what it's about to remember and strips API keys, tokens, and credentials before writing to memory storage.
- Active sessions are skipped. Memory only updates when a session closes cleanly, not mid-task.
- Background updates. Memory writes happen asynchronously at the end of a thread, not immediately, to avoid blocking the agent mid-task.
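The redaction guardrail can be pictured as a filter that runs before anything is written to memory. The sketch below is a hypothetical illustration of that idea, not Codex's scanner: the patterns, the placeholder, and the `redact` name are assumptions, and a real scanner would cover far more credential formats.

```python
import re

# Illustrative patterns only (assumptions, not Codex's real rules).
TOKEN_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),        # API-key-like strings
    re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),   # GitHub-style tokens
]
# "name = value" or "name: value" pairs whose name suggests a credential.
ASSIGNMENT_PATTERN = re.compile(
    r"\b(password|secret|token|api[_-]?key)\b(\s*[=:]\s*)\S+",
    re.IGNORECASE,
)
PLACEHOLDER = "[REDACTED]"


def redact(text: str) -> str:
    """Replace credential-looking substrings before the text is stored."""
    for pattern in TOKEN_PATTERNS:
        text = pattern.sub(PLACEHOLDER, text)
    # Keep the key name and separator, hide only the value.
    return ASSIGNMENT_PATTERN.sub(
        lambda m: m.group(1) + m.group(2) + PLACEHOLDER, text
    )
```

Ordinary notes such as an import alias convention pass through unchanged, while a line like `password = hunter2` would be stored as `password = [REDACTED]`.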
Memory complements (but doesn't replace) AGENTS.md. Think of AGENTS.md as your project's onboarding doc — checked into git, shared with your team — and memory as your personal set of preferences that follow you across projects and repos.
Work Scheduling: Long-Horizon Tasks
Codex can now schedule future work for itself and wake up automatically to continue a long-term task, potentially across days or weeks. From a developer's perspective, this means you can assign an open-ended investigation or a multi-phase refactor and let Codex resume autonomously when it has more context — after a test run completes, after a build finishes, or at a scheduled time.
This is meaningfully different from a background task that runs once. Scheduling turns Codex from a tool you invoke into something closer to an asynchronous collaborator with its own work queue.
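A toy model helps show what "its own work queue" means in practice. The following sketch is a hypothetical illustration built on a priority queue ordered by wake-up time. It says nothing about how Codex schedules work internally, and the `ScheduledStep` and `WorkQueue` names are made up for this example.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass(order=True)
class ScheduledStep:
    wake_at: datetime                 # steps are ordered by wake-up time only
    note: str = field(compare=False)  # what to resume when the step wakes


class WorkQueue:
    """Holds future steps and hands back the ones whose time has come."""

    def __init__(self) -> None:
        self._heap: List[ScheduledStep] = []

    def schedule(self, wake_at: datetime, note: str) -> None:
        heapq.heappush(self._heap, ScheduledStep(wake_at, note))

    def due(self, now: datetime) -> List[str]:
        """Pop and return every step scheduled at or before `now`."""
        ready = []
        while self._heap and self._heap[0].wake_at <= now:
            ready.append(heapq.heappop(self._heap).note)
        return ready
```

A one-off background task corresponds to a single entry. The scheduling described above corresponds to an agent that keeps pushing new entries for itself as each phase finishes.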
Scheduling is paired with Codex's proactive assistance feature, where Codex uses memory and project context to suggest how to start your work day — essentially proposing next steps based on where the last session ended.
Computer Use on macOS
Computer use is currently macOS-only and is not available in the EU, UK, or Switzerland at launch. With it enabled, Codex can operate any app on your Mac by seeing your screen, moving a cursor, clicking, and typing — while you continue working in other windows.
Practical uses:
- Running a visual regression test in a browser and filing a GitHub issue with a screenshot
- Updating a Figma component to match a design change, then committing the code counterpart
- Running a build in Xcode and capturing the error log to attach to a fix PR
Multiple Codex agents can run computer use sessions in parallel, isolated from each other and from your active work. This is still an early capability — the feature works best on well-defined visual tasks with clear success criteria.
GitHub Integration and Codex Cloud
For teams, the Codex Cloud + GitHub integration is where the most significant workflow shift happens. The patterns:
@codex in PR comments. Tag @codex in a pull request comment to start a cloud task with the PR as context. Ask it to review, fix a specific issue, or propose an alternative implementation. It posts back to the PR thread.
Automatic code review. You can configure Codex to review every incoming PR against your repository's guidelines, posting a standard GitHub code review focused on serious issues rather than style nits.
GitHub Action. The openai/codex-action GitHub Action lets you trigger Codex tasks inside your CI/CD pipeline — useful for automated fix attempts on failing tests or for generating changelog entries from commit diffs.
# .github/workflows/codex-fix.yml
name: Codex Auto-Fix
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  codex-review:
    runs-on: ubuntu-latest
    steps:
      - uses: openai/codex-action@v1
        with:
          task: "Review this PR for breaking API changes"
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
For background cloud tasks, Codex clones your repo into a sandboxed environment, runs your test suite, iterates on failures, and opens a pull request. You provide the task description; Codex handles the entire implementation cycle.
Pricing in 2026
Codex is included in ChatGPT's paid tiers. OpenAI moved to token-based pricing in April 2026 (previously per-message), which benefits heavy users with concise prompts.
| Plan | Monthly Cost | Codex Access | Notes |
|---|---|---|---|
| ChatGPT Plus | $20/mo | Standard | Baseline usage |
| ChatGPT Pro (mid) | $100/mo | 2x–10x Plus | Intro offer through May 2026 |
| ChatGPT Pro | $200/mo | 20x Plus | Highest limits, recommended for daily Codex use |
| Business / Enterprise | Custom | Custom | SSO, compliance, admin controls |
Pricing details can change — check openai.com/codex and the help center rate card for current numbers.
Common Mistakes Developers Make with Codex
Skipping AGENTS.md. Without project context, Codex guesses at your conventions. The first few sessions often produce correct code that violates your team's patterns. A 20-line AGENTS.md fixes this.
Treating it like a fast autocomplete. Codex's value is in delegating complete tasks — "fix all the TypeScript errors in this module", "add pagination to this API endpoint". Prompting it to write single functions ignores the agent's actual strengths.
Not running tests first. Before assigning Codex a bug fix, make sure your test suite runs cleanly. Codex iterates on test failures, so if tests were already failing before you started, it may spend its effort getting the suite to pass rather than fixing the bug you assigned.
Using GPT-5.5 for everything. For short, well-defined tasks — format this file, rename this variable — gpt-5.4-mini is faster and cheaper. Save gpt-5.5 for complex multi-file tasks, architectural changes, or anything where reasoning quality matters.
Ignoring memory. After a few sessions, review what Codex has remembered. It may have stored something imprecise. You can correct memories directly in the Codex settings, and those corrections propagate to all future sessions.
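The "run tests first" advice above is easy to automate as a pre-flight check. This is a minimal sketch, assuming your suite is started by a single command such as `npm test` and signals failure through a non-zero exit code. The `suite_is_green` helper is hypothetical and not part of Codex.

```python
import subprocess
from typing import Sequence


def suite_is_green(command: Sequence[str], cwd: str = ".", timeout: int = 600) -> bool:
    """Run the test command and report whether it exited with status 0."""
    result = subprocess.run(
        list(command),
        cwd=cwd,
        capture_output=True,  # keep test output out of the terminal
        text=True,
        timeout=timeout,      # raises subprocess.TimeoutExpired if the suite hangs
    )
    return result.returncode == 0


# Example usage (assumes an npm project in the current directory):
#   if not suite_is_green(["npm", "test"]):
#       raise SystemExit("Fix or quarantine failing tests before delegating.")
```

Running a check like this before handing over a bug fix gives you a clean baseline, so any failure Codex iterates on is one its own change introduced.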
FAQ
Q: Is Codex the same as GitHub Copilot?
No. GitHub Copilot is an IDE-embedded autocomplete and inline chat tool. Codex is an autonomous agent that takes a task, executes a full implementation cycle (reading files, writing code, running tests), and surfaces results. They have overlapping use cases in IDE editing but are architecturally different.
Q: Can I use Codex without a ChatGPT subscription?
Codex Cloud requires a ChatGPT paid plan. The CLI can be pointed at the OpenAI API directly using an API key, which is billed per token at standard API rates. Enterprise teams often prefer API key access for compliance and cost control reasons.
Q: Does computer use work on Linux?
No. Computer use is currently macOS-only and is not available in the EU, UK, or Switzerland at launch. Linux support and an international rollout have been announced as planned but are not yet available as of May 2026.
Q: How is AGENTS.md different from a system prompt?
A system prompt is per-session and you manage it yourself. AGENTS.md is a project-level file that Codex loads automatically, that you share via version control, and that composes across global and project scopes. Multiple AGENTS.md files at different directory levels are merged rather than one replacing the others.
Q: Can I run Codex on my own infrastructure?
The CLI supports any provider that implements the Chat Completions or Responses API, so you can point it at Azure OpenAI or a self-hosted endpoint. The Codex Cloud product (background tasks, PR integration) is OpenAI-hosted only.
Key Takeaways
GPT-5.5 Codex in 2026 is less about code generation and more about autonomous task execution — the shift is from "write this function" to "fix this bug", "implement this feature", and "review this PR". The agent handles the implementation loop; your job is to define the task and review the result.
For individual developers, the fastest path to value is: install the CLI, write a 20-line AGENTS.md, and assign one complete task rather than asking for snippets. For teams, the Codex Cloud + GitHub integration moves code review and bug fixes into the PR workflow without adding another tool to your stack.
The AGENTS.md format becoming an open standard is worth noting: configurations you write today will work across Codex, Cursor, Gemini CLI, and the broader agent toolchain. That portability makes the investment more durable than betting on any single tool.
OpenAI Codex 2026 is a capable autonomous coding agent with a well-designed CLI and strong GitHub integration. GPT-5.5's agentic performance is a genuine improvement over earlier models, and the AGENTS.md configuration system makes project setup practical rather than aspirational. Computer use and scheduling are early-stage features worth watching but not yet production-ready for most teams.