Cursor 3: The Agents Window, Fleet Management, and the IDE’s Last Stand

📋 Disclosure: NivaaLabs publishes independent AI tool reviews based on research and analysis. Some links on this site may be affiliate links — if you click and purchase, we may earn a small commission at no extra cost to you. This never influences our editorial recommendations. Read our full disclosure →


🗞️ Coverage: Cursor 3.0 (April 2) → 3.1 (April 15) → 3.2 (April 24, 2026). This review covers the full Cursor 3 release cycle including the latest /multitask update, based on official changelogs, Futurum Research analysis, InfoQ, and independent developer testing.

🎯 Quick Verdict

Cursor 3 is the most architecturally significant update to any coding tool in 2026. The Agents Window isn’t an IDE upgrade — it’s a completely new surface, built from scratch, for an entirely different way of working. Agent usage inside Cursor grew 15x in one year, and by early 2026 agent users outnumbered Tab autocomplete users for the first time. The latest 3.2 release pushes further with /multitask and cross-repo fleet management. At $20/month for Pro, it remains the most capable developer-facing agent orchestration platform available — but the cost and complexity questions are real, and Claude Code’s terminal-native simplicity is a genuine alternative.

Launch Dates: 3.0 Apr 2 · 3.1 Apr 15 · 3.2 Apr 24, 2026
Headline Feature: Agents Window — parallel agent fleets across local, cloud, SSH, and worktrees
Price: Hobby Free · Pro $20/mo · Business $40/user/mo · Enterprise custom
Company Valuation: ~$50B (reported, April 2026) · $2B ARR · Turned down OpenAI acquisition offer

On February 26, 2026, Cursor CEO Michael Truell published a blog post called “The Third Era of AI Software Development.” The argument was simple and sweeping. Era one: tab autocomplete — developers type, models predict. Era two: synchronous agents — developers describe a task, a single agent executes while they watch. Era three (arriving now): autonomous agent fleets — developers define the problem, spin up multiple agents, and review artifacts when the work is done.

The data Cursor shared to support the claim is striking. Agent usage in Cursor grew 15x in a single year. In March 2025, there were roughly 2.5 Tab users for every agent user. By early 2026, that ratio had inverted: 2 agent users for every Tab user. Truell’s conclusion: “The Tab era lasted nearly two years. The synchronous agent era may not last one.”

Cursor 3, launched April 2, 2026, is the product built around that conclusion. The Agents Window is not an update to the existing IDE. It is a completely new surface — the first interface Anysphere has built from scratch rather than forking from VS Code — designed for a developer whose primary job is no longer writing code but managing the agents that write it. Cursor 3.1 followed on April 15 with interactive canvases and security review agents. Cursor 3.2 shipped on April 24 with /multitask and cross-repo fleet management. Each release has been a layer added to the same platform strategy, not a feature drip.

At the same time, Claude Opus 4.7 cleared 70% of tasks on CursorBench, OpenAI Codex launched parallel background computer use, and the coding tool market entered its most competitive period in history. This review covers everything that shipped, how it actually works, and who should use it.

📈 Cursor’s Agent Usage Inversion — Tab vs Agent Users (2025–2026)

The Third Era Thesis — And the Platform Built Around It

To understand Cursor 3, you need to read the March–April 2026 release sequence as a single document rather than a series of product updates. Five releases in 28 days, each one a layer in a platform stack:

March 5 — Automations: Event-driven agents. Define a trigger (Slack message, GitHub event, Linear ticket, PagerDuty alert, cron schedule) and a set of instructions. When the trigger fires, Cursor spins up a cloud sandbox, the agent executes, and it keeps memory of past runs to improve over time. This is the trigger layer of the stack.

March 11 — Marketplace: 30+ plugins from Atlassian, Datadog, GitLab, Glean, Hugging Face, monday.com, PlanetScale, and more. By April 2026, the Cursor Marketplace had grown substantially. This is the tools layer.

March 17 — Composer 2: Anysphere’s own in-house coding model, built specifically for agentic tasks. Beats Claude Opus 4.6 on Terminal-Bench 2.0 (61.7 vs. 58.0). Optimized for multi-step planning, tool use, and long-horizon execution. This is the model layer.

March 24 — Self-Hosted Agents: Cloud agent infrastructure — agents that keep running after your laptop shuts down, across environments, with persistent context. This is the runtime layer.

April 2 — Cursor 3 (Agents Window): The interface that sits on top of all four previous layers, built from scratch rather than extended from VS Code. This is the management layer — the cockpit from which developers orchestrate everything built in March.

Read that way, Cursor 3 wasn’t a surprise. It was the completion of a deliberate architecture: triggers → tools → model → runtime → interface. Five interlocking layers that together constitute a platform, not a feature set.
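Read as a pipeline, the trigger layer is the easiest to picture in code. The sketch below is a hypothetical model, not Cursor’s API: an Automation binds a trigger string to instructions and keeps a memory of past runs, and a dispatcher fans matching events out to simulated agent runs. Class names and event strings are invented for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the trigger layer: an automation binds an event
# source (Slack, GitHub, cron, ...) to agent instructions, and a dispatcher
# fans matching events out to agent runs. Not Cursor's real API.
@dataclass
class Automation:
    trigger: str                                 # e.g. "github.issue.opened"
    instructions: str                            # what the agent should do
    memory: list = field(default_factory=list)   # past runs, kept to improve over time

class Dispatcher:
    def __init__(self):
        self.automations = []

    def register(self, automation: Automation) -> None:
        self.automations.append(automation)

    def fire(self, event: str, payload: dict) -> list:
        """Spin up one (simulated) sandboxed agent run per matching automation."""
        results = []
        for auto in self.automations:
            if auto.trigger == event:
                run = f"agent ran '{auto.instructions}' for {payload['id']}"
                auto.memory.append(run)          # persistent memory across runs
                results.append(run)
        return results

dispatcher = Dispatcher()
dispatcher.register(Automation("github.issue.opened", "triage and label the issue"))
runs = dispatcher.fire("github.issue.opened", {"id": "issue-42"})
```

The point of the shape: the trigger is data, not code, which is what lets the tools, model, and runtime layers slot in underneath without changing the automation definition.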

💡 The Architectural Bet: Cursor 3’s Agents Window is the first interface Anysphere built from scratch — not forked from VS Code. That’s not a cosmetic choice. As one analysis put it: “VS Code was designed for humans editing files. Agent fleets need a different primitive: multi-workspace coordination, cloud/local handoff, artifact-based review.” The decision to start from zero signals that Anysphere believes VS Code’s fundamental model is insufficient for the era it’s building for.

The Agents Window — What It Actually Is

Access it with Cmd+Shift+P → Agents Window. You can run it alongside the IDE, or use it as the primary interface — the IDE is accessible anytime but no longer the default starting point.

The Agents Window is inherently multi-workspace. Unlike the IDE, which operates on a single repo at a time, the Agents Window lets you manage agents working across multiple repositories simultaneously from one view. You can have one agent working on your frontend repo, another on your backend, and a third on shared libraries — all visible in a single panel, without switching contexts or retargeting between repos.

Agents in the window can run in four environments: locally (same machine, same filesystem), in worktrees (isolated git branches that don’t touch your working directory), in the cloud (Cursor’s cloud sandbox infrastructure, persists after laptop close), and over remote SSH (connecting to and running agents on remote development environments). The seamless handoff between these environments is a genuine technical achievement — you can start an agent locally, hand it off to a cloud sandbox when you close your laptop, and pick it up the next morning without losing context.

Agent Tabs let you view multiple agent chats simultaneously, side-by-side or in a grid. Instead of flipping between conversations one at a time, you see them all. The review experience for parallel work — comparing outputs, picking the best approach — becomes significantly more tractable when you don’t have to context-switch between full-screen conversations.

Cursor 3’s premise: your job is no longer to write code. It’s to manage the agents that write it — and the Agents Window is the cockpit for that job. Source: Pexels

Fleet Management: Parallel, Background, Cross-Repo

The fleet management capability is what separates Cursor 3 from every prior version and from most of its competitors. Here’s what running a fleet actually looks like in practice.

Worktree-Based Parallel Execution

The /worktree command creates an isolated git worktree — a separate working copy of your repository on a new branch — where an agent can make changes without touching your current work. You can run as many worktrees as you want simultaneously. Each agent operates in isolation: its changes don’t conflict with yours or with each other until you choose to merge. When an agent finishes its task in a worktree, you move that branch to your local foreground with a single click, test it, and merge if it looks good.

The practical workflow: you’re working on a feature in your main branch. You spin up three agents — one fixing a bug report from Linear, one updating documentation, one addressing security review feedback — each in its own worktree. All three run in the background while you continue your own work. When any of them finishes, you get a notification, click to bring that branch forward, review the diff, and merge or reject. This is materially different from the sequential agent workflow that defined Era Two — and it’s the core capability that the developer productivity data points to as the most significant workflow change of 2026.
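The workflow above can be modeled in miniature. In the sketch below, temp directories stand in for git worktrees and a thread pool stands in for background agents; the task names are invented, and real isolation would come from `git worktree add` on separate branches. It is a toy under those assumptions, not how Cursor implements the feature.

```python
import tempfile
import pathlib
from concurrent.futures import ThreadPoolExecutor

# Toy model of worktree-based parallel execution: each "agent" gets an
# isolated working copy (a temp dir standing in for a git worktree) and
# runs in the background; branches are reviewed and merged independently.
def agent_task(name: str, change: str) -> dict:
    worktree = pathlib.Path(tempfile.mkdtemp(prefix=f"wt-{name}-"))
    (worktree / "patch.txt").write_text(change)   # isolated change, nothing shared
    return {"agent": name, "branch": f"agent/{name}", "worktree": str(worktree)}

tasks = {
    "bugfix":   "fix null deref in payments",
    "docs":     "update README for v3 API",
    "security": "address review feedback on auth",
}

# All three agents run concurrently while "you" keep working on main.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda kv: agent_task(*kv), tasks.items()))

# Review loop: each finished branch is brought forward on its own.
for r in results:
    print(f"{r['branch']} ready for review in {r['worktree']}")
```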

Best-of-N Model Comparison

Here’s a feature that sounds gimmicky until you use it: run the same prompt across multiple models in parallel using separate worktrees, then compare the results side by side. Ask Claude, GPT-5.4, and Composer 2 all to implement the same feature — each in its own branch, simultaneously, with the costs visible — and pick the implementation you want. This turns model selection from a configuration decision (set once, apply everywhere) into a per-task empirical judgment. The model that wins for authentication flows might not be the model that wins for data transformation tasks. Best-of-N makes that optimization practical rather than theoretical.
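A best-of-N harness reduces to fan-out, score, select. The sketch below stubs both the model calls and the scoring function to show the shape of the comparison; the model names, candidate outputs, costs, and test counts are all made up for illustration, not benchmark data.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical best-of-N harness: one prompt, several (stubbed) models in
# parallel, each branch carrying its visible cost; a scorer picks a winner.
def run_model(model: str, prompt: str) -> dict:
    stub_outputs = {
        "claude":     ("impl A", 0.42),   # (candidate diff, illustrative $ cost)
        "gpt":        ("impl B", 0.38),
        "composer-2": ("impl C", 0.09),
    }
    output, cost = stub_outputs[model]
    return {"model": model, "output": output, "cost": cost}

def score(candidate: dict) -> float:
    # Stand-in for human review or automated checks (tests, lint, latency).
    passed_tests = {"impl A": 9, "impl B": 10, "impl C": 8}[candidate["output"]]
    return passed_tests - candidate["cost"]   # prefer quality, penalize cost

prompt = "implement the same feature"
with ThreadPoolExecutor() as pool:
    candidates = list(pool.map(lambda m: run_model(m, prompt),
                               ["claude", "gpt", "composer-2"]))

best = max(candidates, key=score)   # per-task empirical judgment, not a global default
```

The interesting design consequence is in the last line: model choice becomes a runtime decision made per task, which is exactly the shift the article describes.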

Cloud-Local Agent Handoff

The handoff between local and cloud execution solves one of the most annoying problems in agentic development: the context that dies when your laptop closes. Cursor’s cloud sandbox infrastructure keeps agent sessions running persistently — with context, tool state, and progress intact — regardless of whether your local machine is on. Start a long-running refactor at 5pm, close your laptop, come back the next morning, and the agent has been working through the night. The Await tool added in Cursor 3.1 lets agents wait for background shell commands and subagents to complete before proceeding — solving the coordination problem that causes multi-step agents to fail when steps have variable completion times.
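The coordination problem the Await tool addresses maps cleanly onto ordinary async primitives. The sketch below is a generic asyncio illustration, not Cursor internals: two background steps with different completion times must both finish before a dependent step is allowed to run.

```python
import asyncio

# Generic illustration of await-before-proceed coordination: multi-step
# agents fail when a dependent step runs before background work finishes.
async def background_step(name: str, delay: float) -> str:
    await asyncio.sleep(delay)            # variable completion time
    return f"{name} done"

async def agent():
    # Kick off background work without blocking...
    build = asyncio.create_task(background_step("build", 0.02))
    subagent = asyncio.create_task(background_step("subagent", 0.01))
    # ...then explicitly await both before the dependent step proceeds.
    finished = await asyncio.gather(build, subagent)
    return finished + ["deploy done"]     # deploy only runs after both complete

steps = asyncio.run(agent())
```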

Design Mode — Pointing Instead of Describing

Design Mode is available inside the Agents Window’s integrated browser. Instead of describing a UI problem in text — “the button in the top right corner of the modal, the one that says Submit, its padding looks off on mobile” — you click directly on the element in the rendered page and leave a comment or annotation as the agent’s instruction.

This solves a communication problem that anyone who has tried to describe visual bugs to an AI has encountered. Text descriptions of visual interfaces are lossy. “The third item in the dropdown” is ambiguous when the dropdown has been updated since you wrote the description. Clicking the element directly — while the agent can see exactly what you see — produces a precise, unambiguous instruction. The result is a point-and-fix loop for frontend work that OpenAI Codex’s in-app browser also pursues, though with different interaction models. Both are racing toward the same insight: the bottleneck in frontend iteration isn’t model capability — it’s the interface between what you see and what you can describe.

Composer 2 — The Model Cursor Built Itself

Cursor’s most strategically significant move in Q1 2026 wasn’t the Agents Window. It was building its own model.

Composer 2 is Anysphere’s in-house coding model, purpose-built for agentic tasks rather than adapted from a general-purpose model. The benchmarks: 61.7% on Terminal-Bench 2.0, beating Claude Opus 4.6 (58.0) on this specific benchmark, though trailing GPT-5.4 (75.1). On CursorBench — Cursor’s own internal benchmark, which better reflects the real-world multi-step tasks agents handle in production — Composer 2 performs competitively against frontier models at a significantly lower inference cost.

The technical innovation most worth understanding is self-summarization. Agentic coding sessions produce long action histories that blow past context windows. Most agents handle this with brute-force compaction — summarize the history with a separate model call, continue. Cursor’s approach is more elegant: Composer 2 is trained with RL rewards that cover the full chain including compression steps. The model learns, during training, what to keep and what to drop when summarizing its own context. The result: compaction errors reduced by 50% compared to external summarization. For long-horizon tasks — the ones that matter most for fleet management — this compounds significantly over hundreds of steps.
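The difference between brute-force compaction and a learned keep/drop policy is easiest to see with a toy compactor. In the sketch below, a hand-written salience flag stands in for what Composer 2 reportedly learns via RL; the token accounting and the keep/drop rule are illustrative only.

```python
# Toy compaction loop for a long agent history: when the action log exceeds
# a token budget, routine entries are folded into a summary while key
# decisions survive verbatim. The "key" flag is a stand-in for a learned
# salience policy; Composer 2 reportedly learns keep/drop during RL training.
def compact(history: list, budget: int) -> list:
    def cost(entry: dict) -> int:
        return len(entry["text"].split())          # crude token count
    if sum(map(cost, history)) <= budget:
        return history                             # under budget, keep everything
    kept = [e for e in history if e.get("key")]    # decisions stay verbatim
    dropped = [e for e in history if not e.get("key")]
    summary = {"text": f"summary of {len(dropped)} routine steps", "key": False}
    return [summary] + kept

history = [
    {"text": "ran tests all green", "key": False},
    {"text": "chose strategy B for the migration", "key": True},
    {"text": "listed files in src", "key": False},
]
compacted = compact(history, budget=5)
```

In the real system the model, not a flag, decides what survives, which is why training the compression step inside the reward loop matters: a bad heuristic here compounds over hundreds of steps.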

Composer 2 doesn’t replace Claude or GPT in Cursor — users can run any model they prefer, and Cursor’s model-agnostic architecture means Claude Opus 4.7 is available for tasks where reasoning depth matters more than cost efficiency. But Composer 2 gives Anysphere something no VS Code extension can offer: a model that’s been optimized specifically for the way Cursor’s agents work.

Cursor Security Review — The Feature Nobody Saw Coming

Launched in Cursor 3.1 (April 15), Security Review is the most surprising feature in the entire Cursor 3 cycle. Two purpose-built security agents embedded directly into the development workflow:

Security Reviewer checks every pull request for security vulnerabilities, authentication regressions, privacy and data-handling risks, agent tool auto-approvals, and prompt injection attacks. It leaves inline comments at the exact diff location with a severity rating and a remediation suggestion. Not a generic linter warning. Not a standalone SAST tool you have to configure and run separately. An agent that reads your PR the way a security-focused human reviewer would, and leaves comments in the same place.

Vulnerability Scanner runs scheduled scans of your codebase to check for known vulnerabilities, outdated dependencies, and configuration issues. Results push to Slack. You can plug in MCP servers for existing SAST, SCA, and secrets scanners — making the Cursor security agents a coordination layer over your existing tooling rather than a replacement for it.

The timing matters. Security Review arrived the same week that Project Glasswing demonstrated that AI can find and exploit production vulnerabilities autonomously, and the same week that Google researchers warned about indirect prompt injection attacks against enterprise AI agents browsing the web. Cursor’s decision to build security review into the agent workflow — rather than treating it as a separate phase — reflects a genuine understanding that agents writing code at scale creates a security surface that traditional code review doesn’t cover.

Cursor 3.2 — /multitask and the Agent Execution Runtime

Released April 24, Cursor 3.2 is the update that Futurum Research called a “repositioning of Cursor as an agent execution runtime competitive with CI/CD and cloud dev vendors.” That’s a strong claim. The feature that justifies it: /multitask.

Type /multitask in any conversation and Cursor will automatically run async subagents to parallelize your request instead of adding it to the sequential queue. For larger tasks, it breaks them down into smaller chunks and assigns each chunk to a separate subagent running simultaneously. If you already have messages queued, you can ask Cursor to multitask on them rather than waiting for the current run to finish.

Combined with the multi-root workspace feature — where a single agent session targets a reusable workspace spanning multiple repositories — this means a single developer can now delegate a cross-repo refactor, have it broken into component tasks automatically, and have those tasks executed in parallel across frontend, backend, and shared libraries simultaneously. Without retargeting between repos. Without managing the task decomposition manually. The developer describes the end state; the fleet figures out the work.
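In the simplest possible terms, /multitask-style fan-out is decompose-then-parallelize: split a request into per-repo chunks, then run one subagent per chunk concurrently. Everything in the sketch below (the repo names, the decomposition rule, the subagent stub) is an assumption for illustration, not Cursor’s implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative fan-out: a cross-repo request becomes per-repo chunks, each
# handled by a parallel subagent. Repo names and the decomposition rule are
# invented; a real system would decompose semantically, not per-repo.
def decompose(request: str, repos: list) -> list:
    return [{"repo": r, "task": f"{request} in {r}"} for r in repos]

def subagent(chunk: dict) -> str:
    return f"done: {chunk['task']}"      # stand-in for a real agent run

repos = ["frontend", "backend", "shared-libs"]
chunks = decompose("rename User.id to User.uuid", repos)

with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
    reports = list(pool.map(subagent, chunks))
```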

Futurum’s analysis identifies the competitive implication clearly: “If Cursor’s approach bears fruit, Harness, GitLab, CircleCI, and GitHub Actions need a credible answer for what their offering contributes when a fleet of agents pre-resolves much of the work the pipeline used to mediate.” Cursor 3.2 is not competing with code editors anymore. It is competing with CI/CD infrastructure.

Pricing

| Plan | Price | Key Limits | Best For |
| --- | --- | --- | --- |
| Hobby | Free | 2,000 completions/month · 50 slow premium requests | Evaluating Cursor, casual use |
| Pro | $20/month | Unlimited completions · 500 fast premium requests/month · Cloud agents included | Individual developers — the primary target for this review |
| Business | $40/user/month | Everything in Pro · Team admin, SSO, centralized billing, privacy mode on by default | Dev teams, SMBs |
| Enterprise | Custom | Custom usage limits · Advanced security · Dedicated support · SAML SSO | Enterprise, regulated industries |

The cost concern Pro developers raise most often: running multiple parallel agents on frontier models (Claude Opus 4.7, GPT-5.4) consumes fast premium requests significantly faster than single-agent workflows. The 500 fast premium requests per month included in the $20 Pro plan can be exhausted in a few days of active fleet use. Cursor counts each agent’s interactions separately — five agents running in parallel for one hour generate roughly five hours of API consumption. Teams that want to run serious fleet workloads should budget for either the Business plan or model usage costs on top of the Pro subscription.
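The arithmetic is worth making explicit. The sketch below models request burn-down under an assumed 25 fast requests per agent-hour; only the 500-request Pro allowance comes from the pricing table above, and the rest is illustrative.

```python
# Back-of-envelope fleet economics for the Pro plan. The requests-per-
# agent-hour figure is an assumption for illustration; real consumption
# depends on task, model, and how chatty each agent is.
MONTHLY_FAST_REQUESTS = 500          # Pro plan allowance (from pricing table)
REQUESTS_PER_AGENT_HOUR = 25         # assumed average per active agent

def days_until_exhausted(agents: int, hours_per_day: float) -> float:
    daily = agents * hours_per_day * REQUESTS_PER_AGENT_HOUR
    return MONTHLY_FAST_REQUESTS / daily

solo = days_until_exhausted(agents=1, hours_per_day=2)   # single-agent workflow
fleet = days_until_exhausted(agents=5, hours_per_day=2)  # five parallel agents
```

Under these assumptions the single-agent workflow lasts five times longer than the five-agent fleet on the same allowance, which is the "each agent counts separately" point in numbers.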

The cost-efficiency story improves significantly when Composer 2 is used as the model — Cursor’s own model is optimized for lower inference costs specifically because it was trained for multi-step agentic tasks rather than general capability breadth. For routine agent tasks (dependency updates, documentation, boilerplate code), Composer 2 at lower cost makes the fleet economics much more favorable than frontier model costs might suggest. Use the AI Pricing Calculator to model what parallel agent usage actually costs across different model choices.

Cursor 3 vs Claude Code vs OpenAI Codex — 2026 Shootout

| Dimension | Cursor 3 | Claude Code | OpenAI Codex |
| --- | --- | --- | --- |
| Interface | Purpose-built Agents Window + IDE fallback | Terminal CLI — no GUI | Desktop app with in-app browser + computer use |
| Parallel Agents | ✅ Unlimited, worktree-isolated, /multitask | ✅ Multi-agent coordination (API-level) | ✅ Background computer use, parallel agents |
| Models Available | Claude, GPT-5.4, Gemini, Composer 2, Qwen | Claude Opus 4.7 only | GPT-5.4 primarily |
| Own Model | ✅ Composer 2 (61.7% Terminal-Bench 2.0) | ❌ (Anthropic models only) | ❌ (OpenAI models only) |
| Visual UI Feedback | ✅ Design Mode (browser annotation) | ❌ Terminal-only | ✅ In-app browser with page comments |
| Cross-Repo | ✅ Multi-root workspaces (3.2) | ⚠️ Manual retargeting | ⚠️ Via GitHub integration |
| Security Review | ✅ Built-in PR + scheduled scan agents | ❌ Not built-in | ⚠️ Via Cyber Verification Program |
| MCP Support | ✅ Full MCP + plugin marketplace | ✅ Deepest MCP integration | ✅ 111 plugins |
| Price | $20/mo Pro | Included with Claude Max ($100/mo) | ChatGPT Plus/Pro ($20–$200/mo) |
| Best For | GUI-native developers, fleet orchestration, multi-repo | Terminal-native, deepest Anthropic integration, MCP power users | Teams already in OpenAI ecosystem, plugin breadth |

The honest summary: these three tools have different philosophies, not just different features. Codex is betting on the broadest surface area — code, browse, generate images, automate, remember, plugin ecosystem. Claude Code is betting on the deepest model quality and tightest Anthropic ecosystem integration — no GUI, just the best possible model doing the best possible work from a terminal. Cursor 3 is betting on the management layer — the developer who wants to orchestrate agent fleets across environments with visual tooling and model flexibility, and who values the ability to choose which model handles which task.

The Criticisms Worth Taking Seriously

Cursor 3 divided the developer community more than any prior Cursor release. The criticisms appearing most frequently in the InfoQ coverage, Reddit threads, and developer discussions are substantive enough to address directly.

“More agents doesn’t mean better software.” The most common criticism: parallelizing broken or mediocre agents produces more broken mediocre code, faster. A fleet of agents that each struggle with your codebase’s patterns produces a mess of conflicting approaches in parallel branches, all of which need human review. The response: this is true for Era Two agents on hard problems — and it’s why Composer 2’s self-summarization and long-horizon capabilities matter. The fleet model rewards agent quality exponentially. If your single agent produces acceptable output 60% of the time, your fleet produces consistent chaos. If your single agent produces solid output 85% of the time, your fleet is a genuine multiplier. The question is whether the models have crossed the threshold where fleet management is net positive for your specific codebase.

Vendor lock-in through the ecosystem. One developer on the forum put it directly: “The proper agent command center I would want is the one that manages all AI agents I have, not one that locks me into one vendor.” Cursor’s response (from Lee Robinson, a moderator) is that Cursor supports all model vendors. But the Automations, Marketplace integrations, self-hosted cloud agents, and Security Review agents are all Cursor-native infrastructure. The deeper you go into the platform, the more your workflows are specific to Cursor’s architecture. This is a real consideration for teams evaluating long-term commitment.

Cost overhead at scale. Running frontier model agents in parallel, each consuming context independently, can be expensive fast. The 500 fast premium requests per month on the $20 Pro plan is not generous for serious fleet usage. Teams using Composer 2 for routine tasks and reserving frontier models for complex reasoning can manage the economics — but the pricing requires active management, not passive use.

Governance gaps as agents fan out. Futurum Research’s analysis flags a real issue: “Governance and observability gaps widen as parallel subagents fan out across branches and repositories with a limited enterprise control surface.” Security Reviewer catches vulnerabilities in output — but who reviews what the fleet is accessing, which credentials it’s using, and what it’s writing to disk while running? For enterprise deployments, the Agent 365-style governance layer that Microsoft built for its agent ecosystem doesn’t yet have a direct Cursor equivalent at the team level.

Final Verdict

Cursor 3 is the most architecturally honest product in the coding tool market. While every other tool is debating whether to add an “agent mode” to their existing product, Anysphere built a new surface from scratch and said: this is the interface for the era we’re in, not the era we’re leaving. The Agents Window, fleet management, best-of-N comparison, Design Mode, and Security Review together constitute a coherent vision for what software development looks like when agents write most of the code.

The reported ~$50 billion fundraising valuation reflects a market that agrees with Anysphere’s analysis of the trajectory. $2 billion in ARR — for a product that launched less than three years ago — validates that developers are paying for it. Turning down an acquisition offer from OpenAI signals that Anysphere believes the independent platform path is worth more than the exit.

The question every developer has to answer is personal: are you a writer of code or a manager of agents? Cursor 3 is unambiguously built for the latter. If you’re not there yet — if you still spend most of your time writing code directly and use AI to accelerate that process — the IDE is still available, and Cursor 3’s tab completion and single-agent experience remains excellent. But the architectural direction is clear. Anysphere is building for the developer who manages fleets. Everything else is a compatibility mode.

🏗️ Individual Developers on Pro ($20/month)

Upgrade immediately. The Agents Window, worktrees, and best-of-N comparison alone justify the Pro price for any developer running more than one agentic task per day. Start with /worktree on one task while you work on another — the productivity gain from that single workflow change is immediately measurable. Add /multitask when you’re comfortable with the fleet model.

👨‍💻 Terminal-Native Developers

Consider Claude Code instead. If your primary workflow is terminal-based, you rarely deal with frontend UI, and you want the deepest possible integration with the world’s best reasoning model, Claude Code with Opus 4.7 is the stronger choice. Cursor’s GUI advantage becomes a disadvantage for developers who live in the terminal.

🏢 Enterprise Teams

Evaluate carefully, especially governance. The Business and Enterprise plans provide team administration and security controls, but the governance surface for agent fleets is still maturing. If you’re deploying agents across production codebases, pair Cursor 3 with a security review process that doesn’t rely solely on Cursor’s own Security Review agents. The fleet capability is real and significant; the enterprise control layer needs more time.

🔄 Teams Comparing Cursor vs Codex

The choice comes down to ecosystem, not features. If your team is already in the OpenAI ecosystem, using Azure infrastructure, and wants the broadest plugin coverage, Codex is the natural choice. If your team values model flexibility, multi-repo fleet management, visual Design Mode, and an independent platform not tied to any single AI lab, Cursor 3 is the stronger platform.


🧮 What Does Running a Cursor Agent Fleet Actually Cost?

Use our free AI Pricing Calculator to model parallel agent costs across Claude Opus 4.7, GPT-5.4, Composer 2, and Qwen 3.6 — before committing to a fleet workflow.


❓ Frequently Asked Questions

What is the Cursor 3 Agents Window?

The Agents Window is a completely new interface built from scratch — not forked from VS Code — for running multiple AI agents in parallel across local machines, worktrees, cloud sandboxes, and remote SSH environments. Access it with Cmd+Shift+P → Agents Window. It’s the central architectural change in Cursor 3, designed for a developer whose primary job is orchestrating agents rather than writing code directly.

What is /multitask in Cursor 3.2?

A command introduced in Cursor 3.2 (April 24, 2026) that automatically runs async subagents to parallelize your request instead of adding it to a sequential queue. For larger tasks, it breaks them into smaller chunks and assigns each to a separate subagent running simultaneously. Combined with multi-root workspaces, it enables a single developer to delegate cross-repo tasks that fan out across frontend, backend, and shared libraries in parallel.

What is Composer 2?

Anysphere’s in-house coding model, purpose-built for agentic tasks. It scores 61.7% on Terminal-Bench 2.0, beating Claude Opus 4.6 (58.0) on this specific benchmark. Its key technical innovation is self-summarization: it’s trained to compress its own action history within the RL loop, reducing compaction errors by 50% compared to external summarization. Available in Cursor alongside Claude, GPT-5.4, Gemini, and open-source models.

How much does Cursor 3 cost?

Hobby (free): 2,000 completions/month, 50 slow premium requests. Pro ($20/month): unlimited completions, 500 fast premium requests, cloud agents included. Business ($40/user/month): adds team admin, SSO, privacy mode. Enterprise: custom pricing. Note that running parallel agents on frontier models (Claude Opus 4.7, GPT-5.4) consumes fast premium requests significantly faster than single-agent workflows — heavy fleet users may need the Business plan or additional model usage budget.

How does Cursor 3 compare to Claude Code?

Cursor 3 is a GUI-first platform with model flexibility, visual Design Mode, fleet management across environments, and a plugin marketplace — best for developers who want to orchestrate agent fleets visually across multiple repos. Claude Code is a terminal CLI that runs exclusively on Claude Opus 4.7 — best for terminal-native developers who want the deepest reasoning model and tightest MCP integration without any GUI overhead. Both support parallel agent workflows; the choice comes down to interface preference and model loyalty.

What is Cursor’s current valuation?

Anysphere was reportedly in discussions to raise at approximately $50 billion in March 2026 — nearly double its November 2025 valuation. The company has $2 billion in annual recurring revenue and has turned down acquisition offers, including one from OpenAI. Cursor 3 is available as a free update to all existing subscribers.
