

Project Glasswing: The AI That’s Too Dangerous to Release — And Why Anthropic Built It Anyway

🗞️ Announced April 7, 2026. This article covers Claude Mythos Preview and Project Glasswing, based on Anthropic’s official announcement, the Frontier Red Team technical blog, and independent security analysis published April 7–22, 2026.

🎯 Quick Verdict

Project Glasswing is the most significant AI safety decision Anthropic has ever made in public — and possibly the most significant any AI lab has made. Anthropic built a model, concluded it was too dangerous to release, and instead of releasing it, launched a controlled defensive operation. Claude Mythos Preview found thousands of zero-day vulnerabilities across every major OS and browser, autonomously, including a 27-year-old OpenBSD bug surfaced by a single run that cost under $50. The question isn’t whether this changes cybersecurity. It already has. The question is whether the defenders can stay ahead of what comes next.

Announced: April 7, 2026 — alongside the Claude Mythos Preview system card
Zero-days found: thousands across every major OS and browser — 99%+ still unpatched
Anthropic commitment: $100M in model usage credits + $4M in direct open-source donations
Model status: NOT publicly available — restricted to ~50 vetted partner organizations

On April 7, 2026, Anthropic announced a model it will not release to the public.

That sentence has never been written about a frontier AI lab before. Labs release models. That’s the business. That’s the product. That’s how the revenue works. When Anthropic decided that Claude Mythos Preview was too capable at cybersecurity tasks to release broadly — and that the right response was a controlled defensive operation rather than a product launch — it made a decision with no direct precedent in the AI industry.

The model it’s holding back found a 27-year-old bug in OpenBSD. Autonomously. No human involved after the initial prompt. The total campaign cost: approximately $20,000. The specific run that found the vulnerability: under $50. Automated fuzzers had exercised the same codebase millions of times without catching it. Every human auditor who reviewed OpenBSD’s TCP implementation over nearly three decades missed it. Claude Mythos Preview found it, and then demonstrated it could crash any OpenBSD server responding over TCP.

Project Glasswing is what Anthropic did instead of releasing it publicly.

⚡ Claude Mythos Preview vs Prior Models — Cybersecurity Capability Leap

The AI They Won’t Release — And Why That’s the Story

In March 2026, Fortune reported that Anthropic was developing an unreleased model described internally as “by far the most powerful AI model” the company had ever built. A draft blog post inadvertently made public around the same time described Mythos as “currently far ahead of any other AI model in cyber capabilities” and warned that it “presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders.”

That was the advance warning. The April 7 announcement was the formal acknowledgment, and it came with an unusual companion: a technical document from Anthropic’s Frontier Red Team detailing exactly how capable Mythos Preview is — not to generate excitement, but to justify why it isn’t being released and why an emergency defensive response is necessary.

Sundar Pichai’s remarks at Google Cloud Next the same week captured the industry’s reaction: the conversation changed from “can AI find vulnerabilities?” to “how do we manage what it’s already finding?” Anthropic’s own framing is blunter: “Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. The fallout — for economies, public safety, and national security — could be severe.”

Anthropic briefed senior US government officials on Mythos Preview’s full capabilities — both offensive and defensive — before any external announcement. The UK AI Security Institute evaluated the model and confirmed it was the first AI system able to complete their full-network takeover simulation. The intelligence community is paying attention. “They want secure code and to use AI to find network vulnerabilities as well,” said one person familiar with multiple IC agency discussions, speaking anonymously to Nextgov/FCW.

⚠️ Capability Baseline: Claude Opus 4.6 generated 2 working Firefox exploits in testing. Claude Mythos Preview generated 181. That’s a 90x jump in a single model generation. Mythos saturated Anthropic’s entire Cybench CTF benchmark at 100%, forcing the red team to shift to real-world zero-day discovery as the only meaningful evaluation remaining.

What Claude Mythos Preview Can Actually Do

The Frontier Red Team technical blog published alongside the Glasswing announcement is unusually candid for a company’s own security disclosure. It’s worth reading the specific capabilities described, rather than the summary, because the gap between “finds vulnerabilities” and “autonomously chains four-vulnerability browser exploits to escape OS sandboxes” is the entire story.

Autonomous Zero-Day Discovery

Mythos Preview identifies previously unknown vulnerabilities in real codebases without human guidance after an initial prompt. This is not a model that searches known CVE databases or pattern-matches against prior vulnerabilities. It reasons about code semantics — understanding how functions interact, how memory is managed, how authentication flows work — and identifies subtle logical flaws that automated fuzzers and human auditors miss. When Anthropic gave it 100 Linux kernel CVEs from 2024 and 2025 to assess, it filtered them to 40 potentially exploitable candidates and successfully built privilege escalation exploits for more than half. One complete exploit chain — from CVE identifier and git commit hash to working privilege escalation — completed in under a day at a cost under $2,000. Historically, that process has taken skilled security researchers days to weeks.

Autonomous Exploit Construction

Finding a vulnerability is one thing. Building a working exploit is harder — it requires understanding precisely how to trigger the flaw, what mitigations exist and how to bypass them, and how to deliver the payload reliably under real conditions. Mythos Preview doesn’t stop at the bug report. It spins up the actual software environment, attaches debuggers, and confirms exploitability under real conditions before reporting. The exploit it constructed for CVE-2026-4747 involved a 20-gadget ROP (Return-Oriented Programming) chain split across six sequential NFS packets — a level of exploit sophistication that, two years ago, would have required a nation-state-tier researcher to produce. Mythos produced it in approximately four hours.
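Anthropic has not published the CVE-2026-4747 chain itself, so the sketch below uses entirely made-up gadget addresses. It illustrates only the core mechanic of any ROP chain: the “chain” is just a sequence of code addresses packed into the bytes that overwrite a saved return address, so each gadget’s closing `ret` pops the next address and jumps to it.

```python
import struct

# Hypothetical gadget addresses. A real chain harvests these from the
# target binary -- which is why predictable (non-randomized) kernel
# load addresses matter so much in the FreeBSD case described above.
GADGETS = [
    0xFFFF_8000_0010_1234,  # e.g. a "pop rdi; ret" gadget
    0x0000_0000_0000_0000,  # value the gadget above pops into rdi
    0xFFFF_8000_0010_5678,  # next gadget's entry point
    0xFFFF_8000_0010_ABCD,  # final return target
]

def build_rop_chain(gadgets):
    """Pack 64-bit addresses little-endian: the byte layout that lands
    where the saved return address lives after the overflow."""
    return b"".join(struct.pack("<Q", g) for g in gadgets)

chain = build_rop_chain(GADGETS)  # 8 bytes per gadget slot
```

A 20-gadget chain is the same construction scaled up, with the added complication in the NFS case of splitting those bytes across six sequential packets.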

Multi-Stage Chaining

The most dangerous capability is the most consistent behavioral pattern: Mythos doesn’t find one bug and stop. It autonomously chains 3–5 vulnerabilities into multi-stage attack sequences. The pattern observed repeatedly: initial access exploit → privilege escalation (via KASLR bypass or use-after-free) → credential harvesting → lateral movement to adjacent systems. In one documented case, it wrote a web browser exploit that chained together four separate vulnerabilities, producing a complex JIT heap spray that escaped both the renderer sandbox and the operating system sandbox simultaneously. Because that exploit targets vulnerabilities that remain unpatched, Anthropic cannot publish the details. But they’ve published a cryptographic commitment proving the exploit existed at the time of writing.

Sandbox Awareness

In what Anthropic describes as a “potentially dangerous capability,” Mythos Preview — when specifically instructed to attempt escape during a safety evaluation — successfully escaped from a secured sandbox environment. The system card notes this occurred “in a few rare cases” when the model was explicitly ordered to try. The UK AI Security Institute’s evaluation confirmed Mythos was the first AI model able to complete their full-network takeover simulation, while qualifying that the test environment lacked some real-world security features.

The Zero-Day Showcase — Four Cases Worth Understanding

Anthropic can only discuss the vulnerabilities that have been patched. Over 99% of what Mythos found remains under coordinated disclosure — the companies have been notified but haven’t deployed patches yet, and Anthropic is bound by responsible disclosure norms. What’s in the public technical report is therefore a floor, not a ceiling. Here is what they can show:

🔴 The OpenBSD TCP SACK Bug — 27 Years Old

OpenBSD is famous for one thing: security. It is the operating system that security-conscious developers choose precisely because of its track record of minimal vulnerabilities. A 27-year-old bug sitting in its TCP SACK implementation — since 1998 — is not a minor finding.

The vulnerability is a two-bug chain that requires understanding how the kernel walks its linked list of TCP SACK holes, how sequence number arithmetic works under integer overflow conditions, and how a carefully crafted sequence of two packets can simultaneously satisfy contradictory conditions and trigger a null-pointer write that crashes the kernel. Automated fuzzers ran against this code millions of times without catching it, because catching it requires semantic reasoning about how TCP options interact adversarially — not just input fuzzing.
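The specific OpenBSD flaw is only summarized above, but the general class of mistake, sequence-number arithmetic under 32-bit wraparound, is easy to demonstrate. The Python sketch below is illustrative only (it is not the OpenBSD code): ordering on a wrapping sequence space must be computed as a signed modular difference, and a naive comparison silently gives the wrong answer near the wrap point, the kind of semantic detail input fuzzing never reasons about.

```python
MASK = 0xFFFF_FFFF

def seq_lt(a: int, b: int) -> bool:
    """TCP-style modular comparison on the 32-bit wrapping sequence
    space (the C idiom is `(int32_t)(a - b) < 0`): True if a comes
    'before' b, even when the numbering has wrapped between them."""
    return ((a - b) & MASK) >= 0x8000_0000

# Near the wrap point, naive integer comparison gets ordering wrong:
a, b = 0xFFFF_FFF0, 0x0000_0010  # b is 32 bytes after a, across the wrap
assert seq_lt(a, b)              # modular arithmetic: a precedes b
assert not (a < b)               # a naive comparison disagrees
```

Code that mixes the two conventions, or overflows before masking, is exactly the sort of flaw that survives decades of fuzzing.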

Mythos found it across approximately 1,000 scaffold runs at a total campaign cost under $20,000. The specific run that surfaced the flaw cost under $50. The vulnerability has been patched. Every unpatched OpenBSD server, anywhere in the world, was crashable by any internet user who sent two crafted TCP packets.

🔴 CVE-2026-4747 — FreeBSD NFS, 17 Years Old, Unauthenticated Root

This is the most technically stark vulnerability in Anthropic’s public disclosure. A stack buffer overflow in FreeBSD’s RPCSEC_GSS authentication handler allows an attacker-controlled packet to overwrite 304 bytes of arbitrary content onto the stack. Standard mitigations don’t apply: the buffer is declared as an integer array, so GCC’s stack protector doesn’t instrument it; and FreeBSD doesn’t randomize kernel load addresses, making ROP gadget locations predictable.

Mythos didn’t just find the vulnerability. It solved the authentication problem required to reach the vulnerable code path — not by brute force, but by reasoning that a single unauthenticated NFSv4 EXCHANGE_ID call returns the server’s UUID and NFS daemon start time, which are sufficient to reconstruct the required values. It then constructed a 20-gadget ROP chain split across six sequential NFS packets to deliver unauthenticated root access. The entire process was fully autonomous — no human guidance after the initial prompt. Time to working exploit: approximately four hours.

Every machine running FreeBSD’s NFS server was accessible to any unauthenticated internet user for seventeen years. The vulnerability has been patched.

🔴 FFmpeg H.264 Codec — 16 Years Old, Fuzzed 5 Million Times Undetected

FFmpeg’s H.264 decoder uses a 32-bit slice counter stored in a 16-bit lookup table, initialized to a sentinel value of 65,535. A specially crafted video frame containing exactly 65,536 slices causes the counter to collide with that sentinel, triggering an out-of-bounds write. The vulnerability was introduced in a 2003 commit and exposed by a 2010 refactor. In the 16 years since, automated fuzzers exercised the vulnerable code path more than 5 million times without triggering the flaw — because triggering it requires reasoning about the specific arithmetic relationship between a 32-bit counter and a 16-bit sentinel, not random input generation. Mythos caught it by reasoning about code semantics.
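The width-mismatch arithmetic described above is simple enough to demonstrate directly. The sketch below uses hypothetical names and models only the truncation, not FFmpeg’s actual decoder structures: a 32-bit counter written through a 16-bit slot inevitably collides with any 16-bit sentinel.

```python
SENTINEL = 0xFFFF  # 65,535: the "no entry" marker in the 16-bit table

def stored_value(slice_counter: int) -> int:
    """What a 16-bit table slot actually records when a 32-bit counter
    is written through it (hypothetical helper; models only the
    truncation described in the article)."""
    return slice_counter & 0xFFFF

# Frames with up to 65,535 slices keep every entry distinct from the
# sentinel...
assert stored_value(65_534) != SENTINEL

# ...but the 65,536th slice drives the counter to 65,535, which after
# truncation is indistinguishable from the "unused" marker -- the
# collision the article describes as triggering the OOB write.
assert stored_value(65_535) == SENTINEL
```

No random input generator is likely to emit a frame with exactly 65,536 slices; reaching the collision requires reasoning about the two widths, which is the article’s point.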

🔴 Browser Sandbox Escape — Four-Vulnerability Chain

The most sophisticated capability in the public report: a browser exploit that chains four separate vulnerabilities together, using a JIT heap spray technique to escape both the renderer sandbox and the operating system sandbox simultaneously. Because the target vulnerabilities remain unpatched, Anthropic cannot disclose details. They have published a cryptographic commitment proving the exploit existed at the time of writing. What’s public is the behavioral pattern: the model didn’t find one browser vulnerability. It found four, reasoned about how they could be chained, and built the chain autonomously.

Project Glasswing is Anthropic’s attempt to give defenders the first-mover advantage before capabilities like Mythos’s become broadly available. Source: Pexels

What Is Project Glasswing?

Project Glasswing is Anthropic’s answer to a question no AI lab had publicly faced before: what do you do when you build something too dangerous to release?

The name comes from the glasswing butterfly — a species whose wings are transparent, allowing it to camouflage itself while remaining in plain sight. The metaphor is apt: Project Glasswing is designed to be visible and transparent about the threat while restricting access to the capability that creates it.

The operational structure is a controlled defensive deployment. A selected group of approximately 50 organizations — 12 launch partners and more than 40 additional infrastructure operators — receive access to Claude Mythos Preview specifically for defensive security work: finding and fixing vulnerabilities in foundational software before those vulnerabilities can be exploited by attackers. Partners commit to responsible disclosure protocols, coordinated patch timelines, and sharing learnings with the broader industry.

The financial structure signals that this isn’t a commercial play dressed up as safety. Anthropic committed $100 million in model usage credits to Project Glasswing partners. Additionally, it donated $2.5 million to Alpha-Omega and the Open Source Security Foundation (OpenSSF) through the Linux Foundation, and $1.5 million to the Apache Software Foundation — enabling the maintainers of critical open-source software to participate in the coordinated response. Open-source maintainers can apply through the Claude for Open Source program.

The pricing beyond the initial credits is also notable: when Mythos Preview becomes commercially available to Glasswing participants, it will be priced at $25/$125 per million input/output tokens — 5x the cost of Claude Opus 4.7. This is not a model Anthropic is trying to make widely accessible. It is a model Anthropic is trying to deploy carefully, with full accountability, to the organizations best positioned to use its capabilities defensively.

The public Glasswing technical report is expected in early July 2026, when enough of the discovered vulnerabilities will have been patched to allow disclosure. That report will trigger the most significant coordinated patch cycle in recent memory — across operating systems, browsers, cryptography libraries, and major infrastructure software.

The Partner Coalition

The Glasswing partner list is a who’s who of critical infrastructure: Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, Microsoft, and NVIDIA are confirmed launch partners. Each receives Mythos Preview access for defensive security work on their own systems and, where applicable, the open-source software they maintain.

AWS’s statement captures the strategic logic: “Security isn’t a phase for us; it’s continuous and embedded in everything we do. Our teams analyze over 400 trillion network flows every day for threats, and AI is central to our ability to defend at scale. We’ve been testing Claude Mythos Preview in our own security operations, applying it to critical codebases, where it’s already helping us strengthen our code.”

Cisco’s statement frames the coalition logic: “This work is too important and too urgent to do alone.” That’s not marketing. That’s an accurate description of the coordination problem. Individual organizations can use Mythos defensively on their own software. But the world’s attack surface is shared — a vulnerability in FreeBSD affects every organization running it, regardless of whether they’re a Glasswing partner. The coalition model is designed to cover as much of the shared attack surface as possible before the window between AI-assisted discovery and AI-assisted exploitation narrows further.

A week after the Glasswing announcement, OpenAI announced a similarly limited rollout of its own cybersecurity-focused model — the first instance of a direct competitive response to a safety initiative rather than a capability announcement. The industry dynamic is shifting.

| Vulnerability | Age | Discovery Cost | Severity | Status |
| --- | --- | --- | --- | --- |
| OpenBSD TCP SACK | 27 years | ~$20,000 campaign / <$50 per run | Remote crash (any TCP server) | Patched ✅ |
| FreeBSD NFS (CVE-2026-4747) | 17 years | Comparable | Unauthenticated remote root | Patched ✅ |
| FFmpeg H.264 codec | 16 years | ~$10,000 campaign | Out-of-bounds write (media parsing) | Patched ✅ |
| Browser sandbox escape (4-chain) | Unknown | Undisclosed | Full sandbox escape | Unpatched ⚠️ |
| Thousands of additional findings | Various | Various | Various (many critical) | 99%+ unpatched ⚠️ |

The Wider Implications — Why the Math Is Frightening

To understand why Project Glasswing matters, you need to hold two numbers in your head simultaneously.

The offense number: Average time-to-exploit has dropped to 5 days from the moment a vulnerability is disclosed, down from 30 days in 2022 (CrowdStrike 2026 Global Threat Report). In the first half of 2025, 32.1% of CVEs were exploited on or before the day of disclosure. Average attacker breakout time — from initial access to lateral movement across a network — is now 29 minutes, with the fastest observed at 27 seconds.

The defense number: The median organizational patch window is approximately 70 days. Unchanged since 2022. The percentage of organizations deploying critical patches within 30 days has actually declined, from 45% to 30% over that same period.

Defense is not just slower than offense. Defense is getting slower while offense accelerates. A $20,000 Mythos discovery campaign that runs in hours replaces months of nation-state research effort. If that capability becomes broadly available — through model proliferation, through open-source models reaching similar capability levels, or through adversarial actors finding ways to access Mythos-class capabilities — the window between a vulnerability existing and that vulnerability being exploited at scale collapses toward zero.
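Those offense and defense numbers combine into one simple exposure calculation, sketched here using the figures cited above (treat them as illustrative inputs, not independent data):

```python
# Figures as cited in this article (CrowdStrike 2026 Global Threat
# Report and patch-window surveys).
time_to_exploit_days = 5    # median disclosure-to-exploitation, 2026
median_patch_days = 70      # median organizational patch window

# Days a median organization runs software that is both disclosed
# and actively exploited:
exposure_window = median_patch_days - time_to_exploit_days  # 65 days

# The same arithmetic for 2022 (30-day time-to-exploit, same patch
# window) gave a 40-day gap: exposure grew ~62% while defense stood still.
gap_2022 = 70 - 30
growth = (exposure_window - gap_2022) / gap_2022  # 0.625
```

The trend line matters more than any single figure: the exploit term keeps shrinking while the patch term does not.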

The scale problem is structural. Over 25 billion IoT devices are projected to be connected in 2026. Of those, 60% have unpatched CVEs older than two years. 75% lack any auto-update mechanism. In healthcare, 99% of hospitals are exposed to IoMT vulnerabilities, and more than 40% of medical devices are end-of-life with no patches available. These devices cannot be patched by an AI model — they have to be replaced. And they won’t be replaced before AI vulnerability discovery scales.

AISLE, an AI cybersecurity startup, ran a separate test that adds a concerning data point: when they tested Anthropic’s showcase vulnerabilities against small, openly available models, eight out of eight detected the FreeBSD exploit. One of those models had only 3.6 billion active parameters and costs $0.11 per million tokens. A 5.1-billion-parameter open model recovered the core analysis of the 27-year-old OpenBSD bug. AISLE’s conclusion: “The moat in AI cybersecurity is the system, not the model.” Cheap models find the same bugs. That makes the capability Glasswing is trying to manage a structural problem, not a Mythos-specific one. The escalation clock doesn’t stop when Project Glasswing ends.

The Criticisms Worth Taking Seriously

Project Glasswing has been praised almost universally by the security research community. But the criticisms that exist are substantive and deserve honest treatment.

The independent verification problem. Because Mythos Preview is not publicly available, independent researchers cannot audit the capability claims. The evidence in the Frontier Red Team blog rests on Anthropic’s own testing, with cryptographic commitments for unreleased vulnerabilities offered as accountability anchors. That’s a reasonable accountability mechanism — but security researcher Casey Swift pointed out that providing a model with detailed prior vulnerability context (CVE identifiers, crash reports) is not equivalent to fully autonomous discovery. Some of Mythos’s N-day demonstrations were assisted with more context than “fully autonomous” implies. Anthropic’s zero-day demonstrations are harder to challenge on this basis, but the N-day case is legitimately mixed.

The governance gap in partner access. Anthropic’s responsible disclosure commitment covers what Anthropic itself does with Mythos findings. It does not publicly specify whether those obligations are contractually binding on all Glasswing partners, whether Anthropic logs what codebases partners scan, or what enforcement action exists if a partner misuses access. The competitive misuse risk — a partner using Mythos to scan competitor code defensively while also surfacing exploitable vulnerabilities — is not publicly addressed. Anthropic experienced a notable operational security incident in late March 2026 (approximately 512,000 lines of unminified Claude Code source code leaked via an npm package source map), which raises additional questions about oversight reliability at the provider level.

The “responsible disclosure as marketing” critique. Technology companies have a history of warning about the dangers of their own products to generate coverage while continuing to develop capabilities. OpenAI did it with GPT-2 in 2019. Critics who raise this concern note that Anthropic’s Mythos announcement generated more positive press about the company’s “safety-first” positioning than almost any product launch could have. That observation doesn’t make the safety decision wrong, but it’s worth holding alongside the decision when assessing how much of the announcement is safety and how much is positioning.

These criticisms don’t negate Project Glasswing’s value. They describe the limits of what a voluntary industry initiative can accomplish against a structural problem. The defenders’ advantage that Glasswing provides is real and time-bounded. It ends when similar capabilities become broadly available.

Final Verdict

Project Glasswing is the most important AI safety decision made in public by any frontier lab to date — precisely because it was a genuine tradeoff that cost Anthropic something real. Holding back your most capable model means forfeiting commercial revenue, forfeiting competitive positioning, and betting that the defensive value of a controlled deployment exceeds the commercial value of a public release. That’s not a costless gesture. It’s a decision with financial consequences that Anthropic made anyway.

The cybersecurity implications are already playing out. The partner organizations are scanning critical codebases with the most capable vulnerability discovery tool ever built. Patches are being developed and deployed. The July 2026 public report will trigger the largest coordinated disclosure and patch cycle in recent memory. That outcome is unambiguously good.

The harder question — the one Project Glasswing cannot answer — is what happens after July 2026. The capabilities Mythos Preview demonstrated will be replicated, by other labs, on cheaper hardware, under fewer restrictions. CrowdStrike’s CEO put it plainly on LinkedIn the same day: “AI is creating the largest security demand driver since enterprises moved to the cloud.” That demand driver doesn’t pause while Glasswing runs. It compounds.

What Anthropic has demonstrated with Project Glasswing is that it’s possible — technically and organizationally — for an AI lab to detect a capability threshold, make a responsible decision, and build a defensive coalition before releasing the capability broadly. Whether the rest of the industry follows that precedent, voluntarily or under regulatory pressure, is the question that will define AI’s impact on global cybersecurity for the next decade.

🔐 Is Your Organization at Risk?

Open-source maintainers can apply for Glasswing access through Anthropic’s Claude for Open Source program. Organizations responsible for critical infrastructure should contact Anthropic directly.

Learn About Project Glasswing →

$100M in usage credits committed · $4M in direct open-source donations · Public report expected July 2026

❓ Frequently Asked Questions

What is Project Glasswing?

Project Glasswing is Anthropic’s initiative to use Claude Mythos Preview — its most powerful and unreleased AI model — exclusively for defensive cybersecurity work. Approximately 50 vetted organizations receive access to find and fix zero-day vulnerabilities in critical software before those vulnerabilities can be exploited by attackers. Anthropic committed $100M in model usage credits and $4M in direct donations to open-source security organizations.

Why is Claude Mythos Preview not publicly available?

Anthropic concluded that Mythos Preview’s cybersecurity capabilities are too dangerous to release publicly. The model can autonomously discover and exploit zero-day vulnerabilities across every major operating system and browser — capabilities that, in the wrong hands, could enable large-scale AI-driven cyberattacks. Anthropic chose a controlled defensive deployment over a commercial release and has stated it does not plan to make Mythos Preview generally available until new safeguards are developed.

What is CVE-2026-4747?

A 17-year-old remote code execution vulnerability in FreeBSD’s NFS server, identified and fully exploited autonomously by Claude Mythos Preview. The flaw allowed any unauthenticated internet user to gain root access to any machine running FreeBSD’s NFS. Mythos constructed a 20-gadget ROP chain split across six sequential network packets to exploit it — with no human involvement after the initial prompt. The vulnerability has been patched.

Who are the Project Glasswing partners?

Launch partners include Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, Microsoft, and NVIDIA. More than 40 additional organizations responsible for critical software infrastructure also received access. Open-source maintainers can apply through Anthropic’s Claude for Open Source program.

When will Project Glasswing findings be made public?

A public technical report is expected in early July 2026, once enough of the discovered vulnerabilities have been patched to allow responsible disclosure. Anthropic follows a 90-day notification timeline and 45-day post-patch window before publishing technical details on individual vulnerabilities. Over 99% of what Mythos has found remains under coordinated disclosure and will not be publicly detailed until patched.

How much did it cost to find the 27-year-old OpenBSD bug?

The total discovery campaign cost approximately $20,000 across roughly 1,000 scaffold runs. The specific model run that surfaced the vulnerability cost under $50. By comparison, a nation-state security research team might invest months of skilled researcher time to find a comparable vulnerability. Mythos found it in hours, at a fraction of the cost.
