
Claude Mythos Preview and Project Glasswing have demonstrated the arrival of an era in which AI can discover long-dormant software vulnerabilities and autonomously build exploits for them. Attack surfaces that humans had failed to uncover have been exposed one after another: a bug that had lain dormant for 27 years in OpenBSD's network stack, a 16-year-old FFmpeg bug that had slipped through more than 5 million automated fuzzing runs, and a FreeBSD NFS remote code execution vulnerability (CVE-2026-4747), among others.
This article draws on official announcements from Anthropic (red.anthropic.com, anthropic.com/glasswing) and assessments by experts including Bruce Schneier to clarify what Mythos and Glasswing are changing. It also presents, in checklist format, five implementation actions our company proposes for DevSecOps teams at mid-sized enterprises. Mythos itself remains in limited release, but there is already much that organizations can do with current models. We have entered an era in which waiting until public release before taking action will almost certainly leave you behind.
Claude Mythos Preview is a limited-release model that Anthropic positions as a frontier model specialized for cybersecurity. It is not a general-purpose code generation model; rather, it autonomously handles the entire offense-defense cycle of vulnerability discovery in source code, exploit construction, and remediation patch generation.
On Anthropic's red team research blog, red.anthropic.com, Mythos Preview is described as operating "at a level that surpasses most human experts." There are no plans for general public release, but it is reportedly available to select limited partners at $25 per million input tokens and $125 per million output tokens.
This model is being deployed as the core tool of Project Glasswing to proactively inspect critical software around the world. For defenders like our company thinking about business continuity, it is a foundational premise that must be understood first.
Until now, general-purpose models excelled at code generation and detecting known vulnerability patterns, but could not match human researchers when it came to autonomously constructing complex exploit chains or automatically discovering low-level memory safety bugs.
What makes Mythos Preview distinctive is its ability to execute—fully autonomously after a single initial prompt—(1) bug discovery from source code, (2) construction of fully functional exploits, (3) reverse engineering of closed-source software, and (4) generation of remediation patches. Anthropic officially uses the phrase "fully autonomously," reporting that the model discovered and exploited a 17-year-old remote code execution vulnerability without human intervention.
This represents a qualitative shift in the world of vulnerability discovery. Traditional SAST (static analysis) and DAST (dynamic analysis) relied on pattern matching and input fuzzing, which meant they could only discover attack surfaces that their designers had anticipated. Mythos semantically understands entire codebases and constructs attack paths that even the original developers had not noticed. This is also the basis on which our company explains to clients that "AI code review is not a replacement for existing tools, but a separate, additional detection layer."
According to Anthropic's announcement, on the CyberGym benchmark, the general-purpose Claude model used for comparison scored 66.6%, while Mythos Preview achieved 83.1%. CyberGym is an attacker-perspective benchmark that measures both vulnerability discovery and exploit construction, and is fundamentally different in nature from SWE-bench, which measures coding ability.
In evaluations using OSS-Fuzz, while general-purpose models discovered 150–175 crashes at Tiers 1–2, Mythos Preview recorded 595 crashes at the same tiers. Furthermore, it succeeded at Tier 5—the highest difficulty level, involving full control-flow hijacking—across 10 targets, representing a leap from the level of "causing a crash" to the level of "achieving arbitrary code execution."
In exploit testing against Firefox vulnerabilities, while the general-purpose model succeeded only twice in hundreds of attempts, Mythos Preview succeeded 181 times and achieved register control an additional 29 times. The reproducibility is orders of magnitude different. This indicates not an incidental success dependent on randomness, but a stable capability to systematically construct attack procedures.
If this capability were to fall into the hands of attackers, the preparation time available to defenders would be decisively shortened. The strategic significance of the situation in which "Anthropic secured this first" is considerable.
The discovery cases disclosed by Mythos Preview are important for illustrating the scope of its capabilities. All have been patched through a responsible Coordinated Vulnerability Disclosure process and are positioned as success stories for defenders.
The five cases introduced here all share the characteristic of having been overlooked by existing fuzzing and human review. The reason long-dormant bugs were discovered in such concentration is that the capability to perform semantic understanding of source code and construction of attack paths simultaneously had simply been absent from prior automation tools.
These cases are ones our company frequently cites as concrete examples when explaining the cost-effectiveness of introducing AI code review in the context of supporting clients' DevSecOps practices. If you are asked internally, "What is the justification for investing in AI review?", presenting these cases will immediately add significant persuasive force to your argument.
OpenBSD is an OS widely used in critical infrastructure as a firewall and VPN gateway. Mythos Preview discovered a remote crash vulnerability that had been lurking in this OS's TCP SACK (Selective Acknowledgment) implementation for 27 years.
The attack mechanism exploits integer overflow in TCP sequence numbers. SACK is a performance-enhancing feature that allows the receiver to report packet gaps in aggregate, but its boundary calculations failed to fully account for situations where sequence numbers wrap around the 32-bit range. By deliberately sending sequences that straddle this boundary, an attacker can corrupt internal state and trigger a kernel panic.
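The wraparound pitfall can be sketched in a few lines of Python. This is an illustrative model of the bug class, not OpenBSD's actual SACK code: `seq_lt` mirrors the serial-number comparison that kernels implement as a `SEQ_LT` macro, while `seq_lt_naive` shows how a plain integer comparison inverts its answer at the 32-bit boundary.

```python
# Illustrative sketch of the 32-bit sequence-number wraparound pitfall.
# NOT the actual OpenBSD SACK code; it only demonstrates why naive
# comparisons fail when sequence numbers wrap around the 32-bit range.

MOD = 1 << 32  # TCP sequence numbers live in a 32-bit space

def seq_lt_naive(a: int, b: int) -> bool:
    """Buggy: treats sequence numbers as plain integers."""
    return a < b

def seq_lt(a: int, b: int) -> bool:
    """Correct serial-number comparison (cf. the kernel's SEQ_LT macro):
    interpret the 32-bit difference as a signed value."""
    diff = (a - b) & (MOD - 1)
    return diff != 0 and diff >= MOD // 2

# A sequence just before the wrap vs. one just after it:
a = MOD - 10   # 0xFFFFFFF6
b = 5          # already wrapped around

assert seq_lt(a, b) is True         # a logically precedes b
assert seq_lt_naive(a, b) is False  # the naive compare gets it backwards
```

A boundary calculation that mixes the two styles in even one place is exactly the kind of latent bug described above.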
The fact that this vulnerability was overlooked by numerous researchers and automated tools over many years demonstrates that models capable of semantically understanding source code context can complement existing syntactic inspection tools. Boundary condition bugs are a classic class of issue that "cannot be found without imagining the test case," and this is a domain where AI that reasons semantically about input space excels.
Because OpenBSD is used as the foundation of defensive products, had the discovery been made by attackers first, it could have led to a serious situation in which the very perimeter of corporate networks was compromised.
In FFmpeg, the widely used multimedia framework, Mythos Preview detected a 16-year-old bug in the H.264 decoder stemming from a legacy implementation constraint involving 16-bit types. When processing a specially crafted video file containing out-of-bounds input, an internal buffer overflows, leading to arbitrary code execution. This is the class of bug that could not be discovered even after more than 5 million automated fuzzing runs conducted by OSS-Fuzz, a prime example of AI logically discovering a code path that random input cannot reach.
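The bug class can be illustrated with a hypothetical sketch. This is not FFmpeg's actual H.264 code; it only shows how a size truncated into a legacy 16-bit field can pass a bounds check while the untruncated value still drives the copy. (Python's bytearray grows safely rather than corrupting memory; in C the same pattern would be a heap overflow.)

```python
# Hypothetical sketch of the 16-bit-truncation bug class, NOT FFmpeg's
# actual H.264 decoder. A size stored in a legacy 16-bit field passes
# the bounds check, but the full value still drives the copy.

BUF_LEN = 4096

def checked_copy(dst: bytearray, payload: bytes) -> None:
    length = len(payload)
    stored = length & 0xFFFF       # legacy field only holds 16 bits
    if stored > BUF_LEN:           # check sees the truncated value...
        raise ValueError("too large")
    dst[:length] = payload         # ...but the copy uses the real one

buf = bytearray(BUF_LEN)
evil = bytes(0x10000 + 8)          # 65544 bytes; truncates to 8
checked_copy(buf, evil)            # check passes: 8 <= 4096
print(len(buf))                    # far past the intended 4096 bytes
```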
In FreeBSD, Mythos Preview autonomously discovered and exploited a 17-year-old bug (CVE-2026-4747) in NFS (Network File System) that permits remote code execution. The exploit combines a type confusion and memory corruption vulnerability lurking in NFS's RPC decoding process, and is fully automated end-to-end, including the construction of a ROP (Return-Oriented Programming) chain. Anthropic has explicitly stated that the system operates "fully autonomously after the initial prompt," demonstrating that AI can independently perform the ROP gadget searches commonly used by attackers.
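To make the gadget-search step concrete, here is a toy illustration: scanning an executable's raw bytes for short sequences ending in a `ret` instruction (0xC3 on x86-64). Real gadget finders such as ROPgadget also disassemble and validate candidates; this sketch shows only the underlying pattern search.

```python
# Toy illustration of ROP-gadget searching: scan raw bytes for short
# instruction sequences ending in `ret` (0xC3 on x86-64). Real tools
# also disassemble and filter; this only shows the pattern search.

def find_gadgets(code: bytes, max_len: int = 4) -> list[tuple[int, bytes]]:
    """Return (offset, bytes) pairs for byte runs ending in a ret."""
    gadgets = []
    for i, byte in enumerate(code):
        if byte == 0xC3:                       # ret
            for back in range(1, max_len + 1):
                start = i - back
                if start >= 0:
                    gadgets.append((start, code[start:i + 1]))
    return gadgets

# 0x58 = pop rax, 0x5F = pop rdi: classic register-loading gadgets
blob = bytes([0x90, 0x58, 0xC3, 0x5F, 0xC3])
for offset, gadget in find_gadgets(blob, max_len=2):
    print(hex(offset), gadget.hex())
```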
In the Linux kernel, the model itself chained multiple vulnerabilities together to achieve privilege escalation from a standard user to root. Even individually minor bugs can form a path to privilege escalation when combined in sequence — manipulating memory layout, reusing kernel objects, and dereferencing across privilege boundaries. The model automatically assembled an attack chain that would take a human penetration tester several days to construct.
On the browser side, the model generated a compound exploit targeting Firefox's JIT (Just-In-Time) compiler. By combining JIT spraying with heap layout prediction, it reached a path to arbitrary code execution from a web page, achieving 181 successful runs and an additional 29 instances of register control. The reproducibility is orders of magnitude higher than before, indicating that this is no longer a "lucky one-off" but an established, reliable capability.
Project Glasswing is an industry consortium formed to ensure that defensive actors use Mythos Preview ahead of attackers. It is a deliberate effort to comprehensively audit critical software with AI and usher in an era of defensive advantage.
According to Anthropic's official page at anthropic.com/glasswing, the following 11 organizations are listed as founding partners: Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. In addition, more than 40 organizations are participating in the capacity of critical infrastructure maintainers.
The window during which defensive actors can hold exclusive access to AI capabilities is limited. Anthropic itself has stated that "acting now can create an AI era of defensive advantage" — but the flip side of that statement is a warning that inaction will result in an era of offensive advantage.
Anthropic has committed up to $100 million in usage credits for Mythos Preview and $4 million in direct donations to open-source security organizations. The breakdown of donations is $2.5 million to Alpha-Omega and OpenSSF via the Linux Foundation, and $1.5 million to the Apache Software Foundation.
Glasswing plans to publish a progress report within 90 days. The full responsible disclosure process follows Anthropic's Coordinated Vulnerability Disclosure policy, with findings published in principle within 90 days or upon patch release, whichever comes first. Furthermore, technical details are typically withheld for an additional 45 days after patch release before being made public, striking a balance between giving developers time to respond and enabling information sharing among defenders.
There are two main pathways for mid-sized companies and OSS maintainers to engage with Glasswing.
The first is the "Claude for Open Source" program, a framework that provides Claude to OSS maintainers free of charge or at preferential pricing. Individual maintainers of OSS projects with a certain level of activity on GitHub can also apply. If a maintainer of OSS that your company's product depends on applies, the resulting improvement in upstream security will in turn protect your own supply chain.
The second is the "Cyber Verification Program," through which researchers and companies with legitimate cybersecurity credentials can apply for limited access to more powerful models. The primary targets are companies with in-house SOC or penetration testing teams, as well as security vendors.
Access to Mythos Preview itself is limited to a select group of organizations, but tracking its developments and reflecting them in your own security posture is meaningful for every company. Rather than dismissing these programs as irrelevant on the assumption that you would not qualify, first evaluate internally whether applying is a realistic option.
The announcements of Mythos and Glasswing have prompted several important points of discussion from the professional cybersecurity community. Rather than uncritical optimism, it is necessary to calmly assess what has changed and what has not.
Here we focus on two particularly influential arguments: Bruce Schneier's warning that "the defensive advantage will shrink," and the observation from security firm Aisle that "some capabilities can already be replicated with existing models." Both suggest priorities for action that companies should take immediately.
Security researcher Bruce Schneier, writing on his blog schneier.com, acknowledges that Mythos's defensive advantage is real, but warns that "as increasingly powerful models emerge, that advantage is likely to shrink."
He concedes the asymmetry—that "finding bugs for remediation is easier for AI than finding and exploiting them"—as a temporary benefit, while suggesting it is only a matter of time before the offensive side reaches equivalent capability using open-source models.
Schneier advocates for a shift toward system-level resilience, premised on "a world where zero-day exploits become dime-a-dozen." Specifically, he argues that the central pillars should be segmented design, logging, least privilege, and rapid response capabilities that assume breach—rather than single-point patch management. This means transitioning from a model that aims to "prevent intrusion" to one that aims to "minimize damage even when intrusion occurs"—in other words, a thorough commitment to zero-trust design.
Security firm Aisle has pointed out that some of the capabilities Mythos achieves can be replicated using inexpensive, general-purpose models that are already publicly available. While a gap still exists between "finding a vulnerability" and "converting it into a practical attack," the goal for enterprise defense is to "find exploitable bugs early and eliminate them"—so there is no need to automate all the way through exploit construction. Sufficient return on investment can be achieved simply by detecting bugs early and automatically suggesting remediation patches.
In other words, even mid-sized companies that cannot access Mythos Preview have considerable room to incorporate "AI-driven vulnerability discovery" into their operations using the current model lineup alone. This is the point we emphasize most strongly to our clients. Rather than concluding "we can't do anything because we don't have access to a frontier model," the right starting point is to "accurately understand what can be done with current models."
To avoid treating the Mythos Preview news as someone else's problem and instead integrate it into your own security posture, abstract discussion must be translated into concrete action. We introduce five implementation actions we recommend to mid-sized companies.
None of these presuppose access to Mythos itself. They are limited to what can be implemented using a combination of current models and existing toolchains, and can take shape in as little as one to two weeks, or within roughly three months for full-scale deployment. If pursued in parallel with AI governance development, compliance documentation and implementation can proceed simultaneously, making it easier to avoid scrambling for audit readiness later.
Mythos itself remains under limited access, but static code analysis and automated review using current general-purpose Claude or GPT models are already implementable. By integrating these into a CI/CD pipeline and establishing a three-layer approach per PR — "LLM review → existing linter → fuzzing" — it is possible to significantly reduce logic bugs that human reviewers tend to miss.
Rather than rushing toward full automation, the practical approach is to start with a small repository, measure effectiveness, and accumulate internal data on detection rates and false positive rates. Measuring on a quarterly basis — tracking "the proportion of LLM-flagged issues that turned out to be real bugs" and "the number of bugs caught by the LLM that passed human review" — will provide a solid basis for investment decisions.
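The two quarterly metrics can be computed from a simple review ledger. The field names below are illustrative assumptions, not a standard schema.

```python
# Sketch of the two quarterly metrics suggested above: precision of
# LLM-flagged issues, and real bugs the LLM caught that had passed
# human review. The ledger field names are illustrative.

def review_metrics(findings: list[dict]) -> dict:
    """Each entry: flagged_by_llm / real_bug / passed_human_review (bools)."""
    flagged = [f for f in findings if f["flagged_by_llm"]]
    true_pos = [f for f in flagged if f["real_bug"]]
    return {
        # share of LLM flags that turned out to be real bugs
        "llm_precision": len(true_pos) / len(flagged) if flagged else 0.0,
        # real bugs the LLM caught after human review had passed them
        "llm_caught_human_missed": sum(f["passed_human_review"] for f in true_pos),
    }

ledger = [
    {"flagged_by_llm": True, "real_bug": True,  "passed_human_review": True},
    {"flagged_by_llm": True, "real_bug": False, "passed_human_review": False},
    {"flagged_by_llm": True, "real_bug": True,  "passed_human_review": False},
]
print(review_metrics(ledger))  # precision 2/3; one human-missed bug
```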
Companies whose core business is security, or organizations with dedicated internal research staff, should consider applying to the Cyber Verification Program. If accepted, they gain partial access to model capabilities at the Mythos level, enabling thorough pre-release inspection of their own products.
Since applications require demonstration of legitimate professional track records, it is important to consistently document contributions to vulnerability disclosure and bug bounty participation. Relevant credentials include individual employees' participation records on platforms such as HackerOne or Bugcrowd, CVE-assigned vulnerability reports, and the number of acknowledgments received through responsible disclosure.
For organizations that publish OSS, Claude for Open Source represents a more realistic entry point. OSS projects with an established activity history on GitHub can also apply on a per-maintainer basis.
Traditional incident response playbooks are often built on the assumption of "responding after a known CVE is reported." However, what Mythos has demonstrated is the reality that thousands of undisclosed zero-days lie dormant across the world's major operating systems, browsers, and libraries. The moment attackers reach an equivalent level of AI-driven discovery capability, these will be weaponized.
Assuming a state of persistent zero-days, IR playbooks should incorporate the four resilience elements Schneier identifies: segmented design, comprehensive logging, least privilege, and rapid-response capability that presumes breach.
Integrate shift-left — a core principle of DevSecOps — with AI code review. By building a mechanism that reflects OWASP Top 10 and OWASP LLM Top 10 checklist items as automated PR comments, developers can transition from a loop of "fix after being flagged" to one of "receive feedback the moment they write code."
A secondary benefit is that senior reviewers can concentrate their time on design reviews that genuinely require difficult judgment calls. In engagements we have supported, there are cases where human review time was reduced by 30–40% within two months of introducing AI code review.
One important caveat: rules must be established to prevent AI review from becoming a rubber stamp. Over-relying on "the LLM gave it the OK, so it's safe" invites the inverse of the false-positive problem: false negatives, where real issues are missed. It is essential to document explicitly, as an operational rule, that AI serves as a supplementary layer and does not replace final human judgment.
The EU AI Act, now fully in force, requires cybersecurity measures and ongoing risk management for high-risk AI systems. When formally operating an AI-driven vulnerability discovery process, accountability for the AI's findings and the preservation of its logs become compliance considerations. Being able to retroactively trace which model flagged what code, when, and with what findings is a prerequisite for audit readiness.
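As one way to satisfy that traceability requirement, each AI review finding can be written as an append-only JSON record. The schema below is our illustrative proposal, not a format the EU AI Act prescribes.

```python
# Sketch of an audit-ready record for each AI review finding: which
# model flagged what code, when, and with what result. The JSON schema
# is an illustrative proposal, not a mandated format.

import datetime
import hashlib
import json

def log_ai_review(model: str, commit: str, file_path: str,
                  snippet: str, findings: list[str]) -> str:
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model": model,            # model name and version used
        "commit": commit,          # ties the finding to an exact code state
        "file": file_path,
        "code_sha256": hashlib.sha256(snippet.encode()).hexdigest(),
        "findings": findings,      # empty list means "reviewed, no findings"
    }
    return json.dumps(record, ensure_ascii=False)

line = log_ai_review("claude-x.y", "a1b2c3d", "src/auth.py",
                     "def login(): ...", ["hardcoded credential check"])
```

Appending these lines to a write-once store gives auditors exactly the retroactive trace described above.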
Cross-referencing each category of the OWASP LLM Top 10 — including prompt injection, sensitive information disclosure, and supply chain risks — against internal governance, and explicitly documenting the lines of responsibility in the event that AI code review results escalate into an incident, will prove valuable in future audits.
Data protection laws are also being established across ASEAN countries (Thailand's PDPA, Vietnam's PDPL, Indonesia's PDP Law, and Laos's Personal Data Protection Law), and alignment of governance frameworks — extending to local subsidiaries — is increasingly required.
Mythos Preview itself will not reach mid-sized companies. That said, the era in which AI finds and exploits vulnerabilities is not someone else's problem. What we want to convey is that sufficient defensive reinforcement is possible even with current models, and that now, precisely as a preparation period, is the time to act.
From the perspective of supporting mid-sized companies in Thailand, Japan, and across ASEAN, the priorities are clear. First, incorporate internal review and fuzzing using current general-purpose Claude and GPT into day-to-day operations. Second, rewrite IR playbooks on the assumption that "zero-days are routine," and establish log retention and tabletop exercises. Third, align AI governance frameworks (EU AI Act / OWASP LLM Top 10) with your own operations.
Waiting until "Mythos is publicly released" is too late. What Glasswing has demonstrated is the reality that the window of time granted to defenders is not particularly long. As Schneier warns, it is only a matter of time before attackers reach equivalent capability, and at that point, companies that have done nothing will be the first to suffer.
We offer support for building secure coding practices leveraging Claude, establishing LLM governance, and introducing in-house AI code review. Companies looking to reassess their cybersecurity posture in the age of AI are encouraged to reach out.

Yusuke Ishihara
Started programming at age 13 with MSX. After graduating from Musashi University, worked on large-scale system development including airline core systems and Japan's first Windows server hosting/VPS infrastructure. Co-founded Site Engine Inc. in 2008. Founded Unimon Inc. in 2010 and Enison Inc. in 2025, leading development of business systems, NLP, and platform solutions. Currently focuses on product development and AI/DX initiatives leveraging generative AI and large language models (LLMs).