Vidoc Security replicated Anthropic’s Mythos vulnerability findings by re-running the cases Anthropic highlighted and reproducing multiple bugs with publicly available AI models. The tests used GPT-5.4 and Claude Opus 4.6 inside an open-source coding agent and targeted a server file-sharing protocol, the networking stack of a security-focused operating system, embedded video-processing software, and two cryptographic libraries. Each automated scan cost under $30 per file.
Anthropic recently launched Claude Mythos, emphasizing the risks its vulnerability-finding capability poses for public deployment. Vidoc Security took on the challenge of replicating those findings with accessible AI models, GPT-5.4 and Claude Opus 4.6, run through opencode, an open-source coding agent.
The vulnerabilities Vidoc targeted were those Anthropic highlighted: a server file-sharing protocol, the networking stack of a security-focused operating system, video-processing software used in media platforms, and two cryptographic libraries central to online digital identity verification. The replication underscores how capable public AI models already are at identifying significant security flaws in widely used software components.
Vidoc ran its scans without a Glasswing invite, without private API access, and without Anthropic’s internal stack. The implementation relied entirely on the public models, operating inside opencode in an open-tooling environment to explore the supplied codebases for potential vulnerabilities.
The workflow mirrored Anthropic’s public description: provide a codebase, let the model explore it, parallelize attempts, and filter for signals. Vidoc rebuilt the same architecture with open tooling, pairing a planning agent with a detection agent to drive parallelized exploration and automated signal filtering across the target codebases.
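The plan–scan–filter loop described above can be sketched in a few lines. Everything here is illustrative: the function names (`plan_targets`, `scan_target`), the heuristics, and the confidence threshold are hypothetical stand-ins for what would, in a real pipeline, be calls out to an LLM.

```python
# Minimal sketch of a planner/detector pipeline, assuming hypothetical
# stubs in place of real LLM calls.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Finding:
    path: str
    description: str
    confidence: float  # detector's self-reported signal strength

def plan_targets(codebase: dict[str, str]) -> list[str]:
    """Planning agent (stub): pick files worth deeper inspection."""
    return [p for p, src in codebase.items() if "memcpy" in src or "parse" in p]

def scan_target(path: str, source: str) -> list[Finding]:
    """Detection agent (stub): surface vulnerability leads in one file."""
    findings = []
    if "memcpy" in source:
        findings.append(Finding(path, "possible unbounded copy", 0.8))
    return findings

def run_pipeline(codebase: dict[str, str], min_confidence: float = 0.5) -> list[Finding]:
    targets = plan_targets(codebase)
    # Parallelized exploration: scan candidate files concurrently.
    with ThreadPoolExecutor(max_workers=4) as pool:
        batches = pool.map(lambda p: scan_target(p, codebase[p]), targets)
    # Signal filtering: keep only findings above the confidence threshold.
    return [f for batch in batches for f in batch if f.confidence >= min_confidence]

if __name__ == "__main__":
    demo = {
        "net/parser.c": "void f(char *d, char *s, int n) { memcpy(d, s, n); }",
        "docs/readme.md": "usage notes",
    }
    for finding in run_pipeline(demo):
        print(finding.path, finding.description)
```

The separation matters for cost control: the planner narrows the search space before the more expensive detection passes run, which is consistent with the sub-$30-per-file figure Vidoc reports.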
Both GPT-5.4 and Claude Opus 4.6 reproduced two of the bug cases in all three runs. Claude Opus 4.6 rediscovered an OpenBSD bug three times out of three, while GPT-5.4 never found that particular bug. Some findings were partial, surfacing the correct code area without pinning down the exact root cause.
“We replicated Mythos findings in opencode using public models, not Anthropic’s private stack,” Dawid Moczadło wrote on X. “A better way to read Anthropic’s Mythos release is not ‘one lab has a magical model.’ It is: the economics of vulnerability discovery are changing. AI models are already good enough to narrow the search space, surface real leads, and sometimes recover the full root cause in battle-tested code.”
Vidoc Security’s tests replicated Anthropic’s Mythos findings with publicly available AI tools, showing that independent teams can reproduce the published vulnerability results. Vidoc frames the replication less as proof of any one lab’s model and more as evidence that the economics of vulnerability discovery, and with them the way security problems are found and fixed, are shifting.