Microsoft's new AI system, codenamed MDASH, has scored 88.45% on the CyberGym benchmark, ahead of the 83.1% posted by Anthropic's Mythos. The benchmark, developed by researchers at UC Berkeley, evaluates how well AI systems identify real-world software vulnerabilities, using tasks derived from open-source projects.
Launched this week, MDASH uses more than 100 specialized AI agents that collaborate within a multi-model framework to detect software vulnerabilities. Microsoft unveiled the system alongside 16 newly identified vulnerabilities in Windows, including four critical flaws that were fixed in this month’s Patch Tuesday.
Unlike Anthropic's single-model Mythos, whose vulnerability detection capabilities have faced scrutiny, MDASH follows a structured process in which agents scan code, validate findings, and simulate attacks to confirm that a bug is present. Both CyberGym scores are self-reported by the respective companies and have not been independently verified.