Recent evaluations by the UK AI Security Institute (AISI) show that GPT-5.5 ranks among the strongest AI models the institute has tested in cyberattack simulations. The model also completed a complex, multi-stage attack simulation.
Across 95 capture-the-flag tasks of varying difficulty, GPT-5.5 recorded an average success rate of 71.4 percent at the highest difficulty level, "Expert." Claude Mythos Preview achieved 68.6 percent at the same level, giving GPT-5.5 a lead of 2.8 percentage points. The earlier models GPT-5.4 and Claude Opus 4.7 scored considerably lower, at 52.4 percent and 48.6 percent, respectively.
AISI's testing also included a challenging simulation titled “The Last Ones,” which comprises 32 steps across multiple network segments. GPT-5.5 completed the simulation in 2 of 10 attempts, while Claude Mythos Preview succeeded in 3 of 10. These findings point to growing potential for AI models to carry out complex cybersecurity operations, although the tests ran without active defenders, so it remains unclear how well the models would perform in real-world scenarios.