GPT-5.5's Lead in $1,500 LLM Hack Test Raises Questions for Competitors

In a recent AI hacking test, GPT-5.5 outperformed 12 other models by solving a security challenge in 70% of attempts at an average cost of $9.46 per success. The results reveal wide gaps in both capability and cost efficiency.

Editorial Staff

A recent experiment by security researcher Kasra Rahjerdi shows wide differences in how well AI models handle a real-world security challenge. The target was a vulnerable book review application whose exposed Firebase credentials allowed direct access to its database.

More than a dozen AI models were tested, each with a $10 budget and a two-hour time limit, and the experiment cost $1,500 in total. GPT-5.5 was the standout, solving the challenge in 7 of 10 trials at an average cost of $9.46 per solution. DeepSeek V4 Pro stood out for cost efficiency instead, succeeding in 3 of 10 runs at just $0.62 per success.
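To show how the headline figures relate, here is a minimal Python sketch. It assumes that "cost per success" means total spend across all runs divided by the number of successful runs; the article does not define the metric, so the implied totals below are illustrations, not reported numbers. The `ModelResult` class and its methods are hypothetical.

```python
from dataclasses import dataclass

# Assumption: cost per success = total spend over all runs / number of successful runs.
# The article does not define the metric; this is an illustrative reading.


@dataclass
class ModelResult:
    name: str
    successes: int
    runs: int
    cost_per_success: float  # USD, as reported in the article

    def success_rate(self) -> float:
        """Fraction of runs that solved the challenge."""
        return self.successes / self.runs

    def implied_total_spend(self) -> float:
        """Total spend implied by the assumed definition of cost per success."""
        return self.cost_per_success * self.successes


results = [
    ModelResult("GPT-5.5", successes=7, runs=10, cost_per_success=9.46),
    ModelResult("DeepSeek V4 Pro", successes=3, runs=10, cost_per_success=0.62),
]

for r in results:
    print(
        f"{r.name}: {r.success_rate():.0%} success, "
        f"~${r.implied_total_spend():.2f} implied spend over {r.runs} runs"
    )
```

Under that assumption, GPT-5.5's 70% success rate would correspond to roughly $66 of spend over its ten runs, against less than $2 for DeepSeek V4 Pro.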

Claude Sonnet 4.6 and Claude Opus 4.8 each solved the challenge in 2 of 10 attempts. The weakest performer was Gemini 3.1 Pro Preview, which mostly refused to engage: its median run used only about 9k tokens, compared with more than 100k for the other models. Rahjerdi noted that Chinese models were more willing than their Western counterparts to interact with live databases.
