New Benchmark Reveals AI Chatbots' Struggles in Navigating Sensitive Dialogues

Mpathic's new mPACT benchmark reveals that AI models like Claude Sonnet 4.5 excel in identifying suicide risks but struggle with eating disorders, highlighting critical gaps in crisis response.

Mpathic has introduced mPACT, a benchmark for assessing how effectively AI models handle high-risk conversations. It evaluates how well systems such as Claude, ChatGPT, and Gemini respond to critical topics, including suicide risk and eating disorders. Despite improvements, Mpathic warns that these models still do not meet the standards required for genuine crisis intervention.

In its recent analysis, Mpathic found that while AI models generally avoid harmful responses and can identify distress signals, they often fall short in delivering adequate support. For instance, the evaluation highlighted that understanding subtle behavioral cues, which human clinicians typically perceive, remains a challenge for these systems.

The findings showed that Claude Sonnet 4.5 performed best overall and achieved the highest mPACT score, with particular strength in detecting and responding to suicide risk. GPT-5.2 excelled at avoiding harm but lacked proactivity, and Gemini 2.5 Flash showed mixed results: it was effective in clear risk scenarios but less capable with subtle indications.

In contrast, performance in addressing eating disorders was notably weaker across all models, reflecting the complexities of recognizing indirect risks associated with this issue. Mpathic emphasized the need for AI systems to better navigate the nuanced landscape of mental health challenges.

This article is an original summary for informational purposes. Image credits and full coverage are at the original source.

Editorial
Editorial Staff

