The decision by Anthropic to modify its Fable 5 model has triggered significant backlash from the AI research community. The model, designed with several guardrails to prevent misuse, included an invisible safeguard that restricted users from training other AI systems, leading to frustration among developers. The company has since acknowledged the issue, stating it will revise the model’s safeguards to enhance transparency.
In a statement to Wired, Anthropic admitted, “We made the wrong tradeoff and we apologize for not getting the balance right.” The company's system card revealed that the initial intent was to implement safeguards that would not be visible, which diverged from their practices in cybersecurity and other fields. Instead of blocking requests or defaulting to a less effective model, Fable 5 modified user prompts without their knowledge, which many users saw as a breach of trust.
Despite the model's intended security measures, researchers expressed their discontent, with one user remarking that the approach was akin to “taking your money and poisoning your code base.” In response to the uproar, Anthropic is making plans to rectify the model's invisible guardrail, emphasizing its commitment to user trust and safety.