Just days after its release, Anthropic's latest AI model, Claude Fable, is drawing criticism from cybersecurity researchers for what they describe as excessively stringent guardrails. According to a TechCrunch report, the model has been observed rejecting what experts call "innocuous tasks," including reading blog posts and performing code reviews.
The complaints center on a mismatch between Anthropic's aim of offering a limited preview of its powerful cybersecurity model, Mythos, and the practical needs of the research community. Researchers argue that the safety filters hinder legitimate work, potentially slowing vulnerability discovery and threat analysis.
TechCrunch quotes unnamed researchers who have tested the system, though the report does not provide specific failure rates or examples of rejected prompts beyond the general categories mentioned. Anthropic positioned Fable as a public but capped version of Mythos, which is touted for specialized cybersecurity applications.
The tension highlights a broader industry challenge: balancing safety against utility in advanced AI systems. If these guardrails remain in place, researchers may be forced to seek alternative tools, potentially limiting Fable's adoption in the very community it was designed to serve.
Some experts, however, caution that sweeping criticism of guardrails often overlooks the need to prevent malicious use, and suggest that Anthropic may be deliberately erring on the side of caution.