Anthropic on Tuesday launched Claude Fable 5, its most capable public model, but the rollout has been marred by developer complaints that its safety system blocks legitimate prompts. Within two days of release, users flooded social media with reports of false positives, in which harmless queries were flagged and handed off to a less capable model.

Fable 5 is the first public model derived from Anthropic's Mythos family, a training lineage that exhibited unusual skill at finding software bugs during development. That capability prompted the company to treat cybersecurity with the same caution as biology and chemistry when setting safety boundaries. Prompts flagged in those high-risk domains are now routed to Claude Opus 4.8, a less capable model with its own guardrails. Anthropic says the fallback affects about 0.05% of queries and notifies users when it occurs.

The filters reflect a deliberate design tradeoff. Anthropic tuned the classifiers that detect potentially dangerous uses to err on the side of caution, and that choice has produced a high rate of false positives. Developers are frustrated: many report that routine coding or security-research prompts trigger the downgrade, undermining the model's usefulness for technical work.

The backlash highlights a broader challenge for frontier AI companies: how to balance safety with transparency and usefulness. If users cannot trust that legitimate prompts will work, they may turn to less restricted models from competitors such as OpenAI or Mistral. The incident also raises the question of whether safety mechanisms can be refined without sacrificing the very capabilities that make advanced models valuable.

Anthropic has not yet announced plans to adjust the filters. The company acknowledges the accuracy problem but has not said how it will respond to developer feedback. The storm over Fable 5 underscores a growing tension: as AI models become more powerful, the line between necessary guardrails and usability friction grows thinner.