Flash News

Anthropic Adjusts Claude Fable 5 Security Policy, Cancels Silent Downgrade

Anthropic announced the cancellation of the silent performance downgrade mechanism for Claude Fable 5. This mechanism had been widely criticized by the community as "covert sabotage," leading to strong backlash from the AI research community.

According to the terms of service, users are prohibited from using Claude to train competing models. Anthropic had previously planned to reduce performance for accounts suspected of violations without notification. Researchers warned that this practice would interfere with third-party security assessments and open-source collaboration.

In response to the criticism, Anthropic publicly apologized, acknowledging a misjudgment in their security policy. The new strategy will instead provide public notifications: if a user attempts to build a high-capability AI, their request will be explicitly denied or redirected to a lower-capability model. However, this public mechanism is more susceptible to targeted circumvention, and in the future, the scope of interception will be expanded, which may lead to some normal, harmless requests being mistakenly blocked.

Source: Public Information

ABAB AI Insight

Anthropic previously employed invisible downgrades as a safety measure on Fable 5, and this rapid adjustment continues its path of iterating strategies under community pressure. After facing multiple controversies over safety measures, it has shifted towards a more transparent approach to balance safety responsibilities with developer trust.

From a capital perspective, Anthropic is repairing community relations through a public apology. While this move may temporarily weaken some cutting-edge protective effects, it helps enhance enterprise adoption willingness and long-term subscription stickiness, while maintaining strict limits on training competing models, providing a compliant framework for its data and capability accumulation.

Similar to companies like OpenAI, which have made multiple adjustments to their safety mechanisms, Anthropic is currently in a control phase transitioning from closed, invisible protections to open, transparent governance. This incident is being leveraged to reshape its industry image in AI safety.

Essentially, this reflects regulatory changes and capital concentration: the cancellation of silent downgrades directly responds to community backlash, accelerating the shift of AI safety capital from invisible control to transparent governance, and reshaping the power structure and trust mechanisms between model capabilities, safety protections, and open-source collaboration in the generative AI industry.

ABAB News · Cognitive Law

The more covert the protection, the faster the loss of community trust.
The higher the transparency, the more sustainable the willingness to adopt and long-term cooperation.
The more public the interception, the more the risks of false positives and circumvention difficulties increase synchronously.

Source

·ABAB News
·
3 min read
·17d ago
分享: