Anthropic Updates its Responsible Scaling Policy to Address AI Risks


Anthropic has updated its Responsible Scaling Policy (RSP) to address the growing risks posed by increasingly capable AI systems. Originally introduced in 2023, the updated policy sets out Capability Thresholds that signal when an AI model's abilities have reached a point where additional safeguards are necessary.

The policy also expands the role of the Responsible Scaling Officer (RSO), whose responsibilities include overseeing safety protocols, evaluating when AI models cross Capability Thresholds, and reviewing decisions on model deployment.

Further, the policy focuses on Chemical, Biological, Radiological, and Nuclear (CBRN) weapons and Autonomous AI Research and Development (AI R&D), highlighting areas where bad actors could exploit frontier AI models or where such models could accelerate dangerous advancements.

Moreover, Anthropic’s new AI Safety Levels (ASLs), modeled after U.S. biosafety standards, create a tiered system that matches safeguards to the level of AI risk.

The tiered ASL system, which ranges from ASL-2 (current safety standards) to ASL-3 (stricter protections for riskier models), creates a structured approach to scaling AI development. This approach sets a precedent for AI governance, aiming to inspire industry-wide adoption of similar safety frameworks.
