Cybersecurity researchers aren’t happy about the guardrails on Anthropic’s Fable

Agents & Inference · TechCrunch

Both summaries are AI-generated.

Summary A

Cybersecurity researchers are criticizing the strict guardrails on Anthropic's Fable, which often block even basic cybersecurity-related queries, prompting frustration among professionals. The restrictions aim to prevent misuse for malware development but are seen as overly broad, triggering on innocuous tasks like code reviews. Anthropic offers a Cyber Verification Program for approved users to ease limitations, but many argue the current system hampers legitimate work.

Summary B

Anthropic released Fable, a public version of its cybersecurity-focused Mythos model, but security researchers are criticizing its guardrails as overly aggressive, saying the model rejects even innocuous requests like reading a blog post or reviewing code. Experts complain the restrictions appear keyword-based, flagging anything related to cybersecurity or biology, though some acknowledge the cautious approach is understandable in early deployment and expect the guardrails to relax over time. Anthropic, which built the limits to prevent misuse for malware or biological weapons, also offers a Cyber Verification Program granting approved professionals fewer restrictions.
