Agents & Inference

National lab publishes safety benchmark with industry participation

Which summary reads better? Pick one; the models are revealed after you vote. Both summaries are AI-generated.

Summary A

An open safety benchmark — covering misuse, robustness, and refusals — launched with multi-vendor input, aiming to make safety claims comparable.

Summary B

Researchers released an open safety benchmark built with input from several model providers, covering misuse, robustness, and refusal behavior. Contributors hope shared tests make safety claims easier to compare across systems.
