Agents & Inference | TechCrunch

New Microsoft tool lets devs spin up AI behavior tests using text descriptions

Both summaries are AI-generated.

Summary A

Microsoft has launched ASSERT, an open-source framework that lets developers test AI behavior by converting natural-language descriptions into scored evaluations. The tool generates test cases based on specified rules and checks if AI systems comply with application-specific policies. It aims to address gaps in broader AI evaluations by focusing on tailored, continuous monitoring for deployed models.

Summary B

Microsoft has released ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing), an open-source framework that lets developers create AI behavior tests from plain-language descriptions of their system's intended goals, policies, and constraints. The tool converts those descriptions into structured tests, generates and runs problem scenarios against the target system, scores the results, and records the AI's decision paths so developers can pinpoint failures. Microsoft says the framework can be used during development, after deployment, and for continuous monitoring, addressing the need for application-specific evaluations that broader benchmarks cannot cover.
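
The workflow described in Summary B (rules in, scenarios run, results scored, decision paths recorded) can be illustrated with a small sketch. This is not ASSERT's API, which neither summary describes; every name here (Rule, Result, run_suite, score, toy_assistant) is hypothetical. The scenarios are also hand-written, whereas the real tool is said to generate them from the plain-language description.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Rule:
    """A plain-language policy paired with a hypothetical programmatic check."""
    description: str
    scenarios: List[str]            # hand-written here; assumed generated in a real tool
    check: Callable[[str], bool]    # returns True when the reply complies


@dataclass
class Result:
    rule: str
    scenario: str
    passed: bool
    trace: List[str] = field(default_factory=list)  # recorded decision path


def run_suite(target: Callable[[str, List[str]], str], rules: List[Rule]) -> List[Result]:
    """Run every scenario of every rule against the target and keep its trace."""
    results = []
    for rule in rules:
        for scenario in rule.scenarios:
            trace: List[str] = []
            reply = target(scenario, trace)
            results.append(Result(rule.description, scenario, rule.check(reply), trace))
    return results


def score(results: List[Result]) -> float:
    """Fraction of scenarios that complied with their rule."""
    return sum(r.passed for r in results) / len(results) if results else 0.0


def toy_assistant(prompt: str, trace: List[str]) -> str:
    """Stand-in for the system under test; it logs each decision it makes."""
    trace.append(f"received: {prompt}")
    if "refund" in prompt.lower():
        trace.append("matched refund policy")
        return "I can't issue refunds, but I can connect you with support."
    trace.append("fell through to default reply")
    return "Sure, done."


rules = [
    Rule(
        description="Never agree to a refund request directly",
        scenarios=["I want a refund now", "Give me my money back"],
        check=lambda reply: "done" not in reply.lower(),
    ),
]

results = run_suite(toy_assistant, rules)
for r in results:
    print("PASS" if r.passed else "FAIL", "|", r.scenario, "|", " -> ".join(r.trace))
print("score:", score(results))
```

In this toy run the second scenario fails because the assistant only recognizes the word "refund", and the recorded trace shows it fell through to the default reply. That is the kind of failure pinpointing the summary attributes to recorded decision paths.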
