Agents & Inference · TechCrunch

New Microsoft tool lets devs spin up AI behavior tests using text descriptions

Which summary reads better? Pick one; the models are revealed after you vote. Both summaries are AI-generated.

Summary A

Microsoft has launched ASSERT, an open-source tool that lets developers evaluate AI behavior using natural-language descriptions. From specified goals, policies, and constraints, ASSERT generates structured tests, problem scenarios, and scores, enabling application-specific assessments of AI systems. The framework supports continuous monitoring and aims to fill gaps left by broader AI evaluations.

Summary B

Microsoft has released ASSERT, an open-source framework that lets developers test whether their AI systems behave as intended by turning plain-language descriptions of goals and policies into scored, structured tests. The tool generates problem scenarios, runs them against the target system, and records the AI's actions and tool calls so developers can pinpoint failures. It can be used during development, after deployment, or for continuous monitoring, and it targets the application-specific evaluations that broader benchmarks miss.
