Two AIs summarize each story. You pick the better one.

A daily AI agents & inference digest: every story is summarized by two models, blind. Vote for the better summary, then see the running leaderboard of which model practitioners actually prefer. One short email a day, free.

Editions (8 days)

Agents & Inference, UTC dates, up to 6 stories/day.

What is an edition?

Each edition is one calendar day (UTC). Stories come from the allowlisted RSS feed. Use the menu to jump to any day; empty days show "empty" and may need a rebuild.

This day

6 stories

Agents & Inference

Agents & Inference · Hugging Face

Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

Which summary reads better? Pick one; the models are revealed afterward. Both summaries are AI-generated.

Summary A

JetBrains has released Mellum2, an open 12-billion-parameter Mixture-of-Experts model optimized for low-latency text and code tasks. The company says it performs competitively with similarly sized open models while delivering more than twice the inference speed, making it suited for high-throughput production workloads. Designed as a "focal" model for software engineering, Mellum2 targets uses such as routing and orchestration, RAG pipelines, agent subtasks, and private self-hosted deployment.

Summary B

JetBrains has introduced Mellum2, a 12B Mixture-of-Experts model optimized for efficient text-and-code tasks, offering faster inference and lower latency for software engineering workloads. The open model excels in routing, retrieval-augmented generation (RAG) pipelines, and sub-agent tasks while being deployable in private environments. Designed for specialized use rather than replacing larger models, Mellum2 aims to enhance AI system efficiency and cost-effectiveness.

Agents & Inference · TechCrunch

Nvidia chases $200B CPU market with AI agent PCs from Microsoft, Dell, and HP

Summary A

Nvidia is targeting the $200B CPU market by partnering with Microsoft, Dell, HP, and others to launch AI-powered PCs featuring its new RTX Spark superchip, designed to run AI agents and local large language models. The chip promises faster performance, enhanced security, and support for over 1,000 applications, positioning it as a tool for creators, gamers, and AI workflows. CEO Jensen Huang sees this as a transformative shift toward agent-driven computing, building on Nvidia's recent success in AI hardware.

Summary B

Nvidia opened Computex with the RTX Spark, a new 1-petaflop PC CPU it calls a superchip, designed to run AI agents and local large language models securely. Windows PCs powered by the chip will arrive this fall from ASUS, Dell, HP, Lenovo, Microsoft Surface, and MSI, with Acer and Gigabyte to follow, backed by more than 100 software makers including Adobe and Riot Games. The move is part of CEO Jensen Huang's pursuit of a $200 billion CPU market, envisioning PCs that complete tasks on command rather than requiring traditional pointing, clicking, and typing.

Agents & Inference · Simon Willison

llm-anthropic 0.25.1

Summary A

Simon Willison released llm-anthropic 0.25.1, noting its use for generating pelicans and referencing Opus 4.8. The update was shared on May 28, 2026, alongside a sponsorship offer for a monthly LLM developments digest.

Summary B

The llm-anthropic plugin has been updated to version 0.25.1, with the new release used to generate pelican drawings tied to notes on Opus 4.8.

Agents & Inference · Hugging Face

Beyond LLMs: Why Scalable Enterprise AI Adoption Depends on Agent Logic

Summary A

IBM Research argues that scalable enterprise AI adoption depends not on large language models alone but on "agent logic"—software primitives such as knowledge graphs, algorithms, and program analysis libraries that steer LLMs through complex enterprise workflows. The piece contends this approach reduces context demands, hallucinations, and token costs while improving agent quality and user trust, citing widespread failures of AI pilots as motivation. IBM tested the concept by building agents for offerings like watsonx Code Assistant for Z, which uses deep static analysis to aid mainframe application modernization.

Summary B

Scalable enterprise AI adoption requires more than just large language models (LLMs), relying instead on agent logic to ensure quality, cost-effectiveness, and user trust. Agent logic, which includes tools like knowledge graphs and algorithms, helps steer LLMs to better align with dynamic enterprise workflows while reducing errors and inefficiencies. IBM's watsonx Code Assistant for Z demonstrates this approach by using agent logic to enhance mainframe application development.

Agents & Inference · TechCrunch

This AI weather startup is out-forecasting government agencies

Summary A

WindBorne Systems, a startup founded by Stanford students in 2019, has released WeatherMesh-6, an AI weather model the company says outperforms the European Centre for Medium-Range Weather Forecasts on accuracy for key variables like surface temperature. The model generates hourly forecasts at 3-km resolution in Europe and the continental U.S., and is reportedly as accurate five days out as traditional forecasts are one day ahead. WindBorne attributes its edge to a proprietary data advantage from roughly 400 weather balloons in flight whose sensor readings are fed directly into its models.

Summary B

A startup called WindBorne Systems has developed an AI weather forecasting tool, WeatherMesh-6, which outperforms predictions by leading government agencies like the European Centre for Medium-Range Weather Forecasts. The system offers more frequent updates, higher resolution, and greater accuracy, leveraging data from hundreds of weather balloons launched globally. This advancement highlights the growing potential of AI in improving weather predictions over traditional methods.

Agents & Inference · Simon Willison

Pasted File Editor

Summary A

Simon Willison highlights a feature in Claude.ai where pasted text is automatically converted into a file attachment. He built a prototype using Codex desktop to replicate this functionality, adding direct file opening and drag-and-drop support. The post also mentions his sponsorship offer for a monthly LLM developments digest.

Summary B

A new prototype called Pasted File Editor replicates a feature found in Claude's apps, where large blocks of pasted text are automatically converted into file attachments. Built using Codex desktop, the tool also lets users open files directly—displaying images as thumbnails—or drag files onto the text area.