Which AI writes the better take? You decide — blind.

Two top models go head-to-head on today's AI news. Pick the sharper summary without seeing the names — the crowd's verdict builds the leaderboard.

Agents & Inference · Hugging Face

Five labs, five minds: building a multi-model finance drama on small models

Which summary reads better? Pick one; the models are revealed after. Both summaries are AI-generated.

Summary A

Five labs collaborated to create a multi-model finance simulation where different small AI models interact as agents in an emergent economy, each representing distinct financial behaviors. Players act as shadow financiers, manipulating the market through tips, alliances, and trades while avoiding detection by a magistrate. The system relies on heterogeneous models from various labs, ensuring diverse decision-making and market dynamics.

Summary B

Developers built version two of "Thousand Token Wood," an experimental finance game in which players act as a shadow financier—lending, shorting, bribing, and trading on insider tips while evading a pursuing magistrate—within an emergent woodland economy. The key engineering change runs each of the game's creature-agents on a different lab's small AI model, including OpenAI's gpt-oss-20b, OpenBMB's MiniCPM3-4B, NVIDIA's Nemotron-Mini-4B, and a fine-tuned Qwen 0.5B, so each behaves distinctly. The team found the main challenge lay at the serving layer rather than the modeling, solved through a tolerant JSON parse-and-repair system, while keeping insider-tip truth values hidden from the agents as a security requirement.

1 pick

More stories

Agents & Inference · TechCrunch

Google will pay SpaceX $920M per month for compute

Summary A

Google will pay SpaceX $920 million per month from October 2026 through June 2029 for access to around 110,000 NVIDIA GPUs and related compute resources. The deal, similar to SpaceX's earlier agreement with Anthropic, allows Google to expand its AI capacity amid surging demand for products like Gemini Enterprise. Both companies can terminate the agreement with 90 days' notice after December 2026, and Google's access will ramp up by September 2026.

Summary B

Google will pay SpaceX $920 million per month from October 2026 through June 2029 for access to roughly 110,000 NVIDIA GPUs and related computing components, according to a regulatory filing. Google framed the agreement as short-term "bridge capacity" to meet surging demand for its Gemini Enterprise platform, with both parties able to cancel after December 31, 2026, on 90 days' notice. The deal comes just a week before SpaceX's planned Nasdaq IPO, which aims to raise around $75 billion at a $1.75 trillion valuation, with Google a longtime investor whose stake could exceed $100 billion afterward.

1 pick

Agents & Inference · Simon Willison

datasette-agent-micropython 0.1a0

Summary A

Simon Willison has released an alpha version of Datasette Agent, designed to safely generate and execute Python code. Early tests show promise, with GPT-5.5 failing to bypass the sandbox security measures. The project is part of Willison's ongoing work, supported by sponsors who receive monthly updates on LLM advancements.

Summary B

A new alpha release, datasette-agent-micropython 0.1a0, aims to let Datasette Agent safely generate and execute Python code within a sandboxed environment. Early testing has shown promise, with GPT-5.5 reportedly unable to break out of the sandbox so far.

0 picks

Agents & Inference · Hugging Face

Designing the hf CLI as an agent-optimized way to work with the Hub

Summary A

The Hugging Face CLI (hf CLI) has been redesigned to optimize interactions for both human users and AI coding agents like Claude Code and Codex. It now detects agent usage via environment variables, tailoring outputs to be compact and structured for agents while maintaining rich formatting for humans. Early data shows significant agent adoption, with Claude Code and Codex leading in user numbers and request volume on the Hub.

Summary B

Hugging Face has rebuilt its official hf command-line interface to serve both human users and AI coding agents like Claude Code, Codex, and Cursor, which increasingly use the tool to interact with the Hub. The CLI detects when an agent is driving it and adjusts its output accordingly—stripping color and formatting in favor of compact, structured data—and benchmarks showed that the no-CLI baseline can consume up to six times more tokens than using hf on complex tasks. Hugging Face began tracking agent traffic in April 2026, with Claude Code and Codex leading usage at roughly 40,000 users and nearly 49 million requests for Claude Code alone.

0 picks

Agents & Inference · TechCrunch

The token bill comes due: Inside the industry scramble to manage AI’s runaway costs

Summary A

Companies across the tech industry are confronting soaring AI costs as more autonomous agents and aggressive adoption drive token consumption far beyond budgets, with examples like Uber exhausting its 2026 AI coding budget by April and one firm reportedly facing a $500 million Claude bill. In response, a market is emerging to help track and control AI spending, including the Linux Foundation's newly announced Tokenomics Foundation, which aims to bring cost discipline to AI tokens similar to what FinOps did for cloud spending. Studies suggest heavy AI use boosts developer productivity but at steeply higher costs and with more bugs and rewrites, prompting companies to impose token limits and seek better visibility and ROI.

Summary B

Companies are struggling with skyrocketing AI token costs, with some blowing through budgets months early and facing unexpected price hikes. The industry is scrambling for solutions, including new standards and tools to track spending, as AI adoption and autonomous agents drive up consumption. Executives report shifting focus from performance to cost control, with some comparing unchecked AI usage to an addiction.

0 picks

Agents & Inference · Simon Willison

micropython-wasm 0.1a2

Summary A

A new alpha release of micropython-wasm, version 0.1a2, has been published with the addition of a command-line interface. The CLI was inspired by efforts to demonstrate the project's functionality in a hands-on "Try it yourself" section.

Summary B

Micropython-wasm 0.1a2 now includes a CLI tool, added by Simon Willison to enhance usability. The update was inspired by a blog draft and aims to improve the "Try it yourself" experience. Willison also promotes a $10/month sponsorship offering curated LLM updates.

0 picks
