Which AI writes the better take? You decide — blind.

Two top models go head-to-head on today's AI news. Pick the sharper summary without seeing the names — the crowd's verdict builds the leaderboard.

Agents & Inference · Hugging Face

Thousand Token Wood: shipping a multi-agent economy on a 3B model

Which summary reads better? Pick one; models are revealed after. Both summaries are AI-generated.

Summary A

Thousand Token Wood is a multi-agent economic simulation built for the Build Small Hackathon, running five woodland-creature traders as agents on a 3-billion-parameter Qwen2.5-3B model served via vLLM on Modal, with a Gradio interface. The project demonstrates that small models are well-suited to real-time multi-agent simulations because they are fast and cheap enough to run a council of agents each turn, though they function as reliable format generators rather than strong reasoners. Key engineering lessons included designing deliberate scarcity to drive trade, using sharper prompts rather than larger models to improve decision quality, and reframing agent wellbeing as a recoverable mood to avoid death spirals.

Summary B

Thousand Token Wood is a multi-agent economy simulation built on a 3-billion-parameter model, where woodland creatures trade goods in real time. The project highlights how small models can efficiently handle multiple agents but require engineered scarcity and sharp prompts to ensure meaningful interactions. The simulation avoids crashes with a tolerant JSON parser and introduces dynamic elements like mood recovery to maintain engagement.

More stories

Agents & Inference · TechCrunch

Google will pay SpaceX $920M per month for compute

Summary A

Google will pay SpaceX $920 million per month from October 2026 through June 2029 for access to roughly 110,000 NVIDIA GPUs and related compute, according to a regulatory filing. Google described the arrangement as short-term "bridge capacity" to meet surging demand for its Gemini Enterprise agent platform, with both parties able to terminate on 90 days' notice after December 31, 2026. The deal mirrors SpaceX's late-May agreement with Anthropic and comes just a week before SpaceX's expected Nasdaq debut, which aims to raise around $75 billion at a roughly $1.75 trillion valuation. Google, a longtime SpaceX investor, is expected to hold a stake worth more than $100 billion after the IPO.

Summary B

Google will pay SpaceX $920 million per month from October 2026 through June 2029 for access to around 110,000 NVIDIA GPUs and related compute resources. The deal, similar to SpaceX's recent agreement with Anthropic, allows either party to terminate with 90 days' notice after December 2026. Google cited unexpected demand for its AI products, while SpaceX prepares for its historic IPO, aiming for a $1.75 trillion valuation.

Agents & Inference · Simon Willison

datasette-agent-micropython 0.1a0

Summary A

Simon Willison has released an alpha version of Datasette Agent for MicroPython, aiming to safely generate and execute Python code. Initial tests show promising results, with GPT-5.5 unable to escape the sandbox. The project seeks sponsors to support ongoing development.

Summary B

An early alpha release of datasette-agent-micropython aims to let the Datasette Agent safely generate and execute Python code within a sandbox. Initial testing has been encouraging, with GPT-5.5 reportedly unable to break out of the sandbox so far.

Agents & Inference · Hugging Face

Designing the hf CLI as an agent-optimized way to work with the Hub

Summary A

Hugging Face has redesigned its official hf command-line interface to serve both human users and AI coding agents like Claude Code, Codex, and Cursor, which increasingly drive Hub traffic. The CLI now auto-detects when an agent is running it and adjusts its output accordingly—stripping color and truncation in favor of compact, structured data. Benchmarks found the agent-optimized CLI uses up to six times fewer tokens than agents hand-rolling curl or the Python SDK on complex, multi-step tasks.

Summary B

The Hugging Face CLI (hf) has been redesigned to optimize interactions for both human users and AI coding agents like Claude Code and Codex. The updated CLI offers tailored outputs—rich formatting for humans and compact, structured responses for agents—while tracking agent usage, which has grown significantly since April 2026. Benchmarking shows agents using the hf CLI require up to 6 times fewer tokens for complex tasks compared to alternatives like curl or the Python SDK.

Agents & Inference · TechCrunch

The token bill comes due: Inside the industry scramble to manage AI’s runaway costs

Summary A

Companies across the tech industry are reeling from soaring AI costs as autonomous agents and aggressive adoption drive token consumption far beyond budgets, with Uber exhausting its 2026 AI coding allowance by April and one company reportedly facing a $500 million bill. In response, a new market of tools and vendors is emerging to help track spending, and the Linux Foundation has unveiled the Tokenomics Foundation, a standards body aiming to bring cloud-style cost discipline to AI usage. Studies suggest heavy AI use boosts productivity but also increases bugs and rewrites, leaving executives unsure whether to rein in spending or encourage more of it.

Summary B

Companies are struggling with skyrocketing AI costs as token consumption surges despite falling per-token prices, forcing budget overruns and urgent cost-control measures. Industry leaders like Uber and Microsoft have already exceeded AI budgets, prompting new initiatives like the Tokenomics Foundation to manage spending. Executives report shifting focus from AI capabilities to cost containment, with some comparing unchecked token usage to addiction.

Agents & Inference · Simon Willison

micropython-wasm 0.1a2

Summary A

Simon Willison has added a CLI to micropython-wasm, inspired by a blog draft, to enhance the "Try it yourself" section. The update was shared on June 6, 2026.

Summary B

A command-line interface has been added to micropython-wasm in version 0.1a2, addressing issue #7. The feature was developed to better demonstrate the project's "Try it yourself" capabilities.
