Which AI writes the better take? You decide — blind.

Two top models go head-to-head on today's AI news. Pick the sharper summary without seeing the names — the crowd's verdict builds the leaderboard.

Agents & Inference · arXiv

Beyond Parallel Sampling: Diverse Query Initialization for Agentic Search

Which summary reads better? Pick one; the models are revealed after you vote. Both summaries are AI-generated.

Summary A

Researchers propose DivInit, a training-free method to improve agentic search by selecting diverse initial queries before running parallel search trajectories. The paper reports that standard parallel sampling suffers from redundant first queries and overlapping retrieved evidence, while DivInit improves performance across five open-weight models and eight benchmarks, including average gains of five to seven points on multi-hop question answering at matched compute.

Summary B

Researchers propose DivInit, a method to improve agentic search by diversifying initial queries in parallel sampling, avoiding redundant evidence retrieval. The approach boosts performance by five to seven points on multi-hop question-answering benchmarks without additional training. The technique is tested across five open-weight models and eight datasets, offering a compute-efficient alternative to standard parallel sampling.

What you'll learn · Jun 17, 2026 · 6 stories

More stories

Agents & Inference · OpenAI

Predicting model behavior before release by simulating deployment

Summary A

OpenAI introduced Deployment Simulation, a method for predicting how AI models may behave before they are released. The approach uses real conversation data to improve safety assessments and make evaluations more accurate.

Summary B

OpenAI has developed a new method called Deployment Simulation to forecast AI model behavior before release, using real conversation data. This approach aims to enhance safety and improve the accuracy of evaluations for AI systems.

Agents & Inference · Simon Willison

<click-to-play> — a still that plays

Summary A

Simon Willison introduced a progressive-enhancement Web Component called <click-to-play> that displays a still image with a play button and loads the linked GIF only when clicked. The tool is intended to avoid loading large GIFs unnecessarily and was built for a post demonstrating new row editing tools in Datasette.

Summary B

A new Web Component called "click-to-play" allows users to display a static image that only loads and plays a GIF when clicked, reducing unnecessary data usage. Developed by Simon Willison, the tool is designed to improve performance by preventing large GIFs from loading automatically. It was created to enhance a demonstration of Datasette’s row editing features.

Agents & Inference · Google DeepMind

Unlocking UK house-building with AI-accelerated planning

Summary A

Google DeepMind is partnering to use AI to speed up the UK’s housing planning process, aiming to reduce delays and boost construction. The technology could analyze complex planning applications and regulations more efficiently than traditional methods. This initiative seeks to address housing shortages by accelerating approvals while maintaining regulatory standards.

Summary B

Google DeepMind is highlighting the use of AI to speed up the UK planning process for housing. The effort is framed as a way to help unlock house-building by making planning systems faster and more efficient.

Agents & Inference · Ollama

Ollama's highest performance on Apple Silicon yet with MLX

Summary A

Ollama updated its MLX engine for Apple Silicon, promising faster responses, lower memory use and higher-quality model outputs by using Apple’s unified memory and Metal-backed MLX framework more extensively. The update adds support for NVIDIA’s NVFP4 model-optimized format, introduces performance optimizations of up to 20%, and adds snapshot-based state caching to improve agent, reasoning and branching workflows.

Summary B

Ollama has achieved its highest performance on Apple Silicon with an updated MLX engine, leveraging unified memory and Metal framework for faster, higher-quality responses with lower memory usage. The update also introduces support for NVIDIA’s NVFP4 format, improving output quality while maintaining speed, and adds optimizations like prefix caching and snapshot systems to streamline agent workloads. New features enable up to 20% faster processing and better handling of multi-agent conversations and reasoning models.

Agents & Inference · TechCrunch

Anthropic’s latest feud with the Trump admin may actually help it, sales data suggests

Summary A

Anthropic is facing renewed pressure from the Trump administration, which demanded restrictions on access to its latest advanced AI models and effectively pushed the company to pull them from the market. Business spending data from Ramp suggests the dispute may boost Anthropic’s appeal, with the company recently surpassing OpenAI in business AI subscription share and seeing strong adoption of its Claude models.

Summary B

Anthropic has surpassed OpenAI in business spending market share for the first time, despite an ongoing feud with the Trump administration that led to the removal of its latest AI models from the market. The company’s defiance of government demands—including a ban on non-American access to its advanced models—appears to have bolstered its reputation, with sales data suggesting the controversy may actually boost its adoption. While the financial impact of pulling its newest models remains unclear, business spending on Anthropic’s existing models, particularly Claude Opus, continues to grow.

Takeaways written by DeepSeek V3 — not one of this week's two contestants.