Which AI writes the better take? You decide — blind.

Two top models go head-to-head on today's AI news. Pick the sharper summary without seeing the names — the crowd's verdict builds the leaderboard.

This week · live leaderboard

5 blind votes

Llama 4 Maverick 60% · 40% Mistral Large

Open models vs closed frontier, judged by practitioners, not benchmarks.

Agents & Inference · Hacker News

The state of open source AI

Which summary reads better? Pick one; models are revealed after. Both summaries are AI-generated.

Summary A

Open-source AI now powers over 50% of production tokens, with the top five highest-volume models on OpenRouter being open. This shifts cost control and customization to you—no per-token meters, full data sovereignty, and the ability to run offline or on-prem without vendor lock-in. The trade-off is operational overhead: open models lag in production tooling and trust, so expect to build or integrate more infrastructure to match closed-model deployment rates.

Summary B

Open models now handle a majority of production tokens, with 79% of developers adding AI functionality using them, and the top 5 highest-volume models on OpenRouter being open; this shift enables enterprises to run AI models on their own hardware, reducing dependence on vendors and per-token costs, as seen with PwC fine-tuning an open model for the language of finance.

What you'll learn · Jul 18, 2026 · 6 stories

More stories

Agents & Inference · Hacker News

German AI consortium releases Soofi S, an open 30B model that tops benchmarks

Summary A

A 30B language model, Soofi S, activates only 3.2B parameters per token, achieving top benchmark scores while keeping compute costs comparable to a 3B model; this enables shipping large-capability models at lower inference costs, changing the cost-capability tradeoff for production LLM deployments.

Summary B

Soofi S delivers 30B-level accuracy with only 3.2B active parameters per token, slashing inference cost and GPU memory by ~10× while keeping latency flat even for long contexts. This lets you serve high-quality German/English models on 24 GB GPUs or cut cloud spend by an order of magnitude without sacrificing throughput or accuracy.
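
For readers who want to sanity-check the "~10×" figure, here is a minimal sketch of the arithmetic behind the active-parameter claim. Only the 30B total and 3.2B active numbers come from the summaries; the function and variable names are hypothetical, and the calculation assumes per-token compute scales roughly with active parameters, which neither summary states outright.

```python
# Back-of-the-envelope check of the active-parameter figures quoted above.
# Only the 30B and 3.2B numbers come from the summaries; every name here
# is hypothetical and chosen for illustration.


def active_ratio(total_params_b: float, active_params_b: float) -> float:
    """Return how many times larger the full model is than its per-token active set."""
    if active_params_b <= 0 or total_params_b < active_params_b:
        raise ValueError("expected 0 < active_params_b <= total_params_b")
    return total_params_b / active_params_b


soofi_total_b = 30.0   # total parameters, in billions (from the summaries)
soofi_active_b = 3.2   # parameters activated per token, in billions

ratio = active_ratio(soofi_total_b, soofi_active_b)
active_share = soofi_active_b / soofi_total_b

print(f"active share per token: {active_share:.1%}")
print(f"total-to-active ratio: {ratio:.2f}x")
```

The ratio comes out a little under 10, which fits "~10×" as a rough statement about per-token compute. Weight memory is a separate question: in sparse designs the full parameter set typically still has to be loaded, so memory tends to track the 30B total and not the 3.2B active count.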

Agents & Inference · Simon Willison

Inkling: Our open-weights model

Summary A

Thinking Machines Lab released Inkling, a 975B parameter multimodal model with 41B active parameters under an Apache-2.0 license, providing a strong base for fine-tuning with their Tinker platform; this enables production teams to efficiently customize a competitive open-weights model for specific tasks like image and text processing.

Summary B

975B-parameter open-weights multimodal model (41B active) under Apache-2.0 license, trained on 45T tokens of text, images, audio, and video. This gives production teams a legally safe, fine-tunable base for custom multimodal agents without vendor lock-in or usage restrictions, but expect higher inference costs and latency than smaller models; plan for GPU clusters or cloud endpoints that can handle 41B active parameters per request.

Agents & Inference · TechCrunch

Why the first GPU financiers are turning to inference chips in a $400 million deal

Summary A

$400M loan secured using inference-specific chips as collateral—16x faster, cheaper, and air-cooled vs GPUs. This slashes inference costs by 50–70% and lets you deploy open-source LLMs at scale without Nvidia lock-in, but supply is tight and financing is now the bottleneck.

Summary B

Inference-specific chips can now be used as collateral for large loans, with a recent $400 million deal marking a significant shift in AI infrastructure financing; this enables companies like General Compute to access substantial capital for deploying cheaper, more efficient inference chips, potentially reducing AI operational costs by leveraging alternatives to Nvidia GPUs.

Agents & Inference · OpenAI

OpenAI CFO introduces AI scorecard for ROI measurement

Summary A

OpenAI now publishes a four-metric scorecard—useful work, cost per successful task, dependability, and return on compute—that lets you compare models apples-to-apples. This means you can finally swap models in production without re-benchmarking every time, cutting evaluation cycles from weeks to hours and letting you ship cheaper or more reliable agents faster.

Summary B

OpenAI's CFO introduces a scorecard measuring AI ROI through metrics like cost per successful task and return on compute, enabling engineering teams to quantify the value of LLMs and agents in production, and make data-driven decisions on deployment and optimization. This new framework allows teams shipping AI models to directly assess and compare the cost-effectiveness of different AI configurations.

Agents & Inference · TechCrunch

How Apple’s big lawsuit could disrupt OpenAI’s IPO plans

Summary A

Apple’s lawsuit alleges 400+ former Apple employees now at OpenAI, claiming systematic trade secret theft. This could delay or derail OpenAI’s IPO, forcing costly legal battles and compliance overhauls that divert engineering resources from model deployment. If courts rule against OpenAI, expect stricter data provenance audits and higher operational costs for any team shipping LLMs trained on proprietary datasets.

Summary B

Over 400 former Apple employees now work at OpenAI, as alleged in Apple's trade secrets lawsuit, which could significantly disrupt OpenAI's plans for an IPO later this year and impact the company's hardware ambitions, potentially delaying or complicating the rollout of new AI-related hardware.

Takeaways written by DeepSeek V3 — not one of this week's two contestants.