Editions (10 days)

Agents & Inference, UTC dates, up to 6 stories/day.

What is an edition?

Each edition is one calendar day (UTC). Stories come from the allowlisted RSS feed. Use the menu to jump to any day; empty days show "empty" and may need a rebuild.

This day

6 stories


Agents & Inference

Agents & Inference · Hugging Face

Adding MCP Tools to Reachy Mini

Which summary reads better? Pick one; the models are revealed afterward. Both summaries are AI-generated.

Summary A

Reachy Mini's conversation app now supports adding tools hosted on Hugging Face Spaces via MCP, allowing users to expand the robot's capabilities without modifying the app directly. These remote tools, such as weather checks or web searches, run in the cloud rather than locally on the user's machine. Users can also publish their own tools for others to use through the platform.

Summary B

The Reachy Mini conversation app can now use external tools hosted in public Hugging Face Spaces and called via the Model Context Protocol, letting users add new robot abilities like weather checks or web search by linking a Space rather than editing the app's code. Because the tools run remotely in the Space itself, no code is downloaded locally, and developers can publish their own tools for others to use. Tools are controlled through profiles, where a tools.txt file determines which built-in, custom local, or remote capabilities the model is allowed to call.

Agents & Inference · Simon Willison

datasette-agent-micropython 0.1a0


Summary A

Simon Willison announced the alpha release of datasette-agent-micropython 0.1a0, which aims to let Datasette Agent safely generate and execute Python code. Early testing shows promise, with GPT-5.5 unable to break the sandbox security measures. The update was shared on June 2, 2026, alongside a sponsorship offer for exclusive LLM development insights.

Summary B

An early alpha release of datasette-agent-micropython aims to let the Datasette Agent safely generate and execute Python code within a sandboxed environment. Initial testing has been encouraging, with GPT-5.5 so far unable to break out of the sandbox.

Agents & Inference · TechCrunch

Meta’s AI agent for WhatsApp Business is now available globally


Summary A

Meta has rolled out its customer support AI bot, now called Meta Business Agent, globally within WhatsApp and Instagram DMs after roughly two years of testing in markets like India and Mexico. The agent can answer customer questions, recommend products, book appointments, qualify sales leads, and route queries to humans, with planned features including daily chat briefings and integrations with tools like Shopify and Zendesk. Meta intends to monetize the tool through WhatsApp Business Premium subscription tiers and token-based pricing for large businesses.

Summary B

Meta's AI agent for WhatsApp Business is now available globally, offering customer support features like answering questions, recommending products, and booking appointments. The AI tool, tested in markets like India and Mexico, will be monetized through WhatsApp Business Premium subscriptions, with pricing based on usage for larger enterprises. Meta is also developing additional capabilities, such as market research and calendar management, while exploring integration with platforms like Shopify and Zendesk.

Agents & Inference · Hugging Face

Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains


Summary A

JetBrains has released Mellum2, an open 12-billion-parameter Mixture-of-Experts model optimized for low-latency text and code tasks. Building on the original Mellum code-completion model, it activates only a subset of parameters per token to deliver more than twice the inference speed of similarly sized open models while remaining competitive on coding, reasoning, science, and math benchmarks. JetBrains positions it as a "focal" model for high-frequency operations such as routing, RAG pipelines, sub-agent tasks, and private self-hosted deployments.

Summary B

JetBrains has released Mellum2, a 12-billion-parameter Mixture-of-Experts model optimized for efficient text-and-code processing. The open model excels at low-latency tasks such as code generation, reasoning, and retrieval pipelines while offering faster inference than similarly sized models. Mellum2 is designed for specialized use in software engineering workflows, agent systems, and private deployments where speed and efficiency are critical.

Agents & Inference · Simon Willison

Uber Caps Usage of AI Tools Like Claude Code to Manage Costs


Summary A

Uber has imposed a $1,500 monthly spending limit per employee for AI coding tools like Claude Code to control costs after exceeding its AI budget. The cap applies separately to each tool, allowing engineers to spend up to $3,000 monthly if using two tools. This policy reflects Uber's effort to balance AI tool benefits with cost management as usage surged unexpectedly.

Summary B

Uber has imposed a $1,500 monthly cap per employee on each AI coding tool, such as Cursor and Anthropic's Claude Code, to rein in spending after exhausting much of its 2026 AI budget within four months. The per-tool limits, set in recent months, mean spending on one tool doesn't count against another's budget; if an engineer uses two tools, the combined caps amount to roughly 11% of a median Uber engineer's compensation.

Agents & Inference · TechCrunch

New Microsoft tool lets devs spin up AI behavior tests using text descriptions


Summary A

Microsoft has released ASSERT, an open source framework that lets developers test whether their AI systems behave as intended by turning plain-language descriptions of goals and policies into scored, structured tests. The tool generates problem scenarios, runs them against the target system, and records the AI's actions and tool calls so developers can pinpoint failures. It can be used during development, after deployment, or for continuous monitoring, addressing the need for application-specific evaluations that broader benchmarks miss.

Summary B

Microsoft has launched ASSERT, an open-source tool that enables developers to evaluate AI behavior using natural-language descriptions. ASSERT generates structured tests, problem scenarios, and scores based on specified goals, policies, and constraints, allowing for application-specific AI assessments. The framework supports continuous monitoring and aims to address gaps in broader AI evaluations.