Ollama's highest performance on Apple Silicon yet with MLX
Both summaries are AI-generated.
Ollama has achieved its highest performance on Apple Silicon yet with an updated MLX engine that leverages unified memory and the Metal framework to deliver faster, higher-quality responses with lower memory usage. The update also introduces support for NVIDIA’s NVFP4 format, improving output quality while maintaining speed, and adds optimizations such as prefix caching and a snapshot system to streamline agent workloads. The new features enable up to 20% faster processing and better handling of multi-agent conversations and reasoning models.
Ollama updated its MLX engine for Apple Silicon, promising faster responses, lower memory use and higher-quality model outputs by making more extensive use of Apple’s unified memory and the Metal-backed MLX framework. The update adds support for NVIDIA’s NVFP4 4-bit floating-point format, introduces optimizations that improve performance by up to 20%, and adds snapshot-based state caching to improve agent, reasoning and branching workflows.