DiffusionGemma: 4x faster text generation
Google DeepMind introduced DiffusionGemma, an experimental open text-generation model that uses diffusion to generate blocks of text in parallel rather than token by token. Released under an Apache 2.0 license, the 26B Mixture of Experts model is designed for speed-critical local workflows and can deliver up to 4x faster inference on dedicated GPUs, though traditional autoregressive Gemma models remain preferred for high-quality production use.
Google DeepMind unveiled DiffusionGemma, an open experimental model that generates entire blocks of text simultaneously instead of word by word, making text generation up to four times faster on dedicated GPUs. Designed for speed-critical local workflows such as real-time editing and interactive applications, it trades some quality for speed but enables new use cases such as non-linear text generation. The model is released under an Apache 2.0 license and targets researchers and developers optimizing for low-latency local inference.