Google Officially Open-Sources Diffusion-Architecture Model DiffusionGemma
Google has officially open-sourced DiffusionGemma, a diffusion-architecture language model. Unlike the token-by-token "typewriter" mode of autoregressive Transformer models, it generates large blocks or entire passages at once and then refines them iteratively.
The model achieves generation speeds of over 1,000 tokens/s on an H100 and over 700 tokens/s on an RTX 5090; the 26B-parameter version requires only 18 GB of VRAM and can generate 256 tokens in parallel.
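As a rough back-of-the-envelope illustration of what these figures imply, the sketch below converts throughput into wall-clock time and counts how many 256-token parallel blocks a response would need. The helper names are hypothetical, and because the quoted speeds are lower bounds ("over"), the computed times should be read as approximate upper bounds.

```python
# Back-of-the-envelope arithmetic using only the figures quoted above.
# Function and constant names are illustrative, not part of any real API.

H100_TOKENS_PER_SEC = 1000.0     # "over 1,000 tokens/s" (lower bound)
RTX5090_TOKENS_PER_SEC = 700.0   # "over 700 tokens/s" (lower bound)
PARALLEL_BLOCK_TOKENS = 256      # tokens generated in parallel per block


def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Approximate time to produce num_tokens at a steady throughput."""
    return num_tokens / tokens_per_second


def parallel_blocks(num_tokens: int, block_size: int = PARALLEL_BLOCK_TOKENS) -> int:
    """Number of parallel blocks needed to cover num_tokens (ceiling division)."""
    return -(-num_tokens // block_size)


if __name__ == "__main__":
    for name, tps in [("H100", H100_TOKENS_PER_SEC), ("RTX 5090", RTX5090_TOKENS_PER_SEC)]:
        t = seconds_to_generate(PARALLEL_BLOCK_TOKENS, tps)
        print(f"{name}: one 256-token block in at most about {t:.3f} s")
    print("Blocks for a 1,000-token answer:", parallel_blocks(1000))
```

At the quoted rates, a single 256-token block corresponds to roughly a quarter of a second on an H100 and somewhat over a third of a second on an RTX 5090.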
Its core advantage lies in multi-round iterative generation: the model writes a draft first, then proofreads it, correcting typos and sentences, which significantly improves output quality and coherence.
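The draft-then-proofread idea can be sketched with a toy mask-and-repredict loop. This is not DiffusionGemma's actual algorithm, which the source does not describe; it is a minimal illustration under stated assumptions. The "model" here is a hypothetical stand-in (`toy_propose`) that already knows the target text and deliberately makes low-confidence mistakes, and all names and parameters are invented for the example. Each round proposes tokens for every open position in parallel, keeps the confident ones, and re-masks the rest for the next round.

```python
import random

MASK = "<mask>"


def toy_propose(target, positions, error_rate, rng):
    """Hypothetical stand-in for a denoising model.

    Proposes a (token, confidence) pair for every requested position in one
    parallel pass. With probability error_rate it guesses a random vocabulary
    token with confidence below 0.5; otherwise it returns the correct token
    with confidence of at least 0.5.
    """
    vocab = sorted(set(target))
    proposals = {}
    for i in positions:
        if rng.random() < error_rate:
            proposals[i] = (rng.choice(vocab), 0.5 * rng.random())
        else:
            proposals[i] = (target[i], 0.5 + 0.5 * rng.random())
    return proposals


def refine(target, rounds=4, initial_error=0.6, seed=0):
    """Fill a fully masked draft over several rounds of parallel proposals."""
    if rounds < 2:
        raise ValueError("rounds must be at least 2")
    rng = random.Random(seed)
    draft = [MASK] * len(target)
    history = []
    for step in range(rounds):
        # The toy error rate decays linearly and reaches zero in the last round.
        error_rate = initial_error * (1 - step / (rounds - 1))
        masked = [i for i, tok in enumerate(draft) if tok == MASK]
        if not masked:
            break
        proposals = toy_propose(target, masked, error_rate, rng)
        for i, (tok, conf) in proposals.items():
            # Keep confident tokens; re-mask doubtful ones for the next round.
            draft[i] = tok if conf >= 0.5 else MASK
        history.append(list(draft))
    return draft, history


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    final, history = refine(sentence)
    for n, snapshot in enumerate(history, start=1):
        print(f"round {n}: {' '.join(snapshot)}")
```

The contrast with autoregressive decoding is that no position is committed on first sight: a doubtful token goes back into the pool and is proposed again, whereas a token-by-token decoder keeps each token once it has been emitted.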
Source: Public Information
ABAB AI Insight
Google has long invested in diffusion-model research, and open-sourcing DiffusionGemma continues its exploration beyond autoregressive dominance toward a new generation of generative paradigms. The Gemma series has iterated rapidly and been opened to the community to accelerate ecosystem development.
In terms of capital strategy, Google is committing research resources to open-sourcing DiffusionGemma. This lowers the barrier to entry for developers and speeds up model iteration through community feedback, while building technical reserves for closed-source products such as Gemini. The result is a cycle in which the open-source ecosystem and commercial products reinforce each other.
Much like the early evolution of Transformers from research papers to large-scale open source, DiffusionGemma is in an expansion phase, moving from research prototype to mainstream generative tool and challenging the dominance of traditional autoregressive architectures with its high speed and iterative self-correction.
In essence, this represents technological replacement and capital concentration. The diffusion architecture's ability to generate and then iteratively refine output is positioned to replace the autoregressive mode, in which each token is locked in once it is emitted, with gains in both speed and quality. That in turn accelerates the shift of AI investment from traditional large models to efficient diffusion platforms, reshaping the efficiency and development paradigm of content creation, code generation, and multimodal tasks.
ABAB News · Cognitive Laws
The more parallel the generation method, the greater the achievable speed and quality.
The more thorough the iterative corrections, the lower the risk of locking in early errors.
The more innovative the architectural paradigm, the higher the utilization of computational resources.