Ollama Expands Blackwell GPU Cluster
Ollama has announced a major expansion of its NVIDIA Blackwell GPU deployment, focused on optimizing cloud inference performance for the GLM-5.1 model, with additional GPU capacity for other models coming online daily.
Users can invoke the model directly with the following commands:

    Claude Code:   ollama launch claude --model glm-5.1:cloud
    Codex App:     ollama launch codex-app
    Hermes Agent:  ollama launch hermes --model glm-5.1:cloud
    Direct run:    ollama run glm-5.1:cloud
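For programmatic use, the same cloud model can be reached through Ollama's standard REST API, which listens on http://localhost:11434 by default. Below is a minimal sketch, not an official example: it assumes a local Ollama instance is signed in to the cloud service and that the glm-5.1:cloud tag from the announcement is available.

    # Minimal sketch: one chat turn against glm-5.1:cloud via Ollama's
    # standard /api/chat endpoint. Assumes the local instance proxies
    # cloud-hosted tags; the model name comes from the announcement.
    import json
    import urllib.request

    payload = {
        "model": "glm-5.1:cloud",  # cloud tag named in the announcement
        "messages": [{"role": "user", "content": "Hello from the cloud cluster."}],
        "stream": False,           # request a single JSON response, not a stream
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",  # Ollama's default local endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["message"]["content"])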
On the market side, developers and enterprises are accelerating the shift from local deployments to Ollama's cloud inference service, redirecting spending from self-built GPU clusters to hosted platforms for cutting-edge open-source models. The expansion concentrates capital in infrastructure supporting high-performance models like GLM-5.1 and puts pressure on traditional local inference solutions.
Source: Public Information
ABAB AI Insight
Ollama, an open-source platform for running large models locally, has evolved over the past year from a purely local tool into a hybrid local-cloud service. This large-scale Blackwell expansion continues its pattern of rapid follow-up on top open-source models (including Llama, Qwen, and the GLM series), completing cloud optimization and command-line integration within days of a model's release.
On the capital side, Ollama is procuring Blackwell GPUs continuously and expanding daily, shifting compute from users' local hardware to centralized cloud clusters. The motivation is twofold: lowering the barrier for developers to access cutting-edge models like GLM-5.1, and building a long-term loop of subscriptions and usage data through the cloud service.
Similar cases include Groq's rapid chip adaptation after the release of Llama and Together AI's cloud-scale deployment of open-source models. The open-source model inference industry is shifting from purely local operation to hybrid local-and-cloud control, with Ollama consolidating the developer entry point through one-command launches, as sketched below.
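That consolidation is visible at the interface level: the same client call targets either a local model or the hosted cluster, with only the model tag changing. A minimal sketch, assuming the official ollama Python client (pip install ollama) and the glm-5.1:cloud tag from the announcement; the local tag mentioned in the comments is only an illustration.

    # Hybrid local/cloud sketch: one helper, two execution venues.
    # Only the model tag decides whether inference runs on local
    # hardware or on Ollama's hosted Blackwell cluster.
    from ollama import chat

    def ask(model: str, prompt: str) -> str:
        """Send one chat turn and return the model's reply text."""
        response = chat(model=model, messages=[{"role": "user", "content": prompt}])
        return response["message"]["content"]

    # Cloud inference on the hosted cluster (tag from the announcement):
    print(ask("glm-5.1:cloud", "What does Blackwell change for inference?"))

    # Local inference is the same call with a locally pulled tag,
    # e.g. ask("qwen3:8b", ...) -- no other code changes.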
At its core, this is a technological substitution: developers' self-built or local GPU inference is being displaced by efficient cloud-hosted clusters. The underlying mechanism is the Blackwell architecture's substantial gains in energy efficiency and performance, combined with the scale of the GLM-5.1 model: only centralized procurement and optimization can deliver low-latency, high-concurrency service. This lets small and medium-sized teams use top open-source models without friction, producing a structural shift from dispersed hardware to concentrated inference infrastructure.
ABAB News · Cognitive Law
It's not about who buys the most GPUs, but who can let developers launch top models in a second.
The faster the cloud expansion, the easier it is for open-source models to render local hardware obsolete.
When command lines replace server setups, the pricing power of AI infrastructure shifts to the most convenient party.