Flash News

NVIDIA Releases Full Weights for Cosmos-Reason2-32B Physical AI Reasoning Vision-Language Model

NVIDIA has released the full weights for Cosmos-Reason2-32B, a physical AI reasoning vision-language model (VLM) previously available only in smaller 2B and 8B versions.

The model is built on the Qwen3-VL-32B-Instruct foundation, supports multimodal inputs (images, video, and text), and adds object detection and precise timestamp localization. The context window has been expanded to 256K tokens, and the model is available for commercial use under the NVIDIA Open Model License.
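To make the capabilities above concrete, here is a minimal sketch of how a client might pose a timestamp-localization question about a video clip to such a model served behind an OpenAI-compatible endpoint (the pattern NVIDIA NIM containers commonly expose). The model id, video URL, and request shape are illustrative assumptions, not confirmed names from the release.

```python
# Hedged sketch: assembling a multimodal chat request for a Cosmos-Reason2-style
# VLM behind an OpenAI-compatible API. The model id below is hypothetical.

def build_video_reasoning_request(video_url: str, question: str,
                                  model: str = "nvidia/cosmos-reason2-32b") -> dict:
    """Pair a video with a physical-reasoning question using the common
    multimodal chat-completions message format."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Video frames and the text question travel in one user turn.
                    {"type": "video_url", "video_url": {"url": video_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
        # Long clips benefit from the expanded 256K-token context window.
        "max_tokens": 512,
    }

req = build_video_reasoning_request(
    "https://example.com/warehouse_cam.mp4",
    "At what timestamp does the forklift enter the loading bay?",
)
print(req["messages"][0]["content"][1]["text"])
```

The payload would then be POSTed to the server's `/v1/chat/completions` route; only the message-construction step is shown here, since endpoint details vary by deployment.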

It is aimed primarily at analyzing urban and industrial video streams, batch-labeling sensor data, and providing planning and reasoning capabilities for humanoid robots and autonomous vehicles.

Source: Public Information

ABAB AI Insight

NVIDIA's release of the full 32B flagship weights follows the launch of the smaller Cosmos-Reason2 variants at the end of last year, continuing its physical AI strategy spanning Isaac Lab and the GR00T robot platform. Earlier, the Cosmos Predict series laid a world-model foundation, accelerating community iteration on physical intelligence.

Strategically, NVIDIA post-trains Qwen3-VL-32B-Instruct for physical common sense and embodied reasoning, distributing the open weights via Hugging Face and NVIDIA NIM inference containers. The motivation is to capture the "AI brain" market for robotics and autonomous driving: open licensing attracts developer ecosystems and enterprise deployments, which in turn secure hardware sales and enterprise service revenue.

The play resembles Meta's Llama series or OpenAI's early open-source strategies extended into robotics, or Tesla Optimus's internal reliance on embodied models. Cosmos-Reason2 is currently moving from laboratory prototypes toward large-scale commercial deployment, building competitive moats in humanoid robots and autonomous-driving planning modules.

Essentially, this is a technology substitution: high-parameter embodied-reasoning VLMs offered as open alternatives to traditional rule-based systems or small specialized models. The mechanism pairs large-scale fine-tuning on video and physical data with long-context capability, letting robotic systems jump directly from perception to causal reasoning and accelerating the concentration of real-world AI decision-making within the NVIDIA ecosystem.

Source: ABAB News · 13d ago