Flash News

Sapient Intelligence Releases Open-Source 1B-Parameter HRM-Text Model, Training Cost Approximately $1472

Sapient Intelligence has released HRM-Text, an open-source 1-billion-parameter text-generation base model built on the Hierarchical Reasoning Model (HRM) architecture. By introducing latent-space reasoning at the foundational level of the architecture, it cuts pre-training compute costs by a factor of 130 to 600.

Pre-training used only 4 billion structured tokens, roughly one-thousandth of the data volume used by conventional models of comparable scale. The 1B version was trained in about 46 hours on two 8-GPU H100 servers at a cost of approximately $1472; the 0.6B version took 50 hours on a single node at a cost of about $800.
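
The reported figures can be sanity-checked with simple arithmetic. The sketch below converts each run into GPU-hours and derives the hourly rate the quoted costs imply. It assumes the "single node" for the 0.6B run also holds 8 H100 GPUs, which the report implies but does not state; the function names are illustrative, and the resulting rate is a derived estimate rather than a figure from the source.

```python
def gpu_hours(nodes, gpus_per_node, wall_hours):
    """Total GPU-hours consumed by a training run."""
    return nodes * gpus_per_node * wall_hours


def implied_rate(total_cost_usd, nodes, gpus_per_node, wall_hours):
    """Cost per GPU-hour implied by a quoted total training cost."""
    return total_cost_usd / gpu_hours(nodes, gpus_per_node, wall_hours)


# Figures as reported; both durations and costs are approximate in the source.
# ASSUMPTION: the single node used for the 0.6B run has 8 GPUs.
runs = {
    "HRM-Text 1B": dict(total_cost_usd=1472, nodes=2, gpus_per_node=8, wall_hours=46),
    "HRM-Text 0.6B": dict(total_cost_usd=800, nodes=1, gpus_per_node=8, wall_hours=50),
}

for name, run in runs.items():
    hours = gpu_hours(run["nodes"], run["gpus_per_node"], run["wall_hours"])
    rate = implied_rate(**run)
    print(f"{name}: {hours} GPU-hours, about ${rate:.2f} per GPU-hour")
```

Under that assumption, the two runs come to 736 and 400 GPU-hours, and both quoted costs work out to the same rate of about $2 per H100 GPU-hour, so the two price figures are at least internally consistent.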

In the market, open-source model developers and small teams are accelerating their adoption of low-cost pre-training solutions. With the HRM architecture, Sapient Intelligence lowers the compute barrier, which benefits independent researchers and startup teams while putting pressure on large-model labs that rely on massive compute. Funding is rapidly concentrating on efficient new architectures and low-barrier training frameworks.

Source: Public Information

ABAB AI Insight

Sapient Intelligence has previously focused on hierarchical reasoning architectures. HRM-Text is its first open-source base model, and it continues the approach of using a dual-timescale recurrent design to break the single-scale limitation of the traditional Transformer. Similar hierarchical or recurrent architectures have mostly remained at the theoretical stage.

On the capital path, the team concentrates its core innovation at the architectural level rather than simply stacking compute and data. By open-sourcing a complete engineering framework (data extraction, sequence packing, PyTorch distributed training), it aims to attract community contributions. The motivation is to validate previously shelved model theories at extremely low cost while quickly accumulating real user feedback and an ecosystem, forming a lightweight R&D cycle that runs from architectural innovation to community co-development.

Much as Mistral first entered the open-source market with an efficient architecture, and DeepSeek cut costs through architectural optimization, Sapient Intelligence positions HRM at the forefront of the shift in foundation models from "scaling up" to "architectural leap", pushing the industry from an arms race over data and compute toward a stage of architectural innovation.

Structural judgment: this is essentially a technological substitution. HRM alternates iterations between fast and slow Transformer modules and adds their states together, dynamically expanding computational depth with a fixed parameter count, and so replaces pre-training driven by massive data and compute with efficient structured reasoning. Through latent-space reasoning and the dual-timescale design, the mechanism significantly improves how efficiently information is used, shifting the value of model development from capital-intensive work toward architectural innovation.
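
To make the idea of "more depth from the same parameters" concrete, here is a toy sketch of a dual-timescale loop. It is not Sapient Intelligence's implementation: the update rule, the function names (`make_block`, `hrm_forward`), and the use of a simple `tanh` map in place of a real Transformer block are all assumptions made for illustration. It only shows how two reused modules, one stepped quickly and one stepped slowly, with states combined by addition, can yield an effective depth that grows with the number of iterations rather than with the number of weights.

```python
import math


def make_block(weight):
    """Toy stand-in for a Transformer block (ASSUMPTION): tanh(weight * v)."""
    def block(vec):
        return [math.tanh(weight * v) for v in vec]
    return block


def add(*vecs):
    """Elementwise sum of equally sized vectors (the 'state addition')."""
    return [sum(vals) for vals in zip(*vecs)]


def hrm_forward(x, fast_block, slow_block, slow_cycles, fast_steps):
    """Alternate a fast module and a slow module over shared latent states.

    Hypothetical update rule: the fast state is refreshed `fast_steps` times
    per cycle from (fast state + slow state + input); the slow state is then
    refreshed once from (slow state + fast state).
    """
    z_fast = [0.0] * len(x)
    z_slow = [0.0] * len(x)
    applications = 0  # counts block applications, i.e. effective depth
    for _ in range(slow_cycles):
        for _ in range(fast_steps):
            z_fast = fast_block(add(z_fast, z_slow, x))
            applications += 1
        z_slow = slow_block(add(z_slow, z_fast))
        applications += 1
    return z_slow, applications


fast = make_block(0.9)  # the same two blocks are reused on every iteration
slow = make_block(0.7)
_, shallow = hrm_forward([0.5, -0.2], fast, slow, slow_cycles=2, fast_steps=3)
_, deep = hrm_forward([0.5, -0.2], fast, slow, slow_cycles=4, fast_steps=3)
print(shallow, deep)  # effective depth: 8 vs 16, with identical parameters
```

Doubling `slow_cycles` doubles the number of block applications while the two blocks, and therefore the parameter count, stay unchanged. That is the sense in which computational depth can expand under a fixed parameter budget.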

ABAB News · Cognitive Law

The smarter the architecture, the cheaper the computational power.
One-thousandth of the data, a hundredfold gain validated.
The theories once buried by computational power will eventually be revived by architecture.

Source: ABAB News