DeepSeek releases R1, an open-weights reasoning model matching OpenAI o1 on math and code at a fraction of the cost

The Chinese AI lab's MIT-licensed model, trained for an estimated $5.6 million, uses pure reinforcement learning post-training and gives the open-source community a reasoning rival to proprietary leaders.

DeepSeek today released R1, a reasoning model that matches OpenAI’s o1 on math and code benchmarks while costing a fraction to train and run. The model weights and a technical report are published under the MIT License, permitting free commercial use and distillation.

R1 is built using large-scale reinforcement learning in post-training, a recipe that minimizes reliance on labeled data while delivering strong reasoning performance. The lab reports training costs of roughly $5.6 million, compared with the hundreds of millions of dollars believed to have been spent on comparable proprietary models. API pricing is set at $0.55 per million input tokens (cache miss) and $2.19 per million output tokens, again undercutting OpenAI by a wide margin.
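To make the per-token prices quoted above concrete, here is a minimal sketch of the arithmetic for a single workload. The function name `estimate_cost_usd` and the example token counts are illustrative assumptions, not part of DeepSeek's API. The sketch uses only the two prices stated in this article and treats every input token as a cache miss.

```python
from decimal import Decimal

# Prices as quoted in the article, in USD per one million tokens.
INPUT_PRICE_PER_MILLION = Decimal("0.55")   # input tokens, cache miss
OUTPUT_PRICE_PER_MILLION = Decimal("2.19")  # output tokens
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the bill for one workload (hypothetical helper).

    Assumes every input token is billed at the cache-miss rate.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_MILLION
    )
    return total / TOKENS_PER_MILLION


# Illustrative workload: 2M input tokens and 500K output tokens.
# 2.0 * 0.55 + 0.5 * 2.19 = 1.10 + 1.095 = 2.195 USD
print(estimate_cost_usd(2_000_000, 500_000))
```

Because reasoning models tend to emit long chains of thought, the output-token term will often dominate a real bill, so actual costs depend heavily on how many tokens the model generates per answer.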

Alongside the full R1, DeepSeek open-sourced six smaller distilled models; the lab says the 32B and 70B versions are on par with OpenAI o1-mini. The release is already circulating in AI research channels, where the key debate this week is whether this level of performance on a relatively tiny budget is a fluke or a signal that the frontier of reasoning has cracked open for everyone.

The record

DeepSeek API Docs: DeepSeek-R1 Release

One year later — open only if you can handle spoilers

DeepSeek-R1 became a catalyst for the 'DeepSeek moment', spurring widespread copying of its pure RL recipe and accelerating the open-weights reasoning race. Within months, several labs had reproduced or surpassed R1's results using similar methods.
