one year on
Meta releases Llama 4 Scout, Maverick, and Behemoth, claims 10-million-token context and top-tier benchmark scores
Meta drops three new MoE models on a Saturday, including a 10M-token-context Scout that fits on a single H100, amid heightened pressure from DeepSeek's open-weight success.
Meta released Llama 4 on a Saturday — a surprise drop that includes Scout (17B active parameters, 16 experts, 10M-token context), Maverick (17B active, 128 experts, 400B total), and a preview of Behemoth (288B active, 2T total), which is still training.
Meta claims Scout fits on a single Nvidia H100 with INT4 quantization, delivers the best multimodal results in its class, and beats Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1. Maverick, the company says, beats GPT-4o and Gemini 2.0 Flash, and matches DeepSeek V3 on reasoning and coding at less than half the active parameters. An experimental chat version of Maverick scored an Elo of 1417 on LMArena. Behemoth, which Meta used to distill the smaller models, is said to outperform GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on STEM benchmarks.
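The single-H100 claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: it assumes Scout's total parameter count is 109B (the figure from Meta's announcement, not stated in this article) and an 80 GB H100, and it counts weights alone. In a mixture-of-experts model every expert has to be resident in memory, so the total count matters here, not the 17B active figure.

```python
# Rough estimate of weight memory at different precisions.
# Assumptions (not from the article): 109B total parameters for Scout,
# an 80 GB H100, and 1 GB = 1e9 bytes. KV cache and activations are ignored.

SCOUT_TOTAL_PARAMS_B = 109   # assumed total parameters, in billions
H100_MEMORY_GB = 80          # assumed memory of one H100


def weight_memory_gb(total_params_billions: float, bits_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in GB."""
    total_bytes = total_params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


def fits_on_gpu(total_params_billions: float, bits_per_param: int,
                gpu_memory_gb: float = H100_MEMORY_GB) -> bool:
    """True if the weights alone are smaller than the GPU's memory."""
    return weight_memory_gb(total_params_billions, bits_per_param) < gpu_memory_gb


if __name__ == "__main__":
    for bits in (16, 8, 4):
        gb = weight_memory_gb(SCOUT_TOTAL_PARAMS_B, bits)
        verdict = "fits" if fits_on_gpu(SCOUT_TOTAL_PARAMS_B, bits) else "does not fit"
        print(f"{bits}-bit weights: {gb:.1f} GB, {verdict} in {H100_MEMORY_GB} GB")
```

Under these assumptions the weights come to roughly 218 GB at 16 bits and 54.5 GB at 4 bits, so only the INT4 case leaves room on one card. The estimate says nothing about the memory a long context would need on top of the weights.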
The release comes days after reports that Meta scrambled to respond to DeepSeek’s open-weight models. The Llama 4 license bars use by companies based in the EU, and requires companies with more than 700 million monthly active users to obtain a special license from Meta.
The Saturday timing, combined with the Maverick LMArena score coming from a non-standard variant, has already sparked debate in developer circles about whether the benchmarks reflect a shipping product or a research preview.
The record
Developers were surprised by the Saturday release and scrutinized Maverick's LMArena score, which came from an experimental chat variant rather than the base model, setting off a controversy over the benchmarks.
One year later
Within days, the Maverick LMArena score drew criticism after it emerged that it came from an unreleased chat-tuned variant. The Saturday release was seen as a rushed answer to DeepSeek, and doubts about benchmark transparency contributed to a perception that Llama's momentum had slipped.