How Alibaba builds its most efficient AI model to date
A technical innovation has allowed Alibaba Group Holding, one of the leading players in China’s artificial intelligence boom, to develop a new generation of foundation models that match the strong performance of larger predecessors while being significantly smaller and more cost-efficient.
Alibaba Cloud, the AI and cloud computing division of Alibaba, unveiled on Friday a new generation of large language models that it said heralded “the future of efficient LLMs”. The new models are nearly 13 times smaller than the company’s largest AI model, released just a week earlier.
Despite its compact size, Qwen3-Next-80B-A3B is among Alibaba’s best models to date, according to developers. The key lies in its efficiency: the model is said to perform 10 times faster in some tasks than the preceding Qwen3-32B released in April, while achieving a 90 per cent reduction in training costs.
Emad Mostaque, co-founder of the UK-based start-up Stability AI, said on X that Alibaba’s new model outperformed “pretty much any model from last year” despite an estimated training cost of less than US$500,000.
For comparison, training Google’s Gemini Ultra, released in February 2024, cost an estimated US$191 million, according to Stanford University’s AI Index.
Alibaba says its new generation of AI foundation models heralds “the future of efficient LLMs”. Photo: Handout
Artificial Analysis, a leading AI benchmarking firm, said Qwen3-Next-80B-A3B surpassed the latest versions of both DeepSeek R1 and Alibaba-backed start-up Moonshot AI’s Kimi-K2. Alibaba owns the South China Morning Post.
Several AI researchers attributed the success of Alibaba’s new model to a relatively new technique called “hybrid attention”.
Existing models face diminishing returns on efficiency as input lengths increase because of the way AI models determine which inputs are the most relevant. This “attention” mechanism involves trade-offs: better attention accuracy leads to higher computational expenses.
Those costs compound when models handle long context inputs, making it expensive to train sophisticated AI agents that autonomously execute tasks for users.
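To make the scaling problem concrete, here is a rough sketch in Python. It assumes standard full attention, where every token is compared with every other token, and contrasts it with a recurrent-style layer that does one state update per token. The function names are illustrative and the counts ignore constants such as the number of heads and layers.

```python
def full_attention_scores(seq_len: int) -> int:
    """Query-key comparisons in standard attention: one per token pair."""
    return seq_len * seq_len


def recurrent_state_updates(seq_len: int) -> int:
    """State updates in a recurrent-style layer: one per token."""
    return seq_len


# Compare how the two counts grow as the context gets longer.
for n in (1_000, 10_000, 100_000):
    ratio = full_attention_scores(n) // recurrent_state_updates(n)
    print(f"{n:>7} tokens: {full_attention_scores(n):>14} scores, {ratio}x the linear count")
```

Doubling the input length quadruples the first count but only doubles the second, which is why long contexts are where the savings show up.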
Qwen3-Next-80B-A3B addresses this challenge by incorporating a technique known as “Gated DeltaNet”, first proposed by researchers at the Massachusetts Institute of Technology and Nvidia in March.
Gated DeltaNet enhanced the model’s attention by making targeted adjustments to the input data and determining what information to retain and what to discard, said Zhou Peilin, an AI researcher at the Hong Kong University of Science and Technology.
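The retain-or-discard idea can be sketched in a few lines of Python. This is a toy illustration, not Alibaba's code: it assumes the gated delta rule takes the form S_t = alpha * S_(t-1) * (I - beta * k kᵀ) + beta * v kᵀ, where the gate `alpha` decays old memory and the write strength `beta` controls how strongly the value stored for key `k` is replaced. The function names are hypothetical.

```python
def gated_delta_step(S, k, v, alpha, beta):
    """One memory update; S is a d_v x d_k matrix stored as a list of rows.

    Assumed form: S_new = alpha * (S - beta * (S k) k^T) + beta * v k^T
    """
    # What the memory currently returns for key k.
    Sk = [sum(s * kj for s, kj in zip(row, k)) for row in S]
    return [
        [alpha * (row[j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j]
         for j in range(len(k))]
        for i, row in enumerate(S)
    ]


def read_state(S, q):
    """Query the memory: returns S q."""
    return [sum(s * qj for s, qj in zip(row, q)) for row in S]


S = [[0.0, 0.0], [0.0, 0.0]]
# Write a value under a key, then overwrite it under the same key.
S = gated_delta_step(S, k=[1.0, 0.0], v=[3.0, 4.0], alpha=1.0, beta=1.0)
S = gated_delta_step(S, k=[1.0, 0.0], v=[5.0, 6.0], alpha=1.0, beta=1.0)
print(read_state(S, [1.0, 0.0]))  # [5.0, 6.0]: the old value was replaced
```

With `alpha` below 1 the whole memory fades at each step, which is the "discard" half of the mechanism; with `beta` at 0 nothing new is written.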
This results in an accurate yet cost-effective attention mechanism. Citing scores from the Ruler benchmark, which evaluates AI models based on their ability to manage varying input lengths, Alibaba said Qwen3-Next-80B-A3B was comparable to its most powerful model, the Qwen3-235B-A22B-Thinking-2507, despite being smaller and more affordable.
Alibaba uses a technique known as “Gated DeltaNet” to develop its latest AI models. Photo: Handout
“It’s great to see that our DeltaNets … have been greatly scaled up by Alibaba to build excellent AI models,” said Juergen Schmidhuber, computer science professor at the King Abdullah University of Science and Technology, who contributed to the development of DeltaNets in the 1990s.
Qwen3-Next-80B-A3B also uses the “mixture-of-experts” (MoE) architecture, which has driven many efficiency gains in Chinese AI models over the past year, including DeepSeek-V3 and Moonshot’s Kimi-K2.
The MoE architecture divides a model into separate sub-networks or “experts” that specialise in subsets of input data to collaboratively perform tasks.
Alibaba enhanced the “sparsity” of its latest MoE architecture to improve efficiency. While DeepSeek-V3 and Kimi-K2 employ 256 and 384 experts, respectively, Qwen3-Next-80B-A3B features 512 experts but activates only 10 at a time.
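As a rough illustration of sparse activation, the Python sketch below routes a single token to the 10 highest-scoring of 512 experts. It assumes a common top-k design in which a router scores each expert and the selected scores are normalised with a softmax; the article does not describe Alibaba's actual router, and the random scores are stand-ins for real router outputs.

```python
import math
import random


def route_top_k(scores, k):
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}


NUM_EXPERTS = 512  # figure from the article
ACTIVE = 10        # figure from the article

rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # hypothetical router scores
routing = route_top_k(scores, ACTIVE)

print(sorted(routing))  # indices of the experts that run for this token
print(f"share of experts active: {ACTIVE / NUM_EXPERTS:.2%}")
```

Only the selected experts do any work for that token, so compute per token tracks the 10 active experts rather than all 512.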
According to Artificial Analysis, those innovations helped the model achieve parity with DeepSeek-V3.1, despite having just 3 billion active parameters compared with the latter’s 37 billion. Generally, a higher number of parameters indicates a more powerful model, but it also increases training and operational costs.
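The gap is easy to check with back-of-the-envelope arithmetic in Python. The 80 billion total is an assumption read from the “80B” in the model's name; the two active-parameter figures come from the comparison above.

```python
total_params_b = 80     # assumption: taken from "80B" in the model name, in billions
active_params_b = 3     # active parameters, Qwen3-Next-80B-A3B
deepseek_active_b = 37  # active parameters, DeepSeek-V3.1

# Share of the model's weights used for any one token.
active_fraction = active_params_b / total_params_b
# How many times more active parameters DeepSeek-V3.1 uses.
active_ratio = deepseek_active_b / active_params_b

print(f"active share of Qwen3-Next: {active_fraction:.2%}")  # 3.75%
print(f"DeepSeek-V3.1 vs Qwen3-Next: {active_ratio:.1f}x")   # 12.3x
```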
The efficiency gains are evident on Alibaba’s cloud platform, where the new model costs less to run than the Qwen3-235B-2507, which contains 235 billion parameters.
The new model architecture reflects a growing interest in smaller but more efficient AI models, amid rising concerns about the costs associated with scaling up the industry’s largest models.
According to AI research firm Epoch AI, the most expensive training run to date was xAI’s Grok 4, which cost US$490 million, with future training runs expected to exceed US$1 billion by 2027.
In August, researchers at Nvidia published a paper advocating for small language models as the future of agentic AI because of their flexibility and efficiency. The company is also experimenting with the Gated DeltaNet technique on its Nemotron models.
Meanwhile, Chinese AI giants are pushing for broader adoption of their models by ensuring they are small enough to run on laptops and smartphones.
Last month, Tencent Holdings launched four open-source AI models, each under 7 billion parameters, while Beijing-based start-up Z.ai released the GLM 4.5 Air model with just 12 billion active parameters.
Alibaba’s Qwen3-Next-80B-A3B is compact enough to run on a single Nvidia H200 graphics processing unit, according to Artificial Analysis. On the open-source developer platform Hugging Face, the model quickly broke into the trending leaderboard, amassing almost 20,000 downloads within 24 hours of launch.
Alibaba said its new architecture served as a preview of its next generation of AI models. The future of large language models would likely revolve around refining Alibaba’s approach to address training costs and efficiency, even if entirely different architectures emerge, said Tobias Schroder, an AI researcher at Imperial College London.