How DeepSeek’s V3.2 changes everything about AI scaling | by Enrique Dans | Oct, 2025

IMAGE: Abstract digital artwork of a neural network: the left side shows blue nodes densely interconnected, while the right side features red nodes sparsely connected, with the words “DeepSeek Sparse Attention” integrated into the red side

AI is evolving in real time: in the United States, a growing range of models is being developed by companies locked in a race built on ever more powerful processors in vast data centers. Meanwhile, Chinese companies deprived of access to the latest chips are innovating, which allows them to punch above their weight and could change the game.

First came DeepSeek in January, more powerful than any US model at the time and trained at a much lower cost; now, barely nine months later, comes DeepSeek-V3.2-Exp, an experimental model that builds on its predecessor’s architecture.

What’s relevant about this announcement is not so much the version bump as the introduction of a fascinating experimental mechanism called DeepSeek Sparse Attention (DSA), aimed at drastically improving efficiency in both training and inference, especially in long-context scenarios. The company has also accompanied the launch with a steep price cut on its API (50% less), a strategy that, combined with the model’s open nature, poses a…
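For readers wondering what “sparse attention” buys in practice, here is a minimal, illustrative sketch in plain NumPy of generic top-k sparse attention. It is a toy under stated assumptions, not DeepSeek’s DSA implementation, and the function names and the top_k parameter are hypothetical: in full attention every query scores every key, while the sparse variant keeps only each query’s highest-scoring keys, so the softmax and the weighted sum over values scale with top_k rather than with the context length.

```python
import numpy as np

def dense_attention(q, k, v):
    # Full attention: every query scores every key -> O(n^2) in sequence length n.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def topk_sparse_attention(q, k, v, top_k=64):
    # Toy sparse variant (NOT DeepSeek's DSA): each query keeps only its top_k
    # highest-scoring keys, so the softmax and value aggregation touch top_k
    # tokens per query instead of the whole context.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]   # per-query key indices
    picked = np.take_along_axis(scores, idx, axis=-1)            # (n, top_k) scores
    weights = np.exp(picked - picked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("qk,qkd->qd", weights, v[idx])              # gather only selected values

# Toy usage: 4096 tokens, 64-dimensional heads.
n, d = 4096, 64
rng = np.random.default_rng(0)
q, k, v = rng.standard_normal((3, n, d))
print(dense_attention(q, k, v).shape, topk_sparse_attention(q, k, v).shape)  # (4096, 64) twice
```

Note that in this toy the score matrix is still computed densely; a real sparse-attention kernel also has to make the selection step itself cheap, which is where the long-context savings in training and inference would come from.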


