Chinese AI startup DeepSeek has introduced DSpark, a speculative decoding framework that the company says boosts inference speed for its V4 models by as much as 85%. The framework was also tested on Gemma and Qwen models.
The release comes as inference efficiency has become a key battleground among AI developers. Faster inference can lower serving costs and make applications more responsive, particularly in real-time settings.
DeepSeek did not disclose exact benchmarks or the conditions under which the 85% figure was achieved. Beyond its own V4 models, the company's testing covered models from Google's Gemma and Alibaba's Qwen families.
This development could pressure rivals to accelerate their own inference optimization efforts, especially as demand for large language model deployments grows. Enterprises running DeepSeek's V4 may see reduced latency and infrastructure costs.
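To put the headline number in perspective, the short sketch below converts a claimed speed gain into the corresponding drop in per-token generation time. It rests on an assumption the announcement does not confirm: that "85% faster" means 1.85 times the original throughput in tokens per second. It also ignores fixed costs such as prompt processing and network overhead. The function names are illustrative and do not come from DeepSeek.

```python
def time_fraction_after_speedup(speedup_pct: float) -> float:
    """Fraction of the original per-token time that remains after a throughput gain.

    Assumption: a gain of speedup_pct percent means throughput is multiplied
    by (1 + speedup_pct / 100).
    """
    if speedup_pct <= -100:
        raise ValueError("speedup_pct must be greater than -100")
    return 1.0 / (1.0 + speedup_pct / 100.0)


def latency_reduction_pct(speedup_pct: float) -> float:
    """Percentage reduction in per-token time implied by a throughput gain."""
    return (1.0 - time_fraction_after_speedup(speedup_pct)) * 100.0


if __name__ == "__main__":
    gain = 85.0  # the upper bound DeepSeek cites; real workloads may see less
    print(f"Time per token: {time_fraction_after_speedup(gain):.1%} of the original")
    print(f"Reduction in time per token: {latency_reduction_pct(gain):.1f}%")
```

Under that reading, an 85% speed gain cuts time per token by roughly 46%, not 85%. Any cost savings would track that smaller figure at best, and only for workloads that actually reach the upper bound.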
Industry analysts note that speculative decoding is an active research area, and such claims require independent verification to confirm real-world performance gains.