Nvidia aims to enhance the performance of DeepSeek’s artificial intelligence model R1, targeting processing speeds up to 30 times faster. Co-founder and CEO Jensen Huang made the announcement during an event at the SAP Center in San Jose, California.
What is Nvidia Dynamo?
To address investor concerns stemming from the January emergence of DeepSeek’s R1, which prompted a stock market selloff, Nvidia has introduced new software called Nvidia Dynamo. The software distributes AI inference tasks across as many as 1,000 Nvidia GPUs, significantly increasing query throughput.
According to Ian Buck, Nvidia’s head of hyperscale and high-performance computing, “Dynamo can capture that benefit and deliver 30 times more performance in the same number of GPUs in the same architecture for reasoning models like DeepSeek.” The software, now available on GitHub, processes queries more efficiently by breaking tasks up to run in parallel, boosting both performance and revenue.
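To give a sense of the idea, here is a minimal, purely illustrative sketch of fanning inference requests out across a pool of workers so they run in parallel rather than queueing on a single device. It does not use Dynamo’s actual API; the worker count, function names, and request format are all hypothetical stand-ins.

```python
# Toy illustration of parallel inference serving -- NOT Nvidia Dynamo's API.
# All names and numbers here are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 8  # stand-in for a pool of GPUs; Dynamo targets up to 1,000

def run_inference(request_id: int, prompt: str) -> str:
    # Placeholder for the work a single GPU worker would do on its share of traffic.
    return f"worker handled request {request_id}: {prompt[:20]}..."

requests = [(i, f"prompt number {i}") for i in range(32)]

# Fan the queries out so they are processed in parallel instead of sequentially.
with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
    results = list(pool.map(lambda r: run_inference(*r), requests))

print(f"Completed {len(results)} requests across {NUM_WORKERS} workers")
```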
For inference priced at $1 per million tokens, the added throughput means more tokens can be served each second, and therefore more revenue for GPU service providers. Buck explained that AI factories, the large-scale operations built on Nvidia’s technology, can now offer premium services at higher rates while also increasing the overall token volume of their operations.
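The arithmetic behind that claim is straightforward. The sketch below uses the article’s $1-per-million-tokens figure together with an assumed baseline throughput (the 10,000 tokens-per-second number is illustrative, not from the article) to show how revenue scales with tokens served.

```python
# Back-of-the-envelope revenue math: at a fixed price per million tokens,
# revenue scales directly with tokens served per second.
PRICE_PER_MILLION_TOKENS = 1.00   # dollars, figure quoted in the article
baseline_tokens_per_sec = 10_000  # assumed baseline throughput (illustrative)
speedup = 30                      # Dynamo's claimed gain for reasoning models

def daily_revenue(tokens_per_sec: float) -> float:
    tokens_per_day = tokens_per_sec * 60 * 60 * 24
    return tokens_per_day / 1_000_000 * PRICE_PER_MILLION_TOKENS

print(f"Baseline:     ${daily_revenue(baseline_tokens_per_sec):,.2f} per day")
print(f"With speedup: ${daily_revenue(baseline_tokens_per_sec * speedup):,.2f} per day")
```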
Using Dynamo with Nvidia’s Blackwell GPUs could let data centers generate 50 times more revenue than earlier GPU generations such as Hopper. Nvidia has also published its own version of DeepSeek R1 on HuggingFace, optimized by reducing the numerical precision of the model’s weights to “FP4,” a four-bit floating-point format that cuts computational and memory requirements compared with standard floating-point formats.
“It increases the performance from Hopper to Blackwell substantially,” Buck stated. This modification maintains the accuracy of the AI model while enhancing processing efficiency.
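The savings are easy to see from the storage cost per weight. The following rough calculation assumes DeepSeek R1’s widely reported parameter count of roughly 671 billion and standard per-weight sizes for each format; it is meant only to show the scale of the reduction, since real deployments typically mix precisions.

```python
# Rough illustration of why lower-precision formats cut memory and bandwidth needs.
# The parameter count and per-weight sizes below are assumptions for scale only.
PARAMS = 671e9  # approximate parameter count of DeepSeek R1

bytes_per_weight = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}

for fmt, nbytes in bytes_per_weight.items():
    total_gb = PARAMS * nbytes / 1e9
    print(f"{fmt}: ~{total_gb:,.0f} GB of weights")
```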
In addition to unveiling Dynamo, Huang showcased the latest iteration of Blackwell, called Blackwell Ultra, which upgrades the original Blackwell B200. Notable improvements include an increase in high-bandwidth HBM3e memory from 192GB to 288GB per GPU. Paired with Nvidia’s Grace CPU, up to 72 Blackwell Ultra chips can be combined in the NVL72 rack-scale computer, lifting FP4 inference performance by 50% over the existing NVL72 system.
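For context, multiplying the article’s per-GPU memory figure by the rack’s GPU count gives the total high-bandwidth memory available in one NVL72 configuration; this is simple arithmetic on the quoted numbers, not an Nvidia specification sheet.

```python
# Quick check of the memory figures cited above: total HBM3e across one
# NVL72 rack if all 72 GPUs are Blackwell Ultra parts (article's numbers).
GPUS_PER_RACK = 72
HBM_PER_GPU_GB = 288  # up from 192 GB on the original Blackwell

print(f"Per-rack HBM3e: {GPUS_PER_RACK * HBM_PER_GPU_GB:,} GB")  # 20,736 GB
```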
Featured image credit: Nvidia