Implementing the NVIDIA DGX SuperPOD has transformed MediaTek's AI development lifecycle. The high compute utilization required for this level of AI development underscores the need for a powerful on-premises platform that can sustain extensive, continuous workloads.
“Our AI factory, powered by DGX SuperPOD, processes approximately 60 billion tokens per month for inference and completes thousands of model-training iterations every month,” said David Ku, Co-COO and CFO at MediaTek.
Model inference, particularly with cutting-edge LLMs, requires loading the entire model into GPU memory. Models with hundreds of billions of parameters can easily exceed the memory capacity of a single GPU server and must be partitioned across multiple GPUs. The DGX SuperPOD, comprising tightly coupled DGX systems connected by high-performance NVIDIA networking, is purpose-built to deliver the ultra-fast, coordinated GPU memory and compute needed for efficient training and inference on today's largest AI workloads.
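MediaTek's exact serving stack isn't detailed here, but the underlying pattern is common: a framework shards the model's layers across every available GPU so that no single device has to hold the full set of weights. Below is a minimal, illustrative sketch using Hugging Face Transformers with Accelerate's automatic device mapping; the model name is a placeholder, not MediaTek's deployment.

```python
# Minimal sketch: loading an LLM too large for one GPU by sharding its
# layers across all visible GPUs (requires the accelerate package).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-70b-hf"  # placeholder large model

tokenizer = AutoTokenizer.from_pretrained(model_name)

# device_map="auto" lets Accelerate place layers across every available
# GPU (spilling to CPU if necessary), so the full model never has to
# fit in a single device's memory.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
)

inputs = tokenizer("The DGX SuperPOD enables", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

At production scale, tensor-parallel serving stacks play the same role across many nodes, which is exactly where the SuperPOD's tightly coupled GPUs and NVIDIA networking come in.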
“The DGX SuperPOD is indispensable for our inference workloads. It allows us to deploy and run massive models that wouldn’t fit on a single GPU or even a single server, ensuring we achieve the best performance and accuracy for our most demanding AI applications,” said Ku.
MediaTek uses these large models for core research and development and for a centralized, high-demand API, then distills smaller versions for specific edge and mobile applications. This right-sized approach ensures the best possible performance and accuracy for each deployment target.
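The distillation step itself isn't described in detail, but a standard formulation trains a compact student model to match a large teacher's output distribution. A minimal sketch, assuming the common soft-label KL objective (temperature, weighting, and models are illustrative, not MediaTek's recipe):

```python
# Knowledge distillation: a large "teacher" model's softened output
# distribution supervises a small "student" model suited to edge or
# mobile deployment.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between the student's and teacher's
    # temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary cross-entropy on ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Inside a training step (teacher frozen, student trainable):
#   with torch.no_grad():
#       teacher_logits = teacher(batch)
#   loss = distillation_loss(student(batch), teacher_logits, labels)
#   loss.backward()
```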
With the DGX platform, MediaTek has streamlined its product development pipeline by integrating AI agents into R&D workflows. For example, AI-assisted code completion has significantly reduced programming time and error rates. An AI agent built on domain-adapted LLMs helps engineers understand, analyze, and optimize designs by extracting information from design flowcharts and state diagrams during the chip design process. This agent can now produce technical documentation in days rather than the weeks it once took.
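MediaTek hasn't published the agent's internals. As a rough sketch of the pattern, a domain-adapted LLM served behind an OpenAI-compatible endpoint can be prompted with extracted diagram content and asked to draft documentation; the endpoint, model name, and prompt below are all assumptions:

```python
# Highly simplified documentation-agent sketch: prompt a locally served
# domain-adapted LLM with a textual dump of a design's state diagram.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1",  # assumed local serving endpoint
                api_key="unused")

state_diagram = """
states: IDLE, FETCH, DECODE, EXECUTE
transitions: IDLE->FETCH on start; FETCH->DECODE on fill
"""  # in practice, extracted from design files

response = client.chat.completions.create(
    model="domain-adapted-llm",  # placeholder model name
    messages=[
        {"role": "system",
         "content": "You are a chip-design assistant. Produce concise "
                    "technical documentation from the given state diagram."},
        {"role": "user", "content": state_diagram},
    ],
)
print(response.choices[0].message.content)
```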
MediaTek also uses NVIDIA NeMo™, a software suite for building, training, and deploying large language models, to fine-tune these models for optimal performance and domain-specific accuracy.
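NeMo's fine-tuning recipes are version-specific, so the sketch below illustrates the general idea of parameter-efficient domain adaptation with a LoRA adapter using the Hugging Face peft library instead of NeMo itself; the base model and hyperparameters are placeholders:

```python
# Parameter-efficient fine-tuning via LoRA: only small low-rank adapter
# matrices train, while the base model's weights stay frozen.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

lora_cfg = LoraConfig(
    r=8,                        # low-rank adapter dimension
    lora_alpha=16,              # scaling factor for adapter updates
    target_modules=["c_attn"],  # attention projection layers in GPT-2
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # confirms only adapters are trainable

# From here, train on domain-specific text with any standard loop or
# transformers.Trainer; the frozen base weights remain untouched.
```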
