AI’s insatiable appetite for memory

The term “memory wall” was coined in the mid-1990s when researchers from the University of Virginia, William Wulf and Sally McKee, co-authored “Hitting the Memory Wall: Implications of the Obvious.” The paper identified the critical memory-bandwidth bottleneck caused by the growing disparity between processor speed and the performance of dynamic random-access memory (DRAM).
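
The paper’s core argument can be sketched in a few lines of code: average memory access time is a hit-rate-weighted blend of cache and DRAM latency, and when processor speed improves far faster than DRAM, the DRAM term comes to dominate. The growth rates, hit ratio, and starting latencies below are illustrative assumptions, not figures from the paper.

```python
# The "memory wall" argument in miniature: average memory access time is
#   t_avg = p_hit * t_cache + (1 - p_hit) * t_dram
# measured in processor cycles. If processor speed improves far faster than
# DRAM latency, the DRAM term eventually dominates. All rates and starting
# values below are illustrative assumptions.

P_HIT = 0.95          # assumed cache hit ratio
CPU_GROWTH = 1.60     # assumed 60% per-year improvement in processor speed
DRAM_GROWTH = 1.07    # assumed 7% per-year improvement in DRAM latency

for year in range(0, 21, 5):
    cpu_hz = 100e6 * CPU_GROWTH ** year       # assumed 100 MHz in year 0
    t_dram_s = 70e-9 / DRAM_GROWTH ** year    # assumed 70 ns DRAM in year 0
    t_dram_cycles = t_dram_s * cpu_hz         # DRAM latency in CPU cycles
    t_avg = P_HIT * 1.0 + (1 - P_HIT) * t_dram_cycles
    print(f"year {year:2d}: DRAM = {t_dram_cycles:8.1f} cycles, "
          f"t_avg = {t_avg:7.1f} cycles")
```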

These findings introduced the fundamental obstacle that engineers have spent the last three decades trying to overcome. The rise of AI, graphics, and high-performance computing (HPC) has only served to increase the magnitude of the challenge.

Modern large language models (LLMs) are being trained with over a trillion parameters, requiring continuous access to data and aggregate memory bandwidth measured in petabytes per second. Newer LLMs in particular demand extremely high memory bandwidth for training and for fast inference, and the growth shows no signs of slowing, with the LLM market expected to expand from roughly $5 billion in 2024 to over $80 billion by 2033. Meanwhile, the widening gap between CPU and GPU compute performance on one side and memory bandwidth and latency on the other is unmistakable.
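
To see why bandwidth on that scale is needed, the back-of-envelope sketch below estimates the memory traffic required just to stream the weights of a dense trillion-parameter model during inference. The parameter count, precision, and token rate are illustrative assumptions.

```python
# Rough, illustrative estimate of the memory traffic needed just to stream
# model weights during inference. All numbers below are assumptions for the
# sake of the arithmetic.

params = 1.0e12          # assumed one-trillion-parameter dense model
bytes_per_param = 2      # assumed FP16/BF16 weights
tokens_per_second = 50   # assumed generation rate for a single stream

# Each generated token touches (roughly) every weight once in a dense model.
bytes_per_token = params * bytes_per_param
bandwidth_bytes_s = bytes_per_token * tokens_per_second

print(f"Weights per token : {bytes_per_token / 1e12:.1f} TB")
print(f"Bandwidth needed  : {bandwidth_bytes_s / 1e12:.0f} TB/s for one stream")
# Serving many concurrent streams, or training with activations and gradients
# in flight, pushes aggregate demand toward petabytes per second.
```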

The biggest challenge posed by AI training is moving these massive datasets between memory and the processor, and here the memory system itself is the biggest bottleneck. As compute performance has increased, memory architectures have had to evolve and innovate to keep pace. Today, high-bandwidth memory (HBM) is the most efficient solution for the industry’s most demanding applications like AI and HPC.

History of memory architecture

The von Neumann architecture, developed in the 1940s, became the basis for computing systems. This control-centric design stores a program’s instructions and data in the computer’s memory; the CPU fetches them sequentially, creating idle time while the processor waits for instructions and data to return from memory. The rapid evolution of processors and the comparatively slower improvement of memory eventually created the first system memory bottlenecks.

Figure 1 A basic arrangement showing how the processor and memory work together. Source: Wikipedia

As memory systems evolved, memory bus widths and data rates increased, enabling higher memory bandwidths that eased this bottleneck. The rise of graphics processing units (GPUs) and HPC in the early 2000s accelerated the compute capabilities of systems and put new pressure on memory to keep compute and memory performance in balance.
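
As a rough illustration, peak interface bandwidth is simply bus width multiplied by the per-pin data rate. The sketch below applies that formula to a few representative memory configurations; the exact figures vary by part and are assumptions here.

```python
# Peak bandwidth follows directly from interface width and per-pin data rate:
#   bandwidth (bytes/s) = bus_width_bits / 8 * data_rate_per_pin
# The entries below are representative configurations used only to illustrate
# the trend; consult the relevant datasheets for exact figures.

configs = [
    ("SDR-133 module",       64,   0.133e9),
    ("DDR4-3200 module",     64,   3.2e9),
    ("DDR5-6400 module",     64,   6.4e9),
    ("GDDR6 card (384-bit)", 384, 16.0e9),
]

for name, width_bits, rate_per_pin in configs:
    gb_s = width_bits / 8 * rate_per_pin / 1e9
    print(f"{name:<22} {gb_s:7.1f} GB/s")
```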

This pressure led to the development of new DRAMs, including graphics double data rate (GDDR) DRAM, which prioritized bandwidth. GDDR was the dominant high-performance memory until AI and HPC applications went mainstream in the 2000s and 2010s, when a newer type of DRAM, HBM, was required.

Figure 2 The above chart highlights the evolution of memory over more than two decades. Source: Amir Gholami

The rise of HBM for AI

HBM is the solution of choice to meet the demands of AI’s most challenging workloads, with industry giants like Nvidia, AMD, Intel, and Google utilizing HBM for their largest AI training and inference work. Compared to standard double-data rate (DDR) or GDDR DRAMs, HBM offers higher bandwidth and better power efficiency in a similar DRAM footprint.

It combines vertically stacked DRAM chips with wide data paths and a new physical implementation where the processor and memory are mounted together on a silicon interposer. This silicon interposer allows thousands of wires to connect the processor to each HBM DRAM.

The much wider data bus enables more data to be moved efficiently, boosting bandwidth, reducing latency, and improving energy efficiency. While this newer physical implementation brings greater system complexity and cost, the trade-off is often well worth it for the improved performance and power efficiency it provides.
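
The following sketch illustrates that trade-off with representative numbers: a single narrow-but-fast GDDR6 device versus one wide HBM3 stack, and the aggregate of several stacks on a single accelerator. The pin rates and stack count are assumptions chosen for illustration.

```python
# Illustrative per-device bandwidth comparison: a wide HBM interface delivers
# far more bandwidth per device than a narrow, very fast GDDR interface.
# Figures are representative assumptions, not vendor specifications.

def peak_gb_s(width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s for a given interface width and pin rate."""
    return width_bits / 8 * gbps_per_pin

gddr6_device = peak_gb_s(32, 16.0)     # one GDDR6 chip: 32-bit at 16 Gb/s
hbm3_stack   = peak_gb_s(1024, 6.4)    # one HBM3 stack: 1024-bit at 6.4 Gb/s

print(f"GDDR6 device : {gddr6_device:6.1f} GB/s")
print(f"HBM3 stack   : {hbm3_stack:6.1f} GB/s")
print(f"6 HBM3 stacks: {6 * hbm3_stack / 1000:5.2f} TB/s on one accelerator")
```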

The HBM4 standard, which JEDEC released in April 2025, marked a critical leap forward for the HBM architecture. It increases bandwidth by doubling the number of independent channels per device, which in turn allows more flexibility in accessing data in the DRAM. The physical implementation approach remains the same, with the DRAM and processor packaged together on an interposer, but the wider interface allows more wires to transport data compared to HBM3.
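
A rough sketch of what the channel doubling means in bandwidth terms appears below; the channel organization follows the JEDEC generations, while the per-pin data rates are representative assumptions.

```python
# Sketch of how HBM4's doubled channel count widens the interface and raises
# per-stack bandwidth. Channel counts follow the JEDEC generations; per-pin
# data rates are representative assumptions.

generations = {
    #        channels, bits/channel, assumed Gb/s per pin
    "HBM3": (16,       64,           6.4),
    "HBM4": (32,       64,           8.0),
}

for name, (channels, bits, gbps) in generations.items():
    width = channels * bits            # total interface width in bits
    tb_s = width / 8 * gbps / 1000     # peak bandwidth per stack in TB/s
    print(f"{name}: {channels} channels x {bits} b = {width}-bit, "
          f"~{tb_s:.1f} TB/s per stack")
```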

While HBM memory systems remain more complex and costlier to implement than other DRAM technologies, the HBM4 architecture strikes a balance between capacity and bandwidth that offers a path forward for sustaining AI’s rapid growth.

AI’s future memory need

With LLMs growing at a rate of 30% to 50% year over year, memory technology will continue to be challenged to keep up with the industry’s performance, capacity, and power-efficiency demands. As AI continues to evolve and find applications at the edge, power-constrained applications like advanced AI agents and multimodal models will bring new challenges such as thermal management, cost, and hardware security.

The future of AI will continue to depend as much on memory innovation as it will on compute power itself. The semiconductor industry has a long history of innovation, and the opportunity that AI presents provides compelling motivation for the industry to continue investing and innovating for the foreseeable future.

Steve Woo is a memory system architect at Rambus. He is a distinguished inventor and a Rambus fellow.

Special Section: AI Design
