As AI and HPC Converge, Hardware Evolves


SANTA CLARA, CALIF. — AI revolutionized scientific computing in the last few years, and the workloads are continuing to converge, Ian Buck, VP and general manager of Nvidia’s hyperscale and HPC computing business, told EE Times. This means a booming market for GPUs in high-performance computing (HPC).

“The supercomputing and HPC world now realize the potential AI has,” Buck said. “The good news is, [AI is accelerated by] the same GPUs…very intentionally, Nvidia makes one architecture and makes it available to all our markets, all of our users.”

Supercomputers continue to accelerate scientific discovery in many fields. Physics-based models are used today to simulate phenomena that are difficult to observe experimentally. Climate change is a great example: the scale is so large and the timescales are so long that it’s hard to design a practical experiment to test climate scientists’ hypotheses. Instead, simulation must be used.

“Assuming we can build the right computer, when we know the physics involved, from turbulent flow to solar radiation, we can build a mathematical model of the Earth and then push play, and play with it,” he said. “It can happen over timescales of years, or decades, or centuries.”

Using a supercomputer, scientists can simulate carbon emissions over time and see the outcome.

Ian Buck (Source: Nvidia)

“The challenge is accuracy, and having enough compute cycles in the computer,” he said. “There’s always a question in supercomputing: have we gone to a fine enough resolution to capture the phenomenon, because it’s impractical for us to simulate all the way down to the atom, so we have to approximate, then validate.”


For work on climate change, models can be tested against historical data, but this is still hard at Earth scale. Cloud formation needs to be simulated at sub-kilometer resolution to capture eddies, which are often hundreds of meters across. Scientists can improve the accuracy of the simulation with more compute cycles, but when this isn’t possible, another option is to build and train an AI to watch the simulation and approximate it.

“The AI can run much faster [than the original simulation algorithm],” he said. “It’s still very much an approximation that needs to be validated and tested, but it can be a tool for researchers to explore many more options at much bigger timescales and identify phenomena that might be too difficult to compute or too difficult to find by searching all the different options, and then go back and follow up by running first-principles physics simulations.”
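The surrogate workflow Buck describes (run the simulation, train an approximation on its output, then validate against the original) can be sketched in a few lines. This is a toy illustration only, not Nvidia's method: the "simulation" is a simple decay equation, the surrogate is a polynomial fit rather than a neural network, and the names `expensive_simulation`, `fit_surrogate` and `validate` are hypothetical.

```python
import numpy as np

def expensive_simulation(k, steps=20000):
    """Toy stand-in for a first-principles solver (assumption, not a climate model):
    explicit Euler integration of dy/dt = -k*y over t in [0, 1] with small time steps."""
    dt = 1.0 / steps
    y = 1.0
    for _ in range(steps):
        y += dt * (-k * y)
    return y

def fit_surrogate(train_k, degree=6):
    """Run the simulation at a few parameter values and fit a polynomial to the results."""
    train_y = np.array([expensive_simulation(k) for k in train_k])
    return np.poly1d(np.polyfit(train_k, train_y, degree))

def validate(surrogate, test_k):
    """Compare the surrogate with fresh simulation runs; return the largest absolute error."""
    truth = np.array([expensive_simulation(k) for k in test_k])
    return float(np.max(np.abs(surrogate(test_k) - truth)))

# Train on 12 simulation runs, then validate on parameter values not seen in training.
train_k = np.linspace(0.0, 2.0, 12)
surrogate = fit_surrogate(train_k)
test_k = np.linspace(0.1, 1.9, 7)
max_err = validate(surrogate, test_k)
print(f"max surrogate error on held-out parameters: {max_err:.2e}")
```

Once fitted, the surrogate evaluates almost instantly, so it can be used to sweep many parameter values, with the full simulation reserved for following up on the interesting ones.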

Many supercomputers being built today with Nvidia’s Grace Hopper CPU-GPU superchips and Hopper GPUs will be used to train AI surrogates and run inference on them, Buck said. Nvidia has its own project to build a supercomputer called Earth-2 that will run a digital twin of the Earth for climate research. Earth-2 will use a combination of GH200 (Grace Hopper), HGX H100 (Hopper AI GPUs) and OVX (Ada Lovelace GPUs for graphics and AI) systems.

Nvidia’s Earth-2 supercomputer will run a digital twin of the planet at unprecedented resolution to help advance climate change research. (Source: Nvidia)

AI surrogates are also used at the molecular level for protein folding and the investigation of how biological molecules like viruses work. The process of intercepting viruses with drugs is difficult to simulate because of the relatively short time steps required over a relatively long total time, but an AI-based surrogate model of this process can help speed up simulation.

Protein folding is one of several areas of scientific computing that can make use of AI surrogates to speed up scientific discovery. (Source: Nvidia)

AI can also be used to accelerate existing processes and methods. AI-based preconditioners, which can help scientists solve mathematical equations faster, are gaining ground in many different scientific applications.

“Often the trick is to convert the matrix of equations to something that’s easier to solve using a preconditioner—converting equations to a different space which may be structured more efficiently for numerical solvers to solve,” Buck said. “That’s an art. If you can do it, you can build a preconditioner that can solve linear equations much faster but is still 100% numerically accurate.”

Preconditioners are frequently used for computer-aided engineering (CAE) in applications like vehicle crash analysis.
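To illustrate what a preconditioner does, here is a minimal sketch using a classical Jacobi (diagonal) preconditioner with the conjugate gradient method. It is not an AI-based preconditioner, and the test matrix is an assumption chosen only to be badly scaled. The point it shows is the one Buck makes: the preconditioned solver reaches the same solution in fewer iterations.

```python
import numpy as np

def conjugate_gradient(A, b, M_inv_diag=None, tol=1e-10, max_iter=10000):
    """Solve A x = b for a symmetric positive-definite matrix A.
    M_inv_diag, if given, is the inverse diagonal of a Jacobi preconditioner.
    Returns the solution and the number of iterations used."""
    n = len(b)
    if M_inv_diag is None:
        M_inv_diag = np.ones(n)  # identity: plain conjugate gradient
    x = np.zeros(n)
    r = b - A @ x                # residual
    z = M_inv_diag * r           # preconditioned residual
    p = z.copy()                 # search direction
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for it in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, it
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Hypothetical test problem: diagonal entries spanning 1 to 1000 with weak coupling.
n = 200
d = np.linspace(1.0, 1000.0, n)
off = -0.3 * np.ones(n - 1)
A = np.diag(d) + np.diag(off, 1) + np.diag(off, -1)
b = np.ones(n)

x_plain, iters_plain = conjugate_gradient(A, b)
x_pre, iters_pre = conjugate_gradient(A, b, M_inv_diag=1.0 / np.diag(A))
print(f"plain CG iterations: {iters_plain}, Jacobi-preconditioned: {iters_pre}")
```

The preconditioner changes only how quickly the solver converges; both runs stop at the same residual tolerance for the original system.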

Nvidia offers workflows for building AI surrogates, plus the Modulus software package for physics-informed training of AI models, and some types of foundation models like BioNeMo for drug discovery.

The biggest changes coming to supercomputing hardware will affect the way CPUs and GPUs connect with each other, Buck added.

“We’re at a point now where AI and accelerated computing in general means we can think outside the box in terms of the way computers can be built,” he said.

While supercomputers in recent years might have had a CPU or two connected to an accelerator over PCIe, more integrated solutions like Nvidia’s Grace Hopper superchip are emerging.

“Now that the market has become so large, we can move from a 60 or 100 GB/s connection to a much tighter integration between CPU and GPU, which is what Grace Hopper offers—a CPU and GPU that operate together, as one,” he said, noting that Grace Hopper’s CPU to GPU bandwidth is 450 GB/s in one direction or 900 GB/s total.
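For a rough sense of what those figures mean, the back-of-the-envelope calculation below estimates how long one pass of data would take over each link. The 90-GB working set is a hypothetical size chosen for round numbers, and the calculation ignores latency and protocol overhead, so real transfers would be slower.

```python
def transfer_seconds(size_gb, bandwidth_gb_per_s):
    """Ideal transfer time: data size divided by link bandwidth."""
    return size_gb / bandwidth_gb_per_s

size_gb = 90.0  # hypothetical working set, not a figure from the article

# Bandwidths quoted by Buck, in GB/s per direction.
links = {
    "60 GB/s link": 60.0,
    "100 GB/s link": 100.0,
    "Grace Hopper CPU-GPU, one direction": 450.0,
}

times = {name: transfer_seconds(size_gb, bw) for name, bw in links.items()}
for name, seconds in times.items():
    print(f"{name}: {seconds:.2f} s to move {size_gb:.0f} GB")
```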

Nvidia also designed Grace Hopper to be fully cache-coherent.

“In the past, people optimized highly valued code for moving data back and forth, and that will continue,” he said. “But when you put the [CPU and GPU] next to each other so they can really operate as one and the GPU has the same bandwidth to the host memory on the CPU as the CPU does, you’re really building a 600-GB fully coherent GPU—now they can think less about data movement, they can let the operating system move pages around dynamically, and it can do so very efficiently and very fast.”

Given Nvidia’s dominant position in large-scale AI, it’s fair to say that speed and efficiency improvements in products like Grace Hopper will contribute to scientific discoveries that will change the world for the better.
