
AMD’s Roane: Power efficiency’s the big AI bugaboo

OpenAI released two open-weight language models in August following a year of advances in AI models from various vendors that have sent developers scurrying to create new applications to take to market or use for business or personal tasks.

One of the releases caught the eye of developers in the edge space: the GPT-OSS-20b model, which OpenAI said could run on devices with just 16GB of memory, adding that it is “ideal for on-device use cases, local inference or rapid iteration without costly infrastructure.” Generally, 16GB is the amount of RAM considered ideal for heavy multitasking in desktop computers, laptops, gaming smartphones and mini-PCs.

The other model OpenAI released was for bigger jobs, the GPT-OSS-120b.  It runs efficiently on a single 80GB GPU.  Both models are available under the Apache 2.0 license.

AMD has been aggressive in evaluating open models, and is considered a distant second to Nvidia in selling accelerator chips, mainly with its MI350 Series GPUs. But ranking a distant second in sales is not the same thing as ranking second in technical insight.

AMD had an early look at both the new models from OpenAI, a month before the release, said Ramine Roane, corporate vice president of AI, in an interview with Fierce. “We’re in constant communication with OpenAI,” he said.  Even so, he said AMD tries to be “as broad as can be” with other players in the space, including Google, Meta and Microsoft.

He did some testing with the new OpenAI products and created a graphical chess program with the 120b model on a cloud server using MI350. “It would beat me if I made a bad move,” he said with a smile. Roane has a chess rating of 1700, which is generally considered strong Class B play: much better than average but not expert level, and in the top 15% of all players.

For edge applications, Roane said he can foresee the 20b model potentially serving as a local language model in a new car, helping the buyer consult the manual to figure out how in-car devices and features work. “You could talk to it and ask how the heck do I do this,” he said. (Ah, so practical, and not an uncommon problem for new car owners.) He was less convinced, however, that the smaller model would come in handy for adding AI to, say, a smart camera equipped with sensors in a factory.

The quest for power efficiency in future AI

Asked what the next major innovations in AI will be, Roane said large language models need to be made more efficient “to unleash more capabilities and more perceived intelligence with less power consumption. That’s the ultimate goal.”  Power consumption is on the minds of all the big AI players, of course, and Roane has figured out a good way to compare what’s needed in efficiency.

“Today, supercomputers can solve problems more or less like humans, but it is with 1,000 watts and our brain is just 20 watts. We really need to get down to that level,” Roane said. He admitted it may be an unfair comparison, though: imagine what 50 people at 20 watts each, the same 1,000 watts in total, could accomplish, he mused.

Greater efficiency will come through a multi-pronged approach, he added: better computer architectures, better algorithms, and better chip fabrication and packaging, including the use of chiplets. However, “there’s not an awful lot we can do at the transistor level,” he warned. With the use of chiplets, AMD has taken model inference from 32-bit to 16-bit to 8-bit and down to 4-bit precision. “Every time, we doubled operations.”
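The arithmetic behind lower precision can be sketched in a few lines. This is a back-of-the-envelope estimate, not AMD's or OpenAI's method: it assumes a hypothetical model with exactly 20 billion weights, counts weight storage only, and ignores activations, caches and any overhead of the number format. The function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: a nominal 20-billion-parameter model; real models differ.
PARAMS = 20e9

# Weight footprint at each precision mentioned in the article.
footprints = {bits: weight_memory_gb(PARAMS, bits) for bits in (32, 16, 8, 4)}

for bits, gb in footprints.items():
    print(f"{bits:>2}-bit weights: {gb:.0f} GB")
```

Under these assumptions, each halving of the bit width halves the bytes that have to be stored and moved, and only the 4-bit case (about 10 GB) would fit within a 16GB memory budget.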

Under the heading of algorithm improvements, attention transformers offer promise, he said. Attention transformers are a kind of neural network architecture that uses what is known as an “attention mechanism” to weigh the importance of different parts of the input data, which helps the model capture context and dependencies. But the next big advances in AI, Roane said, “are going to be all these improvements, because you cannot beat Moore’s Law. This means software, hardware, architecture and algorithm.”
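To make the idea of an attention mechanism concrete, here is a minimal sketch of scaled dot-product attention for a single query, written in plain Python. It is a toy illustration under stated assumptions, not how any production model is implemented: the two-dimensional vectors are invented, and the function names `softmax` and `attention` are chosen here for readability.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Weigh each value by how closely its key matches the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Made-up toy data: three input positions, each with a key and a value vector.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

weights, output = attention(query, keys, values)
print("attention weights:", [round(w, 3) for w in weights])
print("blended output:", [round(x, 3) for x in output])
```

The query lines up with the first and third keys, so those positions receive larger weights than the second, and the output is a blend of the values tilted toward them. That weighting step is what lets a model decide which parts of its input matter for a given token.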

Materials advances in chipmaking, such as silicon photonics, will make a difference, and “obviously” quantum computing will advance AI, Roane said, while adding the admonition, “quantum computing in my opinion is much longer term, 10 years.”

While AMD has just 6% of the GPU market, Roane also touts advantages over Nvidia GPUs. “We have quite a bit more memory and memory bandwidth and we do 3D ICs with more HBM. Nvidia doesn’t have that, at least yet. LLM needs memory and memory bandwidth and we have more bandwidth. Also, our software is open source.”