Edge AI’s memory vs. TOPS balancing act

Edge AI technology has been around for years but is growing fast. Competing against much larger players, Hailo pitches its competitively priced edge AI processors as unmatched at lowering energy needs and costs.

Of course, the central advantage advertised for any Edge AI architecture is the ability to reduce costs and energy consumption. Avi Baum, CTO at Hailo, and company marketing officials note that Hailo devices, including the Hailo-8 processor, have been used by hundreds of companies globally for five years. The privately held Hailo is based in Tel Aviv.

Baum recently posted a blog, “TOPS Matter, But They’re Not Enough,” and said in an interview with Fierce that many customers shop for edge AI processors that boast high TOPS (tera operations per second). But relying on TOPS exclusively can be misleading, he said. What matters, in simple terms, is parameter count: processors with on-chip memory can handle the load of perceptive and enhancive AI models without relying on off-chip memory. Generative AI models, by contrast, have billions of parameters, so off-chip memory such as DRAM becomes essential.
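To make the parameter-count point concrete, here is a minimal back-of-the-envelope sketch in Python. It is not taken from Baum's blog: the 32 MB on-chip capacity, the two model sizes and the 8-bit weight format are hypothetical values chosen for illustration, and the estimate counts weights only, ignoring activations and other buffers.

```python
def weight_footprint_mb(param_count, bits_per_param=8):
    """Approximate storage needed for model weights, in megabytes."""
    return param_count * bits_per_param / 8 / 1e6


def needs_off_chip(param_count, on_chip_mb, bits_per_param=8):
    """True if the weights alone exceed the on-chip memory budget."""
    return weight_footprint_mb(param_count, bits_per_param) > on_chip_mb


# Hypothetical on-chip capacity; not a specification of any real device.
ON_CHIP_MB = 32

# Hypothetical model sizes for illustration only.
models = [
    ("vision model, 10M parameters", 10e6),
    ("generative model, 1B parameters", 1e9),
]

for name, params in models:
    size_mb = weight_footprint_mb(params)
    where = "off-chip DRAM" if needs_off_chip(params, ON_CHIP_MB) else "on-chip memory"
    print(f"{name}: about {size_mb:,.0f} MB of weights -> {where}")
```

Under these assumptions the smaller model fits in on-chip memory with room to spare, while the billion-parameter model overshoots it by more than an order of magnitude, which is the gap that pushes generative workloads to external DRAM.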

 

For any AI application to run efficiently, the balance between TOPS and memory is workload specific. High-resolution video analytics workloads require more compute power to handle large input frames and high frame rates, but only moderate memory bandwidth to process those frames.

Natural language processing, however, puts far greater demands on memory than on compute. When an input is mostly text or audio and inference runs at human interaction speed, memory bandwidth is the limiting factor. But when video is involved, there’s a balance between TOPS and memory bandwidth, Baum said.
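One way to see why text workloads tend to be memory-limited is a rough roofline-style estimate, sketched below in Python. It rests on common simplifying assumptions that are not from the article: generating one token touches every weight once, costs about two operations per weight (a multiply and an add), and nothing is cached on chip. The 20 TOPS, 8 GB/s and 1-billion-parameter figures are placeholders, not measurements of any Hailo product.

```python
def token_rate_bounds(param_count, bytes_per_param, tops, mem_bw_gb_s):
    """Upper bounds on tokens per second set by compute and by memory bandwidth.

    Assumes each generated token reads every weight once from off-chip memory
    and performs roughly two operations per weight.
    """
    ops_per_token = 2 * param_count
    bytes_per_token = param_count * bytes_per_param
    compute_bound = tops * 1e12 / ops_per_token
    memory_bound = mem_bw_gb_s * 1e9 / bytes_per_token
    return compute_bound, memory_bound


def limiting_factor(param_count, bytes_per_param, tops, mem_bw_gb_s):
    """Name the resource that caps throughput under the assumptions above."""
    compute_bound, memory_bound = token_rate_bounds(
        param_count, bytes_per_param, tops, mem_bw_gb_s
    )
    return "memory" if memory_bound < compute_bound else "compute"


# Hypothetical numbers: 1B parameters at 8 bits each, 20 TOPS, 8 GB/s of DRAM bandwidth.
compute_bound, memory_bound = token_rate_bounds(1e9, 1, 20, 8)
print(f"compute allows up to ~{compute_bound:,.0f} tokens/s")
print(f"memory allows up to ~{memory_bound:,.0f} tokens/s")
print("limiting factor:", limiting_factor(1e9, 1, 20, 8))
```

With these placeholder values, compute could sustain thousands of tokens per second while memory bandwidth caps the rate at single digits, so adding TOPS alone would not help. Video models reuse each weight across many pixels per frame, which is why this simple per-token model does not apply to them directly.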

In one example, Baum said the Hailo-15 AI vision processor, with 20 TOPS and 32-bit LPDDR4X memory, has an image signal processor that consumes 30% of memory and compute resources, while AI analytics will mainly consume compute resources. Applying a vision language model, by contrast, will be highly demanding on memory and less so on compute.

Power, cost and latency create challenges for balancing compute and memory in an edge AI system. “Focusing solely on TOPS risks overlooking critical bottlenecks, just as ignoring application requirements can lead to suboptimal choices,” Baum said in his blog. “The most powerful processor isn’t always the best; it’s the one that fits your needs perfectly.”

Baum told Fierce that modern AI analytics and generative AI require building a machine with use cases in mind. “The outcome is not that obvious… If you go higher with both memory and compute, it blows out in power and price.” His blog notes: “In generative AI use cases, adding memory resources becomes critical to maintain high performance. However, this in turn introduces not only cost increase but also latency and higher power demands. Therefore a careful balance between all resources is required.”

On its website, Hailo lists a range of customer AI applications that can benefit from Hailo processors, from ADAS to security to industrial automation and more. In retail, for example, AI can improve the shopping experience with product and person identification at checkout. In 2021, Hailo described “Just Walk Out” automated checkout, in which a customer doesn’t need to wait in a queue.

While Hailo says its products are unmatched, there are multiple large companies in the edge AI market, including Alphabet, Amazon, Intel, Microsoft and IBM. Grand View Research said the global edge AI market was worth more than $20 billion in 2024 and will grow at an annual rate of 21% to reach $66 billion in 2030.