
Hailo-10H debuts with GenAI focus for on-device processing

By Dan O'Shea Jul 23, 2025 12:53pm

Hailo announced the commercial availability of the Hailo-10H, its second-generation generative AI accelerator aimed at on-device processing of large language models and vision-language models at the edge.

The announcement comes after HP said earlier this year that its new M.2 card for point-of-sale devices would incorporate the Hailo-10H to support cashierless checkout, theft prevention, and other applications in the retail and hospitality industries. Beyond those markets, Hailo said it is also targeting the new processor at opportunities in the personal computing, automotive, telecommunications, and security sectors, among others.

Hailo CTO Avi Baum told Fierce Electronics via e-mail that it is “too early to disclose” names of other customers beyond HP, but that the company is fielding interest in its new accelerator. “With regards to use cases, we are receiving inquiries from a wide range of industries and end products. Currently we are focusing on a variety of use cases that require native language operation (for example retail checkout, home assistance, car infotainment system, etc.) or combine video and language such as video and image indexing, summarization, captioning, etc.”

The company said that in performance benchmark tests, the Hailo-10H achieved a first-token latency of under 1 second and throughput of more than 10 tokens per second on a variety of 2B-parameter language and vision-language models. Baum added, “The benchmarks we reported were for QWEN VL which is a leading VLM model, and multiple LLMs including Llama, DeepSeek, and QWEN.”

Nvidia CEO Jensen Huang and others have spoken extensively this year about the growing importance of token latency and optimization of token processing as language models continue to grow and the AI market shifts focus from pure training needs to inference and reasoning.

Regarding first-token latency, Baum explained, “It boils down to more specific tasks that are using underlying LLM or VLM. First-token latency is important in cases in which the outcome is concise, sometimes a simple ‘yes/no’ or action triggering, for example, in agentic AI. In these cases the time to first-token latency is fundamentally critical, because of the inherent interactive nature of the application.”
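
Baum's point about concise outputs can be illustrated with some rough arithmetic. The sketch below is not from Hailo. It assumes a simple model in which total response time equals first-token latency plus the time to generate the remaining tokens at a steady rate. It uses 1 second and 10 tokens per second as round placeholder values taken from the limits of the reported benchmark figures, not as measured results. The function names are hypothetical.

```python
def response_time(n_tokens: int, ttft_s: float = 1.0, tokens_per_s: float = 10.0) -> float:
    """Estimate seconds to produce n_tokens output tokens.

    Assumed model (not from Hailo): the first token arrives after ttft_s,
    and each later token arrives at a steady tokens_per_s.
    """
    if n_tokens < 1:
        raise ValueError("n_tokens must be at least 1")
    return ttft_s + (n_tokens - 1) / tokens_per_s


def ttft_share(n_tokens: int, ttft_s: float = 1.0, tokens_per_s: float = 10.0) -> float:
    """Fraction of the total wait that is spent waiting for the first token."""
    return ttft_s / response_time(n_tokens, ttft_s, tokens_per_s)


# Placeholder output lengths: a yes/no answer, a short reply, a long summary.
for n in (1, 20, 200):
    total = response_time(n)
    print(f"{n:>3} tokens: {total:5.1f} s total, first token is {ttft_share(n):.0%} of the wait")
```

Under these assumptions, first-token latency is the entire wait for a one-token answer but only about 5% of the wait for a 200-token answer, which is consistent with Baum's emphasis on short, action-triggering outputs.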

Meanwhile, for video analytics, Hailo said the Hailo-10H enables state-of-the-art object detection (e.g., YOLOv11m) on a real-time 4K video stream. The processor's 2.5W power consumption makes it a strong fit for compact, efficient AI-enabled systems, the company added. The Hailo-10H is also automotive-qualified to AEC-Q100 Grade 2 standards and is aimed at automotive designs with a 2026 start of production.
