WEKA weaves memory-boosting architecture into GPU AI clouds

By Dan O'Shea Jul 11, 2025 11:22am

If the AI ecosystem has learned anything from 2025, it's this: It’s not how many GPUs you have, but how you use them.

Do you need a lot of GPUs for AI training and inference? Yes. Can you use compute resources more efficiently, in ways that could allow you to buy fewer GPUs? Also, yes.

With that in mind, WEKA, a company that counts Nvidia, Qualcomm, Micron, and other big names as investors, recently unveiled NeuralMesh Axon, a system that “fuses with GPU servers and AI factories to streamline deployments, reduce costs, and significantly enhance AI workload responsiveness and performance, transforming underutilized GPU resources into a unified, high-performance infrastructure layer,” the company said.

According to WEKA, memory and storage latency creates a bottleneck that limits performance for large language model (LLM) training and inference workloads. But the company is not looking to replace local GPU storage. Instead, its Augmented Memory Grid capability, part of its NeuralMesh storage system, works with the local storage on GPU servers to create a software-defined storage layer (or pool, or architecture, as all three terms could apply) for specific AI workloads.

“By increasing GPU utilization, we help customers maximize their existing AI and data infrastructure investments and lower their cloud storage costs,” WEKA told Fierce Electronics via e-mail. Nvidia CEO Jensen Huang recently called out increasing token throughput as a critical area of focus as AI reasoning evolves, and WEKA said it can improve token performance by as much as 20%.

While WEKA’s technology will not be generally available until later this year, the company said early customers include AI cloud and stock market darling CoreWeave, Cohere, Stability AI, Physical Intelligence, Together AI, Applied Digital, and The Center for AI Safety, among others.

“We're entering an era where AI advancement transcends raw compute alone—it's unleashed by intelligent infrastructure design. CoreWeave is redefining what's possible for AI pioneers by eliminating the complexities that constrain AI at scale,” said Peter Salanki, CTO and co-founder at CoreWeave. “With WEKA's NeuralMesh Axon seamlessly integrated into CoreWeave's AI cloud infrastructure, we're bringing processing power directly to data, achieving microsecond latencies that reduce I/O wait time and deliver more than 30 GB/s read, 12 GB/s write, and 1 million IOPS to an individual GPU server. This breakthrough approach increases GPU utilization and empowers Cohere with the performance foundation they need to shatter inference speed barriers and deliver advanced AI solutions to their customers.”

Autumn Moulder, vice president of engineering at Cohere, added, “For AI model builders, speed, GPU optimization, and cost-efficiency are mission-critical. That means using less hardware, generating more tokens, and running more models—without waiting on capacity or migrating data. Embedding WEKA's NeuralMesh Axon into our GPU servers enabled us to maximize utilization and accelerate every step of our AI pipelines. The performance gains have been game-changing: Inference deployments that used to take five minutes can occur in 15 seconds, with 10 times faster checkpointing. Our team can now iterate on and bring revolutionary new AI models, like North, to market with unprecedented speed.”

As Nvidia’s investment in WEKA suggests, the chipmaker sees this kind of efficiency improvement as beneficial to its GPU business rather than a threat to it. When NeuralMesh becomes generally available this fall, it is expected to be certified to work with Nvidia’s DGX SuperPOD system through WEKA’s WEKApod appliances.

“AI factories are defining the future of AI infrastructure built on NVIDIA accelerated compute and our ecosystem of NVIDIA Cloud Partners,” said Marc Hamilton, vice president of solutions architecture and engineering at NVIDIA. “By optimizing inference at scale and embedding ultra-low latency NVMe storage close to the GPUs, organizations can unlock more bandwidth and extend the available on-GPU memory for any capacity. Partner solutions like WEKA’s NeuralMesh Axon deployed with CoreWeave provide a critical foundation for accelerated inferencing while enabling next-generation AI services with exceptional performance and cost efficiency.”

GPU memory cloud storage artificial intelligence (AI) AI