Arm Lumex Brings Cloud-Level AI Power to Smartphones

Credit: Arm

Arm is betting the future of mobile AI lies on the device, not exclusively in the cloud. Its new Lumex platform, first introduced in May, is built to prove it. Part of the company’s Compute Subsystems (CSS), Lumex combines Arm’s high-performance CPUs and second-generation Scalable Matrix Extension (SME2) with GPUs, system IP, and an optimized software stack. The platform aims to bring developer-friendly AI to smartphones and tablets while delivering faster, more efficient local processing.

“AI hasn’t just become more powerful. It’s become more personal,” Chris Bergey, Arm’s senior vice president and general manager of its client line of business, remarks during a news conference on Monday. “It now understands, adapts, and reacts in real-time, all without ever needing to leave your device, whether it’s streamlining your workflow, helping you communicate across languages, or anticipating what you need before you ask. AI is shifting from a tool to a companion, and expectations are growing, defining consumer choices.”


This shift is driven by sophisticated large language models (LLMs) and AI agents. Bergey explains that they are no longer static; they can reason, plan, and take action on our behalf. “We have moved from AI being a parlor trick to influencing how things get done,” he says. People will soon expect every device “to understand the natural voice, anticipate their needs, and respond with context and intelligence. And if it doesn’t, they’ll be frustrated instantly.”

It’s this thinking that leads Arm to believe it’s time for AI to move away from the cloud. Bergey contends that cloud reliance is “unsustainable,” can be “too expensive” for developers, “too slow” for users, and “too concerning” for privacy.

To make this a reality, Arm’s new solution combines the hardware and software necessary to empower smartphones and PCs to deliver seamless and real-time experiences.

What’s Included in Arm Lumex

“With the Lumex platform, we’ve focused on the essential requirements for on-device AI, namely, high performance for real-time inferencing, a low latency to avoid unnecessary round-trips to the cloud, and energy efficiency that doesn’t compromise capability or usage,” James McNiven, Arm’s vice president of product management, says. “Lumex is more than a set of IP blocks. It’s a full-stack flagship platform designed from the ground up for AI.”

An overview of what’s included in Arm Lumex. Credit: Arm

SME2 and CPUs

The platform’s SME2-enabled Armv9.3 CPU cluster includes two of Arm’s most powerful cores, the C1-Ultra and C1-Pro. Arm boasts that it provides “up to five times uplift in AI performance, 4.7 times lower latency for speech-based workloads, and 2.8 times faster audio generation.” The inclusion of SME2 is significant because it empowers devices to handle large-scale AI tasks locally. “SME2 brings a huge boost for AI performance and efficiency across mobile applications and on-device experiences,” McNiven explains. “It’s part of the CPU, so it’s easier to program. It’s simpler to debug. It takes on the CPU security model, and, of course, will be broadly available.”

Bergey notes that SME2 provides greater AI performance, reduces memory usage, and makes on-device AI feel smoother, especially for apps with “real-time constraints like audio generation, camera inference, computer vision, or chat interactions.”
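To make the workload concrete: SME2 adds matrix-tile instructions to the CPU, and the kernels it accelerates are the dense block-wise multiply-accumulates at the heart of on-device inference. The sketch below is illustrative only, a pure-Python tiled matrix multiply, not SME2 code; real SME2 tile sizes are fixed by the hardware’s streaming vector length, while the tile size here is arbitrary.

```python
def tiled_matmul(a, b, tile=4):
    """Multiply matrices a (m x k) and b (k x n), processing tile-sized blocks.

    Each inner block is a multiply-accumulate over a small tile -- the
    operation SME2 performs in hardware rather than one scalar at a time.
    """
    m, k = len(a), len(a[0])
    k2, n = len(b), len(b[0])
    assert k == k2, "inner dimensions must match"
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for p0 in range(0, k, tile):
                # Accumulate one output tile.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        for p in range(p0, min(p0 + tile, k)):
                            c[i][j] += a[i][p] * b[p][j]
    return c
```

Moving this loop nest from scalar code into dedicated tile hardware, while keeping it on the CPU, is why Arm pitches SME2 as easier to program and debug than offloading to a separate accelerator.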

Credit: Arm

Arm estimates that SME and SME2 could deliver more than 10 billion TOPS (trillion operations per second) across over three billion devices by 2030, underscoring the growing importance of on-device AI performance.
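As a back-of-envelope check on that projection (the aggregate figures are from Arm; the per-device average is our own arithmetic, not an Arm claim):

```python
# Aggregate figures as stated by Arm for 2030.
aggregate_tops = 10e9   # "more than 10 billion TOPS" across the installed base
devices = 3e9           # "over three billion devices"

# Our own derived average -- illustrative, not an Arm figure.
tops_per_device = aggregate_tops / devices
print(f"~{tops_per_device:.1f} TOPS per device on average")  # ~3.3 TOPS
```

Roughly 3.3 TOPS of matrix throughput per handset, on average, would put meaningful local inference capacity in ordinary (not just flagship) devices.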

Catering to sub-flagship devices, Arm Lumex also features the brand-new C1-Premium core. This CPU offers performance similar to the C1-Ultra in a 35 percent smaller area.

New Mali G1-Ultra GPU and Better Ray Tracing

As for graphics, Arm Lumex features the new Mali G1-Ultra GPU, which supports advanced ray tracing capabilities that deliver more realistic graphics and enhanced gaming performance. “Ultra replaces our Immortalis brand, and G1-Ultra delivers a 20 percent improvement while consuming nine percent less energy per frame,” McNiven highlights. “And for AI workloads, it enables up to 20 percent faster inference performance, enhancing that responsiveness across real-time apps.”

The performance of Arm’s new Mali G1-Ultra GPU. Credit: Arm

It’s the latest chip in the Mali lineup, joining Arm’s G1-Premium and G1-Pro.

As for ray tracing, McNiven shares that the unit has been redesigned, shifting from a packed-rays model to a single-ray model that makes it more efficient. Arm has also improved support for incoherent rays, which deliver more realistic lighting, reflections, and shadows. “Unlike the first-generation ray tracing unit, which used additional instructions within the GPU’s execution engine, the new [ray tracing unit] is a fully separate hardware unit which dramatically increases its performance and it’s scalable with up to 24 shader cores, making it adaptable for high-end mobile gaming as well.”

“With Lumex, we’re enabling the next generation of smartphones, making AI truly personal, built to adapt in real-time, optimized from the core up, and designed to put platform-level intelligence in the palm of your hand to show you exactly what makes Lumex such a great breakthrough platform,” Bergey asserts.

C1-DSU and KleidiAI Support

But new CPUs and GPUs aren’t all that’s inside Arm Lumex. It also integrates the C1-DSU (DynamIQ Shared Unit). This cluster architecture links CPU cores with shared cache and system resources, enabling faster and more efficient processing for on-device AI and demanding mobile workloads.

Arm Lumex is engineered for 3-nanometer nodes to maximize performance and energy efficiency. In other words, it’s capitalizing on the latest semiconductor technology. Doing so provides the platform with more compute power in a smaller footprint, enabling faster and more capable on-device AI.

Credit: Arm

Lastly, the platform utilizes KleidiAI libraries, first announced in 2024 and described as “broad software deliverables and community engagements,” to simplify the integration of software with AI for developers.

“With both hardware and software upgrades, [the] Arm Lumex CSS platform is far more than component-level progress,” McNiven says. “It is a holistic platform strategy purpose-built for the AI era and optimized across the full compute stack.”

What Does Lumex Mean For Developers?

It’s unreasonable to assume that AI will forever be tied to the cloud. There are many reasons why someone would want intelligent applications that run on-device, without relying on internet connectivity. The promise of Arm’s vision is that running AI workloads directly on a smartphone or tablet is cheaper, faster, and more secure. This is likely the same reasoning behind Microsoft and its OEM partners introducing Copilot+ PCs in 2024. So, what happens after a developer decides to follow through on this idea?

According to Geraint North, an AI and developer platform fellow at Arm, today’s mobile system-on-chip landscape is “really complex.” Developers must decide where on the device these AI workloads will run—the CPU (where the majority of compute happens), the GPU (which handles graphics workloads), or the NPU (a dedicated accelerator for AI workloads). Although the GPU and NPU could offer better efficiency, they require significant code modifications, which could impact performance. Arm’s solution is to have most of the AI workload run efficiently on the CPU using SME2.

“The Arm CPU is also the only compute unit in the mobile market you can rely on to be present in every mobile phone,” he claims. “So, as you start to move to GPUs and NPUs, you end up doing different work for different handsets.”

The Race to Locally Run AI

Arm isn’t the only technology company working to extend AI beyond the cloud and onto devices. Apple, Qualcomm, MediaTek, Samsung LSI, and Google are among the other players developing solutions. What sets Arm apart from the competition is its deep-seated presence in mobile: Arm doesn’t manufacture chips, but its designs and chip architecture underpin almost every smartphone on the market.

Credit: Arm

That said, Lumex isn’t Arm’s only foray into the AI era. It has other initiatives in the works, including Neoverse for data centers, Zena for automobiles, Niva for PCs, and Orbis for the Internet of Things (robots?).

“Lumex is more than our most advanced client platform. It’s the foundation for the next era of intelligent mobile computing,” Bergey declares. “Lumex truly brings together our platform vision and a journey that Arm has been on for several years now, with amazing double-digit CPU performance gains and SME2 integration, it delivers the power and efficiency needed to run modern AI workloads directly on device.”

