Arm Unveils Powerful New Cores And Compute Subsystems For Next-Gen AI Workloads

Arm Holdings plc, or “arm”, was once considered a vendor of processors designed for embedded and low-power systems, but those days are well past at this point. Having conquered mobile some time ago, processors based on Arm’s Instruction Set Architecture (ISA) are now challenging the supremacy of the long-dominant x86-64 architecture on every front: laptops, servers, and even client desktops.
The majority of these processors come from Arm licensees like Qualcomm, which is what makes Arm’s ISA a bit different from the x86 ISA. Where Intel and AMD have a stranglehold on the x86-64 instruction set, Arm licenses its designs to third parties, and as such there are many companies making Arm-compatible or Arm-based processors. The ones that get the most attention in our circles are from Apple, NVIDIA, and Qualcomm, but Arm still makes and sells its own processor IP as well, and that’s what we’re looking at today.

Arm Compute Subsystems For Client Updates

Arguably the biggest announcement today is that Arm is bringing its “Compute Subsystems” concept to client devices. This idea originated with the company’s Neoverse processor IP for servers, and it is essentially similar to what Intel offers in its foundry services: Arm IP as building blocks for third parties to use when constructing their own systems, whether those be mobile SoCs or turnkey devices for other markets.

Arm emphasizes that it is currently offering “performance efficiency leadership,” which is true in certain parts of the market. The company also highlights updates to the Armv9.2 ISA that include new extensions targeted at improving AI performance and the security of the Arm platform. If you’re more of a desktop PC user, this is generally analogous to the many extensions that Intel and AMD have added to the x86-64 ISA, such as AVX-512.

Arm remarks that it has the “largest compute ecosystem,” which is to say that the Arm processor architecture is leveraged across many disparate processor designs. It’s probably true that there are more apps for Arm processors than for any other ISA by a long shot, but that’s neither here nor there. The point is that, as Arm itself says, there has been an explosion in “device types, neural networks, and inference engines.”

With that in mind, Arm is introducing Kleidi, which it claims is a highly optimized compute library that targets all Arm CPU architectures for Armv8 and Armv9. To make a familiar comparison for desktop users, we can say that Kleidi is roughly akin to Intel’s oneAPI, except that where Intel aims to target virtually any hardware, Kleidi is understandably focused on Arm’s own machines. Interestingly, despite the AI compute horsepower of Arm’s GPUs, Kleidi seems exclusively optimized for CPUs, at least for now.

Kleidi is primarily targeted at AI applications, but not exclusively; there is in fact an AI-specific branch of Kleidi called, predictably, KleidiAI, with “optimized low-level intrinsics for Arm CPUs, designed for highly-optimized generative AI frameworks.” There’s also a branch of Kleidi for computer vision called KleidiCV that Arm says is 75% faster than “default OpenCV implementations.”

Introducing The Cortex-X925 CPU And Immortalis-G925 GPU IP

Of course, the company also has new hardware to debut, and if these numbers are accurate in real-world scenarios, then Arm’s new chips are a force to be reckoned with. Starting with the new Cortex-X925 CPUs, Arm says that one of these on a development platform offers a 36% uplift in single-threaded performance compared to a “2023 Premium Android” device. That’s a crazy year-over-year IPC uplift, and indeed, Arm says that this is the largest such gain in the history of the Cortex-X brand. It’s certainly the largest jump in CPU model number in the history of the brand, as the previous-generation Cortex-X part was the Cortex-X4 from last year.

Other notable qualities of the Cortex-X925 CPU include radically improved AI performance thanks to increased instruction decode and vector processing width, as well as the ability to equip the CPU with up to 3 megabytes of L2 cache per core. That’s larger than you’ll find on Intel’s biggest CPUs right now, although we suspect the majority of Cortex-X925 implementations will go with a smaller L2, as cache is both expensive in terms of silicon area and power-hungry.

Meanwhile, the company’s latest GPU silicon builds on its previous “Immortalis” hardware to make what Arm calls its “most performant and efficient GPU.” Comparing directly against its most powerful extant GPU hardware, the Immortalis-G720, the new G925 part in its biggest 14-core configuration is apparently 37% faster in gaming, 34% faster in AI, and 52% faster at ray tracing. Arm didn’t elaborate much on what it changed to effect these gains, though obviously some of the benefit comes from the increased core count compared to the previous generation, in addition to architectural and manufacturing process updates.

Android Platform Benefits: Performance And Efficiency

Taken as a whole, Arm says that its new Compute Subsystems platform offers improvements like a 30% reduction in GPU power usage and 15% overall power efficiency gains for the Cortex-A520 “little” cores found in its TCS23 platform. It also notes “performance efficiency” gains of 35% for its updated Cortex-A725 cores, which serve as “Premium Efficiency” CPU cores that sit between the small A520 cores and the big Cortex-X family CPUs.

“Performance Efficiency” is an interesting metric; presumably what Arm means is that the revised CPUs achieve a +35% uplift in efficiency primarily on the basis of increased performance rather than reduced power consumption. In any case, these gains likely result primarily from the shift to a 3nm fabrication process, as Arm emphasizes throughout its press materials that it has production-ready implementations for the new hardware on an unspecified 3nm process.
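
To put rough numbers on that ambiguity, here is a minimal Python sketch. It assumes “performance efficiency” simply means performance per watt, which is our reading and not something Arm’s materials define. The function names are ours and purely illustrative, and the two scenarios are hypothetical endpoints rather than figures from Arm.

```python
def efficiency_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative change in performance per watt.

    perf_ratio and power_ratio are new/old ratios (1.0 means unchanged).
    Assumes efficiency = performance / power, which is our interpretation.
    """
    return perf_ratio / power_ratio - 1.0


def power_ratio_for_gain(gain: float, perf_ratio: float = 1.0) -> float:
    """New/old power ratio needed to hit a given efficiency gain."""
    return perf_ratio / (1.0 + gain)


# Hypothetical scenario A: 35% more performance at unchanged power.
gain_a = efficiency_gain(perf_ratio=1.35, power_ratio=1.0)

# Hypothetical scenario B: unchanged performance, so how much must power drop?
power_b = power_ratio_for_gain(gain=0.35)

print(f"Scenario A efficiency gain: {gain_a:.0%}")      # 35%
print(f"Scenario B power reduction: {1 - power_b:.1%}")  # 25.9%
```

Under that assumption, the same 35% headline figure could mean 35% more performance at the same power, roughly 26% less power at the same performance, or anything in between, which is why the distinction matters when comparing vendor claims.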

Arm says that the refinements to software and hardware result in huge uplifts to performance for browsing and gaming, while video watching in the YouTube app apparently uses 10% less power on the latest devices. The updates to the Armv9.2 ISA also include new extensions that can offer improved security. It’s not clear exactly when any of this will make it to market; that depends on who licenses the technology, as Arm doesn’t sell its own processors to end users. It’s a safe bet, however, that these latest CPU core and GPU designs will make their way into next-gen devices in the not-too-distant future.