Linux is a good start but I’m really interested in when someone manages to accep...

verditelabs · on July 22, 2024

The NPU is a Hexagon DSP with HVX (Hexagon Vector Extensions) and HMX (Hexagon Matrix Extensions). The core ISA and the vector ISA are pretty well documented and supported by upstream LLVM, but AFAIK HMX is not publicly documented. Core ISA + HVX by itself can probably get you to 1/3 to 1/2 or so of the theoretical peak TOPS for Hexagon. It's been a bit since I've run code on device, but all the support code is in their SDK, and it's easy as pie to get it running on the simulator.

QCOM have said that LLMs up to ~13B parameters will run at reasonable speeds, and I think that's a pretty good peak for ~40 TOPS and ~150 GB/s of memory bandwidth.
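
For anyone who wants to sanity-check that, here is a rough back-of-the-envelope sketch in Python. It assumes token-by-token decoding is memory-bandwidth bound (every weight is read once per generated token), ignores KV cache and activation traffic, and counts about 2 ops per parameter per token for the compute ceiling. The function names and the 4/8/16-bit weight widths are my own illustrative assumptions, not anything Qualcomm has published; only the ~13B, ~40 TOPS and ~150 GB/s figures come from the comment above.

```python
def bandwidth_bound_tokens_per_s(params_billion, bits_per_weight, bandwidth_gb_s):
    """Upper bound on decode speed if every weight is streamed once per token."""
    bytes_per_token = params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


def compute_bound_tokens_per_s(params_billion, tops):
    """Upper bound on decode speed assuming ~2 ops per parameter per token."""
    ops_per_token = 2 * params_billion * 1e9
    return tops * 1e12 / ops_per_token


if __name__ == "__main__":
    # Figures quoted in the thread: ~13B parameters, ~40 TOPS, ~150 GB/s.
    for bits in (4, 8, 16):  # assumed weight precisions, for illustration only
        rate = bandwidth_bound_tokens_per_s(13, bits, 150)
        print(f"{bits:>2}-bit weights: ~{rate:.1f} tokens/s (bandwidth bound)")
    ceiling = compute_bound_tokens_per_s(13, 40)
    print(f"compute ceiling: ~{ceiling:.0f} tokens/s")
```

Under those assumptions the bandwidth limit works out to roughly 23 tokens/s at 4-bit weights and under 6 tokens/s at 16-bit, while the compute ceiling is in the thousands, so single-stream decode would be limited by the ~150 GB/s figure rather than by TOPS. Real numbers will be lower once KV cache reads and other overheads are included.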

marshray · on July 22, 2024

Yeah, I took a quick look at their SDK and all the .h and .hpp files say "Confidential and Proprietary". They seem to provide some kind of tool that converts your model into their own format. Perhaps this could be run with a binary driver without infecting your own stuff?

Not that different from certain GPUs, I suppose, but I'm going to forget about buying one of those laptops for now.