
Linux is a good start, but I'm really interested in when someone manages to access the NPU with an open stack, i.e. will a smallish model, say 9B or so, run at a reasonable pace?


The NPU is a Hexagon DSP with HVX (Hexagon Vector Extensions) and HMX (Hexagon Matrix Extensions). The core ISA and vector ISA are pretty well documented and supported by upstream LLVM, but AFAIK HMX is not publicly documented. Core ISA + HVX by itself can probably get you to 1/3 to 1/2 or so of the theoretical peak TOPS for Hexagon. It's been a bit since I've run code on device, but all the support code is in their SDK, and it's easy as pie to get it running on the simulator.

QCOM have said that LLMs up to ~13B will run at a reasonable pace, and I think that's a pretty good peak for ~40 TOPS and ~150 GB/s of bandwidth.
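As a rough sanity check on those numbers, here is a back-of-envelope sketch. It assumes token generation is memory-bandwidth bound (every weight is read once per generated token) and ignores KV cache traffic and compute limits, so it gives an upper bound rather than a measurement. The 4-, 8- and 16-bit weight widths are my assumption, not something QCOM has stated, and the function name is made up for illustration.

```python
def decode_tokens_per_second(params_billion, bits_per_weight, bandwidth_gb_s):
    """Upper bound on decode speed if each token needs one full pass over the weights."""
    # Bytes that must be streamed from memory per generated token.
    bytes_per_token = params_billion * 1e9 * bits_per_weight / 8
    # Bandwidth divided by bytes per token gives tokens per second.
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    bandwidth = 150  # GB/s, the figure quoted above
    for params in (9, 13):
        for bits in (4, 8, 16):  # assumed quantization levels
            tps = decode_tokens_per_second(params, bits, bandwidth)
            print(f"{params}B @ {bits}-bit: <= {tps:.1f} tokens/s")
```

Under these assumptions a 13B model at 4 bits per weight is about 6.5 GB of weights, so ~150 GB/s caps it at roughly 23 tokens/s. At 16 bits the same model drops to under 6 tokens/s, which is why the quantization level matters more than the TOPS figure here.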


Yeah I took a quick look at their SDK and all the .h and .hpp files say "Confidential and Proprietary". They seemed to provide some kind of tool that converts your model into their own format. Perhaps this could be run with a binary driver without infecting your own stuff?

Not that different than certain GPUs I suppose, but I'm going to forget about buying one of those laptops for now.



