
I was able to run 7B on a CPU, inferring several words per second: https://github.com/markasoftware/llama-cpu


Beginner PyTorch user here... it looks like it's using only one CPU core on my machine. Is it feasible to use more than one? If so, what options, env vars, or code changes are necessary?


Perhaps try setting `OMP_NUM_THREADS`, for example `OMP_NUM_THREADS=4 torchrun ...`.

But on my machine, it automatically used all 12 available physical cores. Setting OMP_NUM_THREADS=2, for example, lets me decrease the number of cores used, but raising it to try to use all 24 logical threads has no effect. YMMV.
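In case it helps, a minimal sketch of the two ways to control PyTorch's CPU thread count (assumes torch is installed; the value 4 is just an example, tune it to your core count). OpenMP reads OMP_NUM_THREADS when the runtime initializes, which is why it needs to be set before `import torch`, or exported in the shell before `torchrun`:

```python
import os

# Env-var route: must happen before importing torch, since OpenMP
# reads OMP_NUM_THREADS at library initialization.
os.environ["OMP_NUM_THREADS"] = "4"  # example value

import torch  # imported after the env var on purpose

# Runtime route: PyTorch's intra-op parallelism API works after import.
torch.set_num_threads(4)
print(torch.get_num_threads())
```

Note that `torch.set_num_threads` only controls intra-op parallelism; exceeding the physical core count generally doesn't help, which matches the behavior above where values past 12 had no effect.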


nice!



