Beginner pytorch user here... it looks like it is using only one CPU on my machine. Is it feasible to use more than one? If so, what options/env vars/code change are necessary?
Perhaps try setting `OMP_NUM_THREADS`, for example `OMP_NUM_THREADS=4 torchrun ...`.
But on my machine, it automatically used all 12 available physical cores. Setting OMP_NUM_THREADS=2 for example lets me decrease the number of cores being used, but increasing it to try and use all 24 logical threads has no effect. YMMV.