One "CUDA core" is indeed one GPU thread. The lane of a GPU SIMD is nothing like CPU SIMD, and can independently branch (even if that branching can be much more expensive than on a CPU).
This is not true, just as an AMD "shader core" was not a GPU thread.
For example, the HD 2900 XT had 320 shader cores, but since it used a VLIW-5 ISA, that corresponds to only 64 GPU threads.
Similarly, an RTX 3080 has 8704 CUDA cores, but there are 2 FP32 ALUs per thread, resulting in 4352 threads across 68 SMs, so, just like Turing, there are 64 threads per SM.
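The arithmetic above can be sanity-checked in a few lines (the per-thread ALU count is the commenter's premise, not an official NVIDIA figure):

```python
# Back-of-the-envelope check of the RTX 3080 numbers.
cuda_cores = 8704          # marketing figure for the RTX 3080
fp32_alus_per_thread = 2   # premise: Ampere has two FP32 pipes per thread
sms = 68                   # SM count of the RTX 3080

threads = cuda_cores // fp32_alus_per_thread
print(threads)             # 4352 threads
print(threads // sms)      # 64 threads per SM, matching Turing
```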
To me, the philosophy of C (and even C++ before things became nuts) is that you should be able to reasonably guess the assembly code resulting from the C code, the idea being that C lets you write what amounts to assembly with much less typing.
These days, things seem to be moving in a more dogmatic direction with the underlying assumption that the vast majority of programmers are bad programmers.
You're fighting a losing battle, though. You can guess the rough shape of the assembly, but unless you have measured and identified a spot where you can make progress, your brain simply cannot keep track of all the little nuances of the optimizations a compiler performs.
It's worth adding that the compilers themselves aren't perfect: you can use LLVM's own machine model to try to predict how your loop will perform (llvm-mca), and last time I checked it wasn't amazingly accurate.
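For reference, a typical way to feed a loop to llvm-mca is to pipe compiler-generated assembly into it (paths and the target CPU here are illustrative; requires an LLVM toolchain with `clang` and `llvm-mca` installed):

```shell
# Compile to assembly on stdout, then analyze the hot loop with llvm-mca.
# -mcpu selects the scheduling model used for the throughput estimate.
clang -O2 -S -o - loop.c | llvm-mca -mcpu=skylake -timeline
```

The report estimates cycles per iteration and port pressure, which you can then compare against an actual measurement to see how far off the model is.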
You can predict a pseudo-assembly output from a mental model of the architecture(s) you're targeting. It doesn't have to be an exact match, register allocation and all.