"throw the sucker at a video card, and watch it finish thousands of times faster on cheaper hardware"
This is nonsense even for dense operations. But this matrix is sparse, in which case GPUs are within a modest factor (2-5 or so depending on the matrix, whether you use multiple cores on the CPU, and whether you ask NVidia or Intel). And if the algorithm does not expose a lot of concurrency (as with most multiplicative algorithms), the GPU can be much slower than a CPU.
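For context, the kernel being argued about is just a sparse matrix-vector product. A rough CPU-side sketch in CSR format (the names spmv_csr, row_ptr, col_idx, vals are mine, not from any particular library) shows why: each stored nonzero costs about two flops but also reads of vals, col_idx, and x, so the kernel is limited by memory bandwidth rather than arithmetic, and the GPU advantage tracks the bandwidth ratio, not the flop ratio.

    // Sketch of y = A*x for a sparse matrix A stored in CSR format.
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    void spmv_csr(const std::vector<std::size_t>& row_ptr,
                  const std::vector<int>&         col_idx,
                  const std::vector<double>&      vals,
                  const std::vector<double>&      x,
                  std::vector<double>&            y)
    {
        const std::size_t nrows = row_ptr.size() - 1;
        for (std::size_t i = 0; i < nrows; ++i) {
            double sum = 0.0;
            // Accumulate the nonzeros of row i.
            for (std::size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
                sum += vals[k] * x[col_idx[k]];
            y[i] = sum;
        }
    }

    int main()
    {
        // 2x2 example: [[4, 1], [0, 3]] * [1, 2] = [6, 6]
        std::vector<std::size_t> row_ptr = {0, 2, 3};
        std::vector<int>         col_idx = {0, 1, 1};
        std::vector<double>      vals    = {4.0, 1.0, 3.0};
        std::vector<double>      x       = {1.0, 2.0}, y(2);
        spmv_csr(row_ptr, col_idx, vals, x, y);
        std::printf("%g %g\n", y[0], y[1]);  // prints: 6 6
    }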
The right answer is somewhere in the middle. Of course the speedup depends on what you are comparing, but if you benchmark a GPU against a decent four-core CPU, the speedup is about an order of magnitude.
jedbrown, please provide a source for your estimates. Of course, I'm interested in highly optimized libraries like BLAS; hand-written code would be several times slower on both systems.
"Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU" (from Intel, but using the best published GPU implementations): http://doi.acm.org/10.1145/1815961.1816021
BTW, you may as well cite CUSP (http://code.google.com/p/cusp-library/) for the sparse implementation; it's not part of CUDA, despite being developed by Nathan Bell and Michael Garland (NVidia employees).
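For anyone who wants to try it, a GPU SpMV with CUSP looks roughly like the sketch below. It mirrors CUSP's documented usage (csr_matrix, array1d, gallery::poisson5pt, multiply), but I haven't compiled this exact snippet and the grid size is arbitrary, so treat it as an illustration and check the CUSP docs.

    // Sketch of y = A*x on the GPU using CUSP (compile with nvcc).
    #include <cusp/csr_matrix.h>
    #include <cusp/array1d.h>
    #include <cusp/multiply.h>
    #include <cusp/gallery/poisson.h>

    int main(void)
    {
        // Assemble a 5-point Laplacian test matrix directly in device memory.
        cusp::csr_matrix<int, float, cusp::device_memory> A;
        cusp::gallery::poisson5pt(A, 1024, 1024);

        // Input vector of ones, output vector of zeros, both on the device.
        cusp::array1d<float, cusp::device_memory> x(A.num_cols, 1.0f);
        cusp::array1d<float, cusp::device_memory> y(A.num_rows, 0.0f);

        // y = A * x
        cusp::multiply(A, x, y);

        return 0;
    }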