Amazon Orders More than 10,000 Nvidia Tesla cards (vr-zone.com)
86 points by SlimHop on Oct 4, 2012 | 15 comments


I remember reading a blog post on the peculiarities of GPU programming which noted that, for most modern graphics cards (at the time), if you can keep your data in chunks no bigger than 64kb apiece you can expect enormous performance gains on top of what you already get from OpenCL/CUDA, because of a physical memory limit on the GPU itself.

I also remember thinking that a 64kb row size for DynamoDB was very odd.

I wonder if these things are at all related.


You are talking about shared memory in CUDA, local memory in OpenCL. When you are reading from the same locations over and over again (the most notable cases being linear algebra routines and filtering in signal processing), reading from DRAM is costly. On CPUs this is solved by having multiple layers of caches.

Early generations of NVIDIA GPUs did not have an automatic caching mechanism (or could not use it for CUDA, I forget) to help with this. But they did have memory available locally on each compute unit that you could manually read/write data into, which helped reduce the overall read/write overhead.

Even now that the newer generations have caches, it is still beneficial to use this shared/local memory. And when the shared/local memory limits are hit, there are alternatives like textures in CUDA and images in OpenCL that are slightly slower, but significantly better than reading from DRAM.
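For anyone curious what the manual staging looks like, here is a minimal sketch (my own toy example; the kernel name, tile size and filter radius are all made up): a 1D box filter that copies a tile of the input into __shared__ memory once, so neighbouring threads reuse the on-chip copy instead of each going back to DRAM.

    // Assumes the kernel is launched with blockDim.x == BLOCK.
    #define RADIUS 4
    #define BLOCK  256

    __global__ void boxFilter(const float *in, float *out, int n)
    {
        __shared__ float tile[BLOCK + 2 * RADIUS];

        int gid = blockIdx.x * blockDim.x + threadIdx.x;
        int lid = threadIdx.x + RADIUS;

        // Every thread loads its own element; the first RADIUS threads also load the halos.
        tile[lid] = (gid < n) ? in[gid] : 0.0f;
        if (threadIdx.x < RADIUS) {
            int left  = gid - RADIUS;
            int right = gid + BLOCK;
            tile[lid - RADIUS] = (left  >= 0) ? in[left]  : 0.0f;
            tile[lid + BLOCK]  = (right <  n) ? in[right] : 0.0f;
        }
        __syncthreads();

        if (gid < n) {
            float sum = 0.0f;
            for (int i = -RADIUS; i <= RADIUS; ++i)
                sum += tile[lid + i];
            out[gid] = sum / (2 * RADIUS + 1);
        }
    }

The whole tile has to fit in that small on-chip memory (16 KB per multiprocessor on the early cards, up to 48 KB on Fermi), which is roughly where figures like the 64kb one above come from.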


Only relation being that 64k is 2^16 I imagine.


Amazon's cloud might be one of the coolest things I've seen in a while: hop on, get some of the best computing performance available, hop off, and save some money. If you have a random data analysis problem that would take your computer three weeks, why not pay $10 and get it done in two hours (plus a few hours of debugging)?

If the article is correct, Amazon paid $15 million for cards that will be out of style in about two years (not that they have to get rid of them, but something faster, easier to maintain (if Nvidia starts opening up to Linux), with more memory and lower power usage will come out). They'll have to fork over a large sum of money again to keep their top "on-demand computing" title.

Amazon's current cluster GPU instance has two Nvidia Tesla Fermis in it. I'm going to assume Amazon will split the new cards into twos and fours, about half of each; that's ~1,750 new machines coming online. Looking at the current cluster rates, it's $2.10 an hour for the normal instance, so I'll say $4.20 an hour for a jumbo with 4 GPUs.

They paid $15 million for just the cards. They need to get 2,380,952 hours of usage out of the machines to break even on the cards; that's about 1,360 hours per machine, or someone running every machine at full bore for 56 days. The cards are the most expensive component (assumption), and the rest of each computer will cost about as much as one of the cards, so add some overhead for everything else needed to make it work: call it 120 days of full-time use to break even on an investment of about $25 million (they need to buy lots of other things to put all the GPUs in, deal with all that heat, have a place to put it all, have people install the new machines, and so on). I wonder what the actual usage of those clusters is, and whether anyone has signed a deal saying "we'll use the cluster for an entire month."

It's a beautiful maneuver though. Say CERN didn't want to do all the data analysis from the LHC in house, because by the time they got to that part of the experiment the technology they'd purchased would be way out of date: just let Amazon do it. Amazon will always have the latest technology, and you get an inexpensive way of leveraging that power.

Assuming they can make it all work (and I'm sure a lot of their decisions now are strategic bets aimed at future business), this is a great time to be a computer user: log on and get the best hardware for a couple of hours and a couple of dollars. Instead of shelling out $1,500 on a new computer of my own, I could log a ton of EC2 hours on significantly faster, more powerful machines that never go 'stale' and whose lives are much better spent (my computer probably doesn't do anything "intensive" 70% of its life, whereas the EC2 boxes are probably pushed a good deal harder than that).


Amazon are also well-known for running things at a loss for years while an idea gets entrenched. It could be that they're prepared to eat a loss for a couple of generations just to popularize the idea of cloud-based HPC.

They might not run their hardware at a profitable capacity right now, but in 3-4 years, when some labs are looking at replacing or updating their current systems, they will look at Amazon as a solid alternative and then things will turn up. And until then Amazon can't half-ass it with too little capacity, because when labs do some trial projects and come back with a "sorry, we're out of instances" error, they'll decide they can't trust Amazon.


I've looked hard at using Amazon's GPU clusters, and the math just doesn't work out.

If you're working on applications that will need the GPU regularly, you can build a system with four GTX 580s for about $3,000, and one of those systems will outperform 2, maybe 3, AWS GPU instances, which will run you about $1,000 per month each. The ownership number doesn't include data center/power/etc., but I still think buying is the better value if you'll be using it a lot.

Now, if you're running GPU jobs sporadically, AWS may make sense, but you should really look carefully at it; it's not the same value relationship as hosting web servers on AWS (which I'm generally a proponent of).

Although, to be fair, that may change if they really do pass on some of their savings from this deal to the user.


6 GB on a single GPU is very tempting. And the single-precision performance per GPU on the K10 (theoretically 2.2 TFLOPS) is more than anything else you can get on the market. Buying one of those would cost you $3,500-$4,000; two of them plus a motherboard plus a Xeon would probably cost you close to $10,000. If you want to scale up into a cluster, the cost per machine goes up and the man-hours spent become a factor.

Amazon offers a heavy-usage deal that comes out to roughly $11,000 per instance per year if used 24x7x365.

You could argue that the cluster or machine you set up would be useful for more than a year. That's true to an extent, but at the current rate of development GPUs become obsolete rather quickly, and suddenly having a cluster in the cloud sounds more appealing than going through the process of updating your machines every 18-20 months.
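To make the comparison concrete, here is the back-of-the-envelope I keep coming back to, using only numbers already floated in this thread (the ~$10,000 build, the ~$11,000/year heavy-usage figure, and an assumed 18-month refresh cycle); none of these come from an actual price list:

    #include <cstdio>

    int main()
    {
        double build_cost    = 10000.0;  // two K10s + motherboard + Xeon (estimate from above)
        double aws_per_year  = 11000.0;  // heavy-usage reserved instance, run 24x7 (figure from above)
        double months_useful = 18.0;     // assumed GPU refresh cycle

        double own_per_month = build_cost / months_useful;
        double aws_per_month = aws_per_year / 12.0;
        printf("own:  ~$%.0f/month (ignoring power, hosting and admin time)\n", own_per_month);
        printf("aws:  ~$%.0f/month\n", aws_per_month);
        return 0;
    }

Owning still comes out cheaper per month on those inputs; the cloud case rests on the hidden costs and the hassle of the refresh itself.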


Yeah, I did a few trial runs on the cluster GPU instances maybe 4 months ago. I found that, while the GPUs themselves were really quite fast, moving data in and out of the GPU was not. Maybe Amazon will focus on increasing bandwidth to the GPU for the new boxes.


To be fair, there shouldn't be a lot of data transfer to and from the GPU in the first place. Moving larger chunks of data (instead of many smaller ones) when you are running out of device memory, and using asynchronous data/compute streams, would increase performance.
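For what it's worth, here is roughly what those asynchronous streams look like in CUDA; this is a toy sketch with a made-up size and a trivial kernel, not tuned code. The input is split into chunks, each chunk gets its own stream, and the copy for one chunk can overlap with the kernel already running on another. The pinned (page-locked) host buffer is what lets cudaMemcpyAsync actually run asynchronously.

    #include <cuda_runtime.h>

    __global__ void scaleKernel(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = 2.0f * in[i];
    }

    int main()
    {
        const int    NSTREAMS = 4;
        const size_t N        = 1 << 22;          // 4M floats; divides evenly by NSTREAMS and 256
        const size_t CHUNK    = N / NSTREAMS;

        cudaStream_t streams[NSTREAMS];
        for (int i = 0; i < NSTREAMS; ++i) cudaStreamCreate(&streams[i]);

        float *h_in, *d_in, *d_out;
        cudaMallocHost((void**)&h_in, N * sizeof(float));   // pinned host buffer
        cudaMalloc((void**)&d_in,  N * sizeof(float));
        cudaMalloc((void**)&d_out, N * sizeof(float));
        for (size_t i = 0; i < N; ++i) h_in[i] = 1.0f;

        for (int i = 0; i < NSTREAMS; ++i) {
            size_t off = i * CHUNK;
            // The copy for chunk i can overlap with the kernel already running on earlier chunks.
            cudaMemcpyAsync(d_in + off, h_in + off, CHUNK * sizeof(float),
                            cudaMemcpyHostToDevice, streams[i]);
            scaleKernel<<<(unsigned)(CHUNK / 256), 256, 0, streams[i]>>>(d_in + off, d_out + off, (int)CHUNK);
        }
        cudaDeviceSynchronize();
        return 0;
    }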


Why would you buy the previous generation GPUs when you can get the current one?


Because the current series (GTX 680) is severely stunted for CUDA compared to the GTX 580. The single-precision performance of the 680 barely beats the 580, and while double-precision performance on the GTX series sucks in general, NVIDIA actually made it twice as bad going from the 580 to the 680. (Benchmarks linked at the bottom.)

The reasoning may have been to focus the GTX series more on gaming. Or it could be something more sinister: pushing more people towards their costlier Tesla line. Considering that they came out with the K10, which has terrible double-precision performance but incredible single-precision performance, I think they are heading towards multiple Tesla lines and want to push the GTX series away from serious GPGPU computing.

Disclaimer: The following is my company's blog. The post is authored by me. http://blog.accelereyes.com/blog/2012/04/26/benchmarking-kep...
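(If anyone wants a quick sanity check on their own card without hunting for published numbers, a throwaway microbenchmark along these lines will show the single vs. double precision gap; this is my own sketch, not the benchmark from the post above. It times the same register-resident multiply-add loop in both precisions; with this many threads in flight the dependent chain's latency is hidden, so the ratio of the two times roughly tracks the throughput ratio.)

    #include <cuda_runtime.h>
    #include <cstdio>

    template <typename T>
    __global__ void fmaLoop(T *out, int iters)
    {
        T a = (T)threadIdx.x, b = (T)1.000001, c = (T)0.5;
        for (int i = 0; i < iters; ++i)
            a = a * b + c;                                   // one multiply-add per iteration
        out[blockIdx.x * blockDim.x + threadIdx.x] = a;      // store so the loop isn't optimized away
    }

    template <typename T>
    float timeIt(int iters)
    {
        T *d;
        cudaMalloc((void**)&d, 1024 * 256 * sizeof(T));
        cudaEvent_t t0, t1;
        cudaEventCreate(&t0); cudaEventCreate(&t1);
        cudaEventRecord(t0);
        fmaLoop<T><<<1024, 256>>>(d, iters);
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms;
        cudaEventElapsedTime(&ms, t0, t1);
        cudaFree(d);
        return ms;
    }

    int main()
    {
        int iters = 1 << 16;
        printf("float:  %.1f ms\n", timeIt<float>(iters));
        printf("double: %.1f ms\n", timeIt<double>(iters));
        return 0;
    }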


Thank you very much for that reply.

I have a 590 and I'm pretty happy with it. I've been eyeing the 690, but I wasn't able to find any real-world non-gaming benchmarks.


> If you have a random data analysis problem that would take your computer three weeks, why not just pay $10 and get it done in two hours (plus a few hours of debugging)?

I'm facing this very problem myself right now with some GPU-bound calculations. The issue for me is that the software I'm using isn't free, so I'm stuck running it on the one machine I have a license for. It's a very frustrating situation to be in, as the hardware isn't that great.


I wonder if this might be a game changer for software like Matlab, SAS, et al. A licence would buy you the right to one install locally and one on a cloud server. (Although the last time I had anything to do with Matlab licensing and user negotiation, they moved at a pace comparable to that of glaciers. With the same temperature.)


They won't necessarily go out of style; as a compute engine, it's not like you'll stop needing to compute. The challenge with older hardware is when it costs more to run than it can return in cash, but notice that Amazon hasn't been discounting EC2 time as their cost of compute has gone down, so after they've depreciated all the hardware this stuff will still be useful for turning electrons into cold hard cash :-).



