Branching used to be handled by marking the memory of the cores that failed an if test as read-only and just letting all of them continue the computation anyway.
It's gotten better now, but branching is still extremely unwieldy to do. No branch prediction either.
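For anyone who hasn't written GPU code, here's a minimal CUDA-flavoured sketch of what that looks like today (simplified; the exact mechanics vary by hardware generation): within a group of lanes executing in lockstep, both sides of a branch get executed, with the lanes that failed the test masked off while the other side runs.

```cuda
// Toy kernel to illustrate branch divergence: within a warp, the lanes
// that take the if-branch run first while the else-lanes sit masked,
// then the roles swap, so the warp pays for both paths.
__global__ void divergent(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (data[i] > 0.0f) {
        data[i] *= 2.0f;      // lanes with positive values run this...
    } else {
        data[i] = -data[i];   // ...then the remaining lanes run this.
    }
}
```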
I suppose the question is whether you would need to have multiple cores running simultaneously on the same processing element, or whether the fact that you have so many processing elements means you can just be inefficient and give each core the job of emulating a processor. I haven't seen anything about the clock speed of the virtual CPU, but if it's 5 or 10 MHz you don't really need high performance or efficiency; you just need a way of cramming more jobs into your servers and leaving the CPUs free to run other game code.
I was trying to be somewhat polite, but... GPUs aren't magic speed juice. You know those big speed gains that get GPU advocates so pumped up? CPUs have the exact same massive speed advantages over GPUs too! That is, when you have a task that the CPU is designed for and the GPU isn't, CPUs kick the GPU's ass.
There's no point in trying to jump through hoops to convince the GPU to be something it isn't. It isn't going to be faster than a CPU, or rather, a lot of CPUs.
Being 5 or 10 MHz is irrelevant. Being able to simulate them faster means you need fewer servers to do it. (You can tell who actually works on clouds and who doesn't by the attitude towards performance; people who don't actually work on clouds think performance matters less in the cloud....)
I am fully aware that your average GPU isn't optimal for this task; however, I was imagining that there would still be value in shifting the workload off the primary CPUs.
My line of thinking is around being able to use a single GPU stream processor to emulate this CPU at the required performance (i.e. 10 MHz). If you could do that, you could essentially have hundreds of these processors emulated for the cost of managing the I/O to them.
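To make that concrete, here is a rough CUDA-flavoured sketch of the idea, using a made-up toy instruction set (it is not the real DCPU-16 encoding, just a stand-in to show the shape): one thread per emulated CPU, each stepping its own fetch-decode-execute loop.

```cuda
// Hypothetical sketch: one GPU thread emulates one toy 16-bit CPU.
// The instruction format below is invented for illustration only.
#include <cstdint>

struct ToyCpu {
    uint16_t pc;
    uint16_t regs[8];
    uint16_t mem[4096];   // small per-CPU memory (8 KB) for the sketch
};

__global__ void step_cpus(ToyCpu *cpus, int n_cpus, int n_steps)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= n_cpus) return;

    ToyCpu &cpu = cpus[id];
    for (int s = 0; s < n_steps; ++s) {
        uint16_t instr = cpu.mem[cpu.pc++ & 0x0FFF];      // fetch
        uint16_t op = instr >> 12;                        // decode (toy format)
        uint16_t a  = (instr >> 8) & 0x7;
        uint16_t b  = (instr >> 4) & 0x7;
        switch (op) {                                     // execute
        case 0x0: cpu.regs[a] = cpu.regs[b];                   break; // MOV
        case 0x1: cpu.regs[a] += cpu.regs[b];                  break; // ADD
        case 0x2: cpu.regs[a] = cpu.mem[cpu.regs[b] & 0x0FFF]; break; // LOAD
        case 0x3: cpu.mem[cpu.regs[b] & 0x0FFF] = cpu.regs[a]; break; // STORE
        case 0x4: if (cpu.regs[a] == 0) cpu.pc = cpu.regs[b];  break; // branch if zero
        default:  /* treat as NOP */                            break;
        }
    }
}
```

The obvious catch is the switch: neighbouring emulated CPUs will usually be sitting on different opcodes, so the warp diverges and serialises, which is exactly why I'd only expect a few percent of the GPU's peak.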
I am not expecting it to be "Magic Speed Juice"; I am actually expecting to get 1-5% of the performance the GPU is capable of. However, I would see this as a net advantage if it took the workload off the CPU. Something like Knights Corner could easily do this (it's basically a Pentium 1 core).
The point I am making is that Notch's CPU is basically a home computer CPU from the '80s. They don't require that much functionality to emulate (as if a dozen emulators in a few hours weren't a good enough indication), and since OpenCL is Turing complete you can emulate anything (see running ARM Linux on an 8-bit processor); the question is whether it's efficient enough to be viable.
Can a 1 GHz stream processor emulate a 10 MHz single-issue, simple RISC core? I have no idea, but I suspect it's not the part we have seen so far that will be the determining factor; I believe it will instead be the I/O devices that determine the requirements.
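As a very rough back-of-envelope (my numbers, not anything from Notch): 1 GHz / 10 MHz leaves about 100 stream-processor cycles per emulated cycle, which is a comfortable budget for a simple fetch-decode-execute loop in the best case; the real question is how much of it gets eaten by divergence, memory latency, and shuttling I/O back and forth with the host.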