The jump to 3.x was a more or less arbitrary decision made by Linus because the 2.6 version numbers were getting very large. Switching to 3.x should basically be the same as any other kernel upgrade, possibly with some additional issues with software that assumes things about the form of the kernel version number. There shouldn't be any distros holding back from upgrading; in fact Ubuntu 12.04 uses a 3.2 kernel.
So they could easily afford a controlling share, not that they'd have anything interesting to do with it. As far as the Mac goes, Intel is doing more or less exactly what Apple would want them to do, and as far as iOS goes, Intel is already irrelevant.
I agree, though I think it could fairly be taken as one small bit of evidence in favor of "C++ has a lot of gotchas". In this case it looks like the culprit is C++'s C-compatibility-driven decision to sync with stdio by default, and therefore to avoid buffering input. Of course, if they made the opposite decision on defaults, "C++ doesn't sync with stdio by default" would be a different, probably also common, variety of "gotcha".
I've not implemented the C++ standard library, but my guess is that it's because iostreams need to implement their own buffering anyway, so buffering atop an already-buffering library would just add complexity and unpredictability.
I'm not sure this quite makes sense. The buffering can already be disabled, clearly, since that's what's being discussed. A non-buffering implementation could easily be placed atop FILE (I don't know the details, but I can't imagine a FILE-based iostream implementation being especially complex), at which point you'd have a buffered implementation that also cooperates with pure C stdio. iostream would still need buffering for other operations, but could just leave it off permanently for stdio, and the switch already exists.
I don't think that he is arguing that the GIL isn't a limitation, just that the fundamental limitation it imposes can't be removed without also changing the threading model or the garbage collector. It's practically impossible to run threads in parallel with any sort of performance when they're all constantly generating a huge amount of cache-coherency traffic by updating reference counts.
I believe his argument is that it would reduce thrashing between the caches. With the GIL, ownership of a cache line containing the reference count for any given object will only have to be transferred at most once per timeslice. If multiple threads were concurrently accessing a Python object, it would be ping-ponging back and forth between caches much more frequently.
EDIT: Also, "stop whatever they are doing to synchronize the dirty cache lines with RAM" is not a very good way to describe what is going on; often you don't have to hit RAM at all, the caches just synchronize between each other. It is still pretty bad for performance, though.
>I believe his argument is that it would reduce thrashing between the caches. With the GIL, ownership of a cache line containing the reference count for any given object will only have to be transferred at most once per timeslice.
Ah. Makes sense.
>just synchronize between each other
Yes, but that's bad because that cache line is 'stuck' for all processors while the synchronization is occurring, if I'm not mistaken...
In general, at least the two processors involved in the conflict will have to either stall briefly or switch to another hardware thread while write conflicts are occurring. There are lots of architectural tricks people pull to try to mitigate the impact, but the reality of the matter is that frequently mutating shared state (e.g. reference counts) makes it extremely difficult to get good performance out of threads running in parallel.