Pedantic quip: I have a hard time believing you guys were measuring half nanosecond cache latencies on a machine with a 100MHz clock. :)
And actually the cache numbers seem optimistic, if anything. My memory is that an L1 cache hit on SNB is 5 cycles, which is 2-3x as long as that table shows.
We didn't believe it either until we put a logic analyzer on the bus and found that the numbers were spot on with respect to the number of cycles. I don't remember how far off they were, but it wasn't much. All the hardware dudes were amazed that software could get that close.
tl;dr: the numbers were accurate to the # of cycles, might have been as much as 1/2 of 1 cycle off.
Edit: I should add this was almost 20 years ago, I dunno how well it works today. Sec, lemme go test on a local machine.
OK, I ran on an Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz (I think that's a Sandy Bridge) that is overclocked to 4289 MHz according to mhz, and it looks to me like that machine takes 4 cycles to do an L1 load. That sound right? lmbench says 4.05 cycles.
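For anyone who wants to check the arithmetic behind those figures, here is a rough sketch of the cycles-to-nanoseconds conversion. The helper names `cycles_to_ns` and `ns_to_cycles` are made up for illustration and are not part of lmbench; the only inputs are the 4.05 cycles and 4289 MHz quoted above and the 100MHz clock from the original quip.

```python
# Hypothetical helpers for illustration only; not lmbench code.

def cycles_to_ns(cycles: float, clock_mhz: float) -> float:
    """Convert a cycle count to nanoseconds at the given clock (in MHz)."""
    # One cycle lasts 1000 / clock_mhz nanoseconds.
    return cycles * 1000.0 / clock_mhz


def ns_to_cycles(latency_ns: float, clock_mhz: float) -> float:
    """Convert a latency in nanoseconds to cycles at the given clock (in MHz)."""
    return latency_ns * clock_mhz / 1000.0


if __name__ == "__main__":
    # L1 load reported above: 4.05 cycles at 4289 MHz.
    print(f"{cycles_to_ns(4.05, 4289.0):.3f} ns")  # prints "0.944 ns"
    # A single cycle on a 100 MHz machine.
    print(f"{cycles_to_ns(1, 100.0):.1f} ns")  # prints "10.0 ns"
```

So one cycle at 100MHz is 10 ns, while a 4-cycle L1 load at roughly 4.3 GHz comes in just under 1 ns.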