Pedantic quip: I have a hard time believing you guys were measuring half nanosec...

luckydude · on May 31, 2012

We didn't believe it either until we put a logic analyzer on the bus and found that the numbers were spot on with respect to the number of cycles. I don't remember how far off they were, but it wasn't much; all the hardware dudes were amazed that software could get that close.

tl;dr: the numbers were accurate to the # of cycles, might have been as much as 1/2 of 1 cycle off.

Edit: I should add this was almost 20 years ago, I dunno how well it works today. Sec, lemme go test on a local machine.

OK, I ran on an Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz (I think that's a Sandy Bridge) that is overclocked to 4289 MHz according to mhz, and it looks to me like that machine takes 4 cycles to do an L1 load. That sound right? lmbench says 4.05 cycles.
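For anyone checking the arithmetic, here is a minimal sketch in C of the conversion between a measured latency in nanoseconds and clock cycles. The function names `ns_to_cycles` and `cycles_to_ns` are made up for illustration and are not part of lmbench; the only input taken from the post is the 4289 MHz clock.

```c
#include <assert.h>

/* Hypothetical helpers, not lmbench code.
 * One MHz is one cycle per microsecond, so cycles = ns * MHz / 1000. */
double ns_to_cycles(double ns, double mhz)
{
    assert(mhz > 0.0);
    return ns * mhz / 1000.0;
}

/* Inverse conversion: how many nanoseconds a given cycle count takes. */
double cycles_to_ns(double cycles, double mhz)
{
    assert(mhz > 0.0);
    return cycles * 1000.0 / mhz;
}
```

By that arithmetic, 4.05 cycles at 4289 MHz works out to roughly 0.94 ns per load, and a single cycle to about 0.23 ns.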

I poked a little more and I get

L1: 4 cycles, ~48K
L2: 12 cycles, ~256K
L3: 16 cycles, ~6M

Off to google and see how far off I am. Whoops, work is calling, will check back later.

WayneS · on June 1, 2012

You realize he is measuring averages, right? Set up 1 million L1 cache hits, divide the total time by the number of hits...
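The averaging idea can be sketched in C roughly like this. It is an illustration only, not lmbench's actual code: `build_ring` and `avg_load_ns` are hypothetical names, the timer is assumed to be POSIX `clock_gettime`, and the ring is assumed to be small enough to stay in L1.

```c
#define _POSIX_C_SOURCE 199309L  /* assumption: POSIX clock_gettime is available */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: make slot i point at slot i+1, wrapping at the end. */
void build_ring(size_t *ring, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        ring[i] = (i + 1) % n;
}

/* Hypothetical helper: do `loads` dependent loads, return average ns per load.
 * Each load needs the result of the previous one, so they cannot overlap. */
double avg_load_ns(const size_t *ring, size_t loads, size_t *sink)
{
    struct timespec t0, t1;
    size_t p = 0;

    assert(loads > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < loads; i++)
        p = ring[p];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *sink = p;  /* store the result so the loop is not optimized away */

    double total_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                    + (double)(t1.tv_nsec - t0.tv_nsec);
    return total_ns / (double)loads;  /* total time divided by the count */
}
```

With a ring of, say, 1024 entries (8 KB with 8-byte `size_t`) and a million loads, timer overhead and resolution are spread across all the loads, which is how an average can resolve fractions of a nanosecond even though no single load is timed that precisely.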