If I'm understanding correctly, this is collecting LBR data through hardware sup...

dang · on Feb 1, 2025

(These are older comments that we merged from https://hackernews.hn/item?id=42888185, in case anyone was confused by the timestamps)

BigRedEye · on Feb 1, 2025

Yes. Although we are studying CSSPGO, which uses a mixed (LBR + software-sampled stacks) approach.

brancz · on Feb 1, 2025

I'm familiar with the paper, but it doesn't improve the situation in terms of LBR availability on cloud providers, does it?

BigRedEye · on Feb 1, 2025

Right, the existing limitations still apply. Without hardware LBR support, we cannot provide sPGO profiles. However, basic profiling should still work fine.

menaerus · on Feb 1, 2025

Blog is packed with information, thanks!

Isn't it practically impossible to tell from stack traces alone that function foo() is burning CPU cycles because it is memory-bound? And the cause could lie somewhere else rather than in that particular function, e.g. multiple other threads creating contention on the memory bus?

If so, doesn't this make the profile a somewhat invalid candidate for PGO?

BigRedEye · on Feb 1, 2025

It depends on the event that was sampled to generate the profiles. For example, if you sample instructions by collecting a stack trace every N instructions, you won't actually see foo() burning the CPU. However, if you look at CPU cycles, foo() will be very noticeable. Internally, we use sPGO profiles from sampling CPU cycles, not instructions.
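
To make the difference concrete, here is a toy calculation rather than a real profiler. The function names, instruction counts and IPC values are all invented for illustration. It estimates what share of samples lands in each function when you sample every N instructions versus every N cycles:

```python
# Hypothetical workload: instruction counts and IPC are made-up numbers.
workload = {
    "foo": {"instructions": 1_000_000, "ipc": 0.1},  # memory-bound, mostly stalled
    "bar": {"instructions": 9_000_000, "ipc": 3.0},  # compute-bound
}


def sample_shares(workload, event):
    """Expected fraction of samples per function when sampling on `event`."""
    if event == "instructions":
        counts = {name: f["instructions"] for name, f in workload.items()}
    elif event == "cycles":
        # cycles = instructions / IPC, so a low IPC inflates the cycle count
        counts = {name: f["instructions"] / f["ipc"] for name, f in workload.items()}
    else:
        raise ValueError(f"unknown event: {event}")
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


for event in ("instructions", "cycles"):
    shares = sample_shares(workload, event)
    print(event, {name: round(share, 3) for name, share in shares.items()})
```

With these assumed numbers, foo() gets about 10% of the samples when sampling instructions but about 77% when sampling cycles, because its stalled cycles are counted even though few instructions retire.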

menaerus · on Feb 1, 2025

Right, perhaps I was a bit too vague. What I was trying to say is that by merely sampling CPU cycles we cannot infer that foo() was burning CPU because it was memory-bound, or that this is not an artifact of foo()'s implementation but rather the result of threads across the application saturating the memory bus.

Or is my concern misplaced?