Isn't it the case that from stack traces it is rather impossible to read that function foo() is burning CPU cycles because it is memory-bound? And the reason could be rather somewhere else and not in that particular function - e.g. multiple other threads creating contention on the memory bus?
If so, doesn't this make the profile somewhat an invalid candidate for PGO?
It depends on the event that was sampled to generate the profiles. For example, if you sample instructions by collecting a stack trace every N instructions, you won't actually see foo() burning the CPU. However, if you look at CPU cycles, foo() will be very noticeable. Internally, we use sPGO profiles from sampling CPU cycles, not instructions.
Right, perhaps I was a little bit too vague but what I was trying to say is that by merely sampling the CPU cycles we cannot infer that the foo() was burning CPU because it was memory-bound and which in itself is not an artifact of foo() implementation but rather application-wide threads that happen to saturate the memory bus more quickly.