All great points, but in practice tools like Bazel and sccache are incredibly conservative about what has to match for a cache hit: the key includes the file path on disk and even environment variable state.

One goal of these tools is to guarantee that such misconfiguration results in a cache key mismatch, rather than a hit and a bug.

There are tons of challenges in designing a remote build cache product, as with anything, but that guarantee has turned out to hold reliably.
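
To make that concrete, here is a rough sketch in Go of what such a conservative key can look like (the names are hypothetical, not Bazel's or sccache's actual derivation): hash every input that could plausibly change the output, so a misconfiguration changes the key instead of producing a wrong hit.

    package cachekey

    import (
        "crypto/sha256"
        "fmt"
        "os"
        "path/filepath"
        "sort"
        "strings"
    )

    // Key folds the compiler version, arguments, source contents, absolute
    // path on disk, and selected environment variables into one digest.
    func Key(compilerVersion string, args []string, srcPath string, envVars []string) (string, error) {
        h := sha256.New()
        fmt.Fprintf(h, "compiler=%s\n", compilerVersion)
        fmt.Fprintf(h, "args=%s\n", strings.Join(args, "\x00"))

        // The file path on disk is part of the key.
        abs, err := filepath.Abs(srcPath)
        if err != nil {
            return "", err
        }
        fmt.Fprintf(h, "path=%s\n", abs)

        // The source bytes themselves.
        src, err := os.ReadFile(srcPath)
        if err != nil {
            return "", err
        }
        h.Write(src)

        // Selected environment variables, sorted so ordering cannot matter.
        sort.Strings(envVars)
        for _, name := range envVars {
            fmt.Fprintf(h, "env:%s=%s\n", name, os.Getenv(name))
        }

        return fmt.Sprintf("%x", h.Sum(nil)), nil
    }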

Some other interesting insights:

- transmitting large objects is often not profitable, so we found that setting reasonable caps on what's shared with the cache can be really effective for keeping transmissions small and hits fast (there's a sketch of this after the list)

- deferring uploads is important because you can't penalize individual devs for contributing to the cache, and not everybody has a fast upload link. Making this part smooth matters so that everyone can benefit from every compile.

- build caching is ancient; Make does its own simple form of it, but the protocols vary greatly in robustness, from WebDAV in ccache to Bazel's gRPC interface

- most GitHub Actions builds run within a small physical area, so accelerating artifact delivery for them is an easier problem than, say, full-blown CDN serving
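
Here is a rough client-side sketch of the first two insights above (hypothetical names, not any particular tool's API): artifacts over a size cap are never transmitted, and everything else goes onto a background queue so the compile path never waits on the dev's upload link.

    package cacheclient

    import "log"

    type artifact struct {
        key  string
        data []byte
    }

    // Uploader skips artifacts over a size cap and hands the rest to a
    // background goroutine, so a slow uplink never slows down the build.
    type Uploader struct {
        maxBytes int64
        queue    chan artifact
    }

    func NewUploader(maxBytes int64) *Uploader {
        u := &Uploader{maxBytes: maxBytes, queue: make(chan artifact, 256)}
        go u.drain()
        return u
    }

    // Offer is called on the hot path right after a compile finishes.
    // It returns immediately and never blocks the build.
    func (u *Uploader) Offer(key string, data []byte) {
        if int64(len(data)) > u.maxBytes {
            return // too big to be worth transmitting
        }
        select {
        case u.queue <- artifact{key, data}:
        default:
            // Queue is full: drop the entry. It's a cache; losing one is fine.
        }
    }

    func (u *Uploader) drain() {
        for a := range u.queue {
            if err := uploadToRemote(a.key, a.data); err != nil {
                log.Printf("cache upload of %s failed: %v (ignored)", a.key, err)
            }
        }
    }

    // uploadToRemote stands in for the actual cache protocol call.
    func uploadToRemote(key string, data []byte) error { return nil }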

The assumptions that definitely help:

- it’s a cache, not a database; entries can go missing, and it doesn’t need strong consistency

- replication lag is okay because a build cache entry is typically not requested again soon after it is written; the client that created it already has it locally

- it’s much better to give a fast miss than a slow hit, since the compiler is quite fast

- it’s much better to give a fast miss than an error. You can NEVER break a build; at worst it should just not be accelerated (see the sketch below).
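
One way to bake those last two rules into a client, sketched with hypothetical names rather than any particular tool's code: put a hard deadline on the lookup and fold every timeout or error into a plain miss, so the worst case is an unaccelerated build rather than a broken one.

    package cacheclient

    import (
        "context"
        "errors"
        "time"
    )

    // Lookup returns (data, true) on a hit and (nil, false) otherwise.
    // A short deadline keeps misses fast, and every error is just a miss.
    func Lookup(ctx context.Context, key string) ([]byte, bool) {
        ctx, cancel := context.WithTimeout(ctx, 150*time.Millisecond)
        defer cancel()

        data, err := fetchFromRemote(ctx, key)
        if err != nil {
            // Timeout, network error, server error: all reported as a miss,
            // so the caller simply compiles as if the cache weren't there.
            return nil, false
        }
        return data, true
    }

    // fetchFromRemote stands in for the actual cache protocol call.
    func fetchFromRemote(ctx context.Context, key string) ([]byte, error) {
        return nil, errors.New("not implemented")
    }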

It’s an interesting problem to work on for sure.


