I've seen this elsewhere too. For a relatively mundane task (I think it was Morton code conversion), a giant lookup table was constructed, and it gave a nice 2x or 4x performance improvement.
But no application will ever be doing only string despacing or Morton codes, so the "fast" lookup table algorithm will make everything else slower by evicting good cache lines. And once something else runs and evicts the lookup tables, the next run will be slow again.
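To make the trade-off concrete, here is a rough sketch of the two approaches for 2D Morton codes. The names (`spread_bits`, `morton_bits`, `morton_table`) and the 16-bit table size are my own choices for illustration; I don't know what table layout the implementation I saw actually used. Python won't show the cache effect itself, this only shows the shape of the two algorithms and the table's footprint.

```python
def spread_bits(v: int) -> int:
    """Insert a zero bit between each of the low 16 bits of v (no table)."""
    v &= 0xFFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v


def morton_bits(x: int, y: int) -> int:
    """Table-free version: a handful of shifts and masks, no memory traffic."""
    return spread_bits(x) | (spread_bits(y) << 1)


# Hypothetical "giant table" variant: one precomputed entry per 16-bit input.
TABLE_BITS = 16
SPREAD_TABLE = [spread_bits(i) for i in range(1 << TABLE_BITS)]

# Footprint if the same table were stored as an array of 32-bit integers in C.
TABLE_BYTES = (1 << TABLE_BITS) * 4  # 262144 bytes = 256 KiB


def morton_table(x: int, y: int) -> int:
    """Lookup version: two table reads, each of which may touch a new cache line."""
    return SPREAD_TABLE[x & 0xFFFF] | (SPREAD_TABLE[y & 0xFFFF] << 1)
```

At 256 KiB, a table like this is larger than the L1 data cache on most current CPUs (typically tens of KiB), so it only looks fast when a tight benchmark loop keeps it resident. In a real application, every lookup competes with the rest of the working set, which is exactly the eviction problem described above.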