We actually distribute the fuzzing workload across physical machines, for precisely that reason. Each AFL instance gets its own kernel and physical core, and we use a staged synchronization algorithm to keep every machine's corpus up to date.
All of this was done to try to keep the scaling as linear as possible, so that doubling your CPU count doubles your execs/second as well.
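To give a rough idea of what I mean by staged sync, here's a heavily simplified sketch (not our actual code): it just periodically pulls every other machine's queue/ dirs into the local AFL sync dir, so the local instances import new entries on their next sync cycle. The hostnames, paths, and interval are made up, and it assumes every instance on every box has a unique -S name plus passwordless ssh/rsync between hosts.

    import subprocess, time

    HOSTS = ["fuzz01", "fuzz02", "fuzz03"]   # hypothetical machine names
    SYNC_DIR = "/fuzz/sync"                  # the afl-fuzz -o dir, same path on every box
    INTERVAL = 300                           # seconds between sync rounds

    def pull_from(host):
        # Copy the remote instances' queue/ dirs into our local sync dir.
        # --ignore-existing keeps repeated rounds cheap and never
        # overwrites test cases we already have.
        subprocess.run([
            "rsync", "-a", "--ignore-existing",
            "--include=*/", "--include=queue/***", "--exclude=*",
            f"{host}:{SYNC_DIR}/", f"{SYNC_DIR}/",
        ], check=False)

    while True:
        for host in HOSTS:
            pull_from(host)
        time.sleep(INTERVAL)

The real thing has more moving parts (staging, pruning, failure handling), but the core loop is basically that.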
This sounds like a good solution. I was trying to solve this problem on my own as well, and ended up building a minimal kernel+afl image that I boot multiple times, using memory deduplication to save RAM (I don't have 256 GB of RAM like you would in a proper server). Each instance still eats quite a lot of RAM even with a stripped-down root filesystem, which is why I wanted to keep the footprint low. I'm on a 2990WX rig, which was a bit of a disappointment from a fuzzing perspective because of its limited memory bandwidth, but that's a discussion for another time.
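For reference, my setup is roughly along these lines, heavily simplified (the guest count, RAM size, and image names are placeholders, and enabling KSM needs root): switch on kernel samepage merging, then boot several copies of the same minimal image under QEMU/KVM with mem-merge on so identical guest pages get shared.

    import subprocess

    NUM_GUESTS = 8   # how many copies of the image to boot

    # Turn on kernel samepage merging (needs root) so identical guest
    # pages get collapsed into shared copies.
    with open("/sys/kernel/mm/ksm/run", "w") as f:
        f.write("1")

    for i in range(NUM_GUESTS):
        subprocess.Popen([
            "qemu-system-x86_64",
            "-enable-kvm",
            "-machine", "pc,mem-merge=on",    # mark guest RAM as mergeable for KSM
            "-m", "512",                      # per-guest RAM before dedup kicks in
            "-smp", "1",
            "-kernel", "bzImage",             # placeholder: the minimal kernel
            "-initrd", "afl-rootfs.cpio.gz",  # placeholder: afl + target in an initramfs
            "-append", "console=ttyS0 quiet",
            # headless; a real setup would point the serial at a log file
            # or socket instead of dropping it
            "-display", "none", "-serial", "null", "-monitor", "none",
        ])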
Do you think it would be worth trying out the snapshot approach I linked in the parent comment, or is it probably not worth it? I was thinking of rebasing their patch onto the latest master and getting it working again; sadly the authors seem to have abandoned the project, at least there haven't been any public changes in about a year.