
There is a neat potential speedup here for the case where the bandwidth to your model weights is the limiting factor.

If you have a guess at what the model will output, you can verify that guess very cheaply, since the verification can be done in parallel.

That means you could keep a highly quantized small model in RAM and invoke the big model only occasionally for verification. If the small model agrees with the big one 90% of the time, you might get something approaching a 10x speedup this way.
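A minimal sketch of the draft-and-verify idea (greedy variant). The "models" here are made-up stand-ins, each mapping a token sequence to its next token; in a real transformer the verification loop below would be a single batched forward pass, which is where the speedup comes from.

```python
def speculative_decode(big_model, small_model, prompt, n_tokens, k=4):
    """Generate n_tokens greedily, letting the cheap small_model draft k
    tokens at a time and the big_model verify them in one pass."""
    seq = list(prompt)
    big_calls = 0
    while len(seq) - len(prompt) < n_tokens:
        # Small model drafts k tokens autoregressively (cheap).
        draft = []
        for _ in range(k):
            draft.append(small_model(seq + draft))
        # Big model scores all k+1 prefixes; in practice this is ONE
        # parallel forward pass, not k+1 sequential ones.
        big_calls += 1
        verified = [big_model(seq + draft[:i]) for i in range(k + 1)]
        # Accept draft tokens while they match the big model's choice.
        n_accept = 0
        while n_accept < k and draft[n_accept] == verified[n_accept]:
            n_accept += 1
        # Keep the accepted prefix plus one guaranteed big-model token,
        # so even a fully rejected draft still makes progress.
        seq += draft[:n_accept] + [verified[n_accept]]
    return seq[len(prompt):len(prompt) + n_tokens], big_calls

# Toy usage: the "big model" emits position mod 5; the "small model"
# agrees except at positions divisible by 6 (hypothetical functions).
big = lambda s: len(s) % 5
small = lambda s: (len(s) + 1) % 5 if len(s) % 6 == 0 else len(s) % 5

out, calls = speculative_decode(big, small, [], 10, k=4)
# The output is identical to decoding with big alone, but needs fewer
# big-model passes than tokens generated.
```

The key property: the output is exactly what the big model alone would produce, so this trades nothing in quality for fewer expensive passes.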



This is an interesting concept; could you share a paper or some writeup about it?


It looks like a description of Speculative Sampling. There's a recent paper from DeepMind about this in the context of LLMs [0], although it's not a completely new idea, of course.

The speedup reported in their paper is closer to 2x than 10x, however.

0: https://arxiv.org/abs/2302.01318
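A back-of-the-envelope check of why the speedup is modest: if each drafted token independently matches the big model with probability a (an i.i.d. simplification, not the paper's exact analysis), then drafting k tokens per big-model pass yields an expected accepted run of:

```python
def expected_tokens_per_pass(a, k):
    """Expected tokens emitted per big-model pass: the accepted prefix
    plus the one token the big model always supplies.
    sum_{i=0}^{k} a^i = (1 - a^(k+1)) / (1 - a), for a < 1."""
    return (1 - a ** (k + 1)) / (1 - a)

# With 90% agreement and 4 drafted tokens, only ~4.1 tokens per pass:
expected_tokens_per_pass(0.9, 4)  # ~4.095
```

So even at 90% agreement the ideal speedup is bounded around 4x, before subtracting the draft model's own cost, which is consistent with the ~2x figures reported in practice.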



