Cool! At a quick glance, it doesn't look like the group/layer index is provided to the compression model. Supplying it might help a bit with fidelity at very low additional cost.
Interesting, it certainly wouldn’t take up much additional space, but I wonder if it would have any real impact, since it seems somewhat orthogonal to finding a faithful low-dimensional encoding of the activations.
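For concreteness, here's one way the conditioning could be wired in: concatenate a one-hot layer index to both the encoder and decoder inputs, so the code itself never has to spend capacity on layer identity. This is a minimal sketch with random weights standing in for a trained model; all dimensions and names here are hypothetical, not taken from the actual system.

```python
import numpy as np

rng = np.random.default_rng(0)

N_LAYERS = 4   # hypothetical number of groups/layers
D_ACT = 16     # assumed activation dimensionality
D_CODE = 4     # assumed bottleneck size

# Random linear maps stand in for a trained encoder/decoder.
W_enc = rng.normal(size=(D_ACT + N_LAYERS, D_CODE)) * 0.1
W_dec = rng.normal(size=(D_CODE + N_LAYERS, D_ACT)) * 0.1

def layer_one_hot(idx: int) -> np.ndarray:
    v = np.zeros(N_LAYERS)
    v[idx] = 1.0
    return v

def encode(act: np.ndarray, layer_idx: int) -> np.ndarray:
    # Condition the encoder by appending the one-hot layer index.
    return np.concatenate([act, layer_one_hot(layer_idx)]) @ W_enc

def decode(code: np.ndarray, layer_idx: int) -> np.ndarray:
    # The decoder sees the same index, so the code need not store it.
    return np.concatenate([code, layer_one_hot(layer_idx)]) @ W_dec

act = rng.normal(size=D_ACT)
code = encode(act, layer_idx=2)
recon = decode(code, layer_idx=2)
assert code.shape == (D_CODE,)
assert recon.shape == (D_ACT,)
```

Whether a shared, index-conditioned model beats per-layer models (or plain unconditioned compression) would come down to how much the activation statistics actually differ across layers.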
There has been a surge of interest in learning control policies end-to-end. Many of Sergey Levine's recent papers are relevant: http://homes.cs.washington.edu/~svlevine/ (there are also some talks linked there).