> Agree agnostic would help, but the machine also has to handle SW asking for mask/tail unchanged, right?
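The agnosticness flags can be forwarded at decode time (at the cost of making vsetvl, the variant that takes vtype from a register rather than an immediate, very slow), so for most purposes it could be as fast as if the flags were bits inside the vector instruction itself. It doesn't help vl=0, though, since with vl=0 no destination elements are updated, tail included, regardless of the agnostic flags.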
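A toy model of the decode-time forwarding idea, as a rough sketch only and not any real core's design. `Insn`, `DecodedVecOp` and `decode` are hypothetical names. The one real detail it relies on is the RVV 1.0 vtype layout (vsew in bits 5:3, vta in bit 6, vma in bit 7). The decoder tracks the most recent immediate vtype from vsetvli/vsetivli and tags each following vector op with vta/vma. After a vsetvl the flags are unknown at decode, so later ops are marked as having to wait for vtype.

```python
from dataclasses import dataclass
from typing import Optional

# RVV 1.0 vtype layout: vlmul = bits 2:0, vsew = bits 5:3, vta = bit 6, vma = bit 7.
VTA_BIT = 6
VMA_BIT = 7


@dataclass
class Insn:
    op: str                          # "vsetvli", "vsetivli", "vsetvl", or a vector op like "vadd.vv"
    vtype_imm: Optional[int] = None  # immediate vtype; only meaningful for vsetvli/vsetivli


@dataclass
class DecodedVecOp:
    op: str
    vta: Optional[bool]              # None = not known at decode time
    vma: Optional[bool]
    must_wait_for_vtype: bool        # True if the op can't be tagged until vtype resolves


def decode(stream):
    """Hypothetical decoder: forward agnostic flags from immediate vtype to later vector ops."""
    vta: Optional[bool] = None
    vma: Optional[bool] = None
    known = False                    # toy assumption: vtype unknown at stream start
    out = []
    for insn in stream:
        if insn.op in ("vsetvli", "vsetivli"):
            # Immediate vtype: the flags sit in the instruction bits, visible at decode.
            vta = bool((insn.vtype_imm >> VTA_BIT) & 1)
            vma = bool((insn.vtype_imm >> VMA_BIT) & 1)
            known = True
        elif insn.op == "vsetvl":
            # vtype comes from a register at execute time, so decode cannot see it.
            vta = vma = None
            known = False
        else:
            out.append(DecodedVecOp(insn.op, vta, vma, must_wait_for_vtype=not known))
    return out


# 0xD0 = e32, m1, ta, ma (vsew=0b010 << 3, vta=1 << 6, vma=1 << 7)
prog = [Insn("vsetvli", vtype_imm=0xD0), Insn("vadd.vv"), Insn("vsetvl"), Insn("vmul.vv")]
for op in decode(prog):
    print(op)
```

The model deliberately ignores vl. With a register AVL, vl isn't known at decode either, so knowing vta=ma=1 still doesn't tell the decoder whether the op will actually write anything. That is the vl=0 gap.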