I don't buy the "any constraints cause lower performance via being out of distribution" idea. Sure if you ask the model to output 'reasoning' in JSON steps, that is a completely different 'channel' to its trained 'reasoning' output. For real tasks though, I think it's more about picking the _right_ context free grammar to enforce format correctness. You can enforce an in-distribution format and get the best of both worlds. I don't think the industry should settle so hard on JSON-for-everything.