Temperature = 0 will give far more repeatable results, since the model in effect always picks the most likely token, but the responses might not be as “creative”. It’s also not enough to guarantee determinism: the hardware executing the LLM can lead to different results as well.
In terms of being part of a test suite, I think determinism > creativity in the response. But I would agree there are probably rough edges there; it's possible that some prompts never perform well with temperature set to 0.
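To make the two points about temperature 0 concrete, here is a rough sketch in plain Python. It is a toy, not how any particular LLM runtime is implemented: `sample_token` and the logit values are made up for illustration, and the assumption is that temperature 0 is treated as greedy (argmax) decoding. The second half shows why hardware can still matter: floating-point addition is not associative, so a different accumulation order can nudge near-tied scores and flip the argmax.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick a token index from raw logits (hypothetical helper)."""
    if temperature == 0:
        # Assumed convention: temperature 0 means greedy decoding,
        # i.e. always take the highest-scoring token (first one on ties).
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise sample from a temperature-scaled softmax.
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    weights = [math.exp(x - top) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random()  # deliberately unseeded

# With temperature 0 the random generator is never consulted,
# so repeated calls on the same logits agree.
greedy_picks = {sample_token([2.0, 1.5, 0.3], 0, rng) for _ in range(100)}

# Floating-point addition is not associative: the same three numbers
# summed in a different order give slightly different results.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)

# If two tokens are nearly tied, that tiny difference is enough to
# change which one greedy decoding selects.
rival = min(a, b)
pick_with_a = sample_token([rival, a], 0, rng)
pick_with_b = sample_token([rival, b], 0, rng)

print(greedy_picks)              # a single index
print(a == b)                    # False
print(pick_with_a, pick_with_b)  # two different indices
```

For a test suite, this suggests asserting on properties of the response (contains a key, parses as JSON, and so on) rather than on exact string equality, even with temperature set to 0.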