I'm interested in hearing how you guys upgrade topologies in production (assuming you do). A topology is designed to run forever, but obviously once in a while you find a bug, want to track a new stat, etc. I guess if you're pulling data off of a queue, you might be able to get away with letting things queue up for a few seconds while everything restarts with the new code, and then catching up. Is that how you handle it, or do you do something more clever?
Currently you let things queue up while you redeploy, but I'm working on a new feature that lets you "swap" two topologies. The new one is deployed in an inactive state, and then the two topologies are swapped. This lets you minimize the downtime to almost nothing.
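The toy simulation below contrasts the two approaches. It is a minimal sketch, not how the swap feature is implemented. `Topology`, `ToyCluster`, `submit`, `drain` and `swap` are all hypothetical names, not a real cluster API. It assumes a queue-fed topology and exactly one active consumer at a time. While nothing is active, messages simply wait in the queue. With the swap, the new version sits deployed but idle, so the only processing gap is the moment between deactivating the old version and activating the new one.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Topology:
    """Toy stand-in for a running topology (hypothetical; not a real cluster API)."""
    version: str
    active: bool = False
    processed: list = field(default_factory=list)


class ToyCluster:
    """Hypothetical cluster: messages wait in a queue until an active topology drains them."""

    def __init__(self):
        self.topologies = {}
        self.queue = deque()

    def submit(self, topo, active=True):
        # Deploy a topology; with active=False it is loaded but does not consume.
        topo.active = active
        self.topologies[topo.version] = topo

    def drain(self):
        # Hand every queued message to the active topology, if there is one.
        active = [t for t in self.topologies.values() if t.active]
        if not active:
            return  # nothing consuming: messages stay queued (the "let it queue up" case)
        while self.queue:
            active[0].processed.append(self.queue.popleft())

    def swap(self, old_version, new_version):
        # Deactivate the old topology, activate the new one, then remove the old one.
        # The only gap in processing is between these two flag flips.
        self.topologies[old_version].active = False
        self.topologies[new_version].active = True
        del self.topologies[old_version]


cluster = ToyCluster()
v1 = Topology("v1")
cluster.submit(v1)
cluster.queue.extend(["a", "b"])
cluster.drain()

v2 = Topology("v2")
cluster.submit(v2, active=False)  # new code deployed but idle
cluster.queue.append("c")
cluster.drain()                   # still handled by v1

cluster.swap("v1", "v2")
cluster.queue.append("d")
cluster.drain()                   # now handled by v2
print(v1.processed, v2.processed)  # ['a', 'b', 'c'] ['d']
```

Deploying the new version ahead of time moves the slow part (starting up the new code) out of the window where nothing is consuming.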