I think the scientific point here is that visual processing is not a one-shot process. In object detection, some scenes demand more careful processing, and therefore more computation, than others.
Almost all neural network architectures take the same amount of time for any input of a given size, yet some applications and datasets would benefit from an "anytime" approach, where the output is gradually refined as more time becomes available.
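To make the "anytime" idea concrete, here is a minimal sketch. It is not taken from the work under discussion. The names `anytime_predict`, `coarse`, and `refine` are hypothetical, and the stages are toy numeric functions standing in for network blocks. It only illustrates the control flow: a first coarse answer is always produced, and later stages refine it until the time budget runs out.

```python
import time


def anytime_predict(stages, x, budget_s, clock=time.monotonic):
    """Return the latest estimate available within a time budget.

    stages:   list of callables stage(x, previous_estimate) -> new_estimate
              (hypothetical stand-ins for successive network blocks)
    budget_s: time budget in seconds
    clock:    injectable time source, so the behaviour can be tested
    """
    deadline = clock() + budget_s
    # Always run the first stage so that some answer exists.
    estimate = stages[0](x, None)
    used = 1
    for stage in stages[1:]:
        # Stop refining once the budget is spent.
        if clock() >= deadline:
            break
        estimate = stage(x, estimate)
        used += 1
    return estimate, used


def coarse(x, prev):
    # Toy first guess that ignores the input entirely.
    return 0.0


def refine(x, prev):
    # Toy refinement: move halfway from the previous estimate toward x.
    return prev + 0.5 * (x - prev)


STAGES = [coarse, refine, refine, refine]

if __name__ == "__main__":
    # With a generous budget every stage runs; with a tight one, fewer do.
    print(anytime_predict(STAGES, 8.0, budget_s=1.0))
```

The budget is only checked between stages, so a single slow stage can still overrun it. A real system would need stages that are individually cheap, or some way to interrupt them.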
I understand the point you are making, but I don't think it is relevant. The task is to produce an answer for the image at the given resolution. It is a coincidence that the network's answer is arguably correct for a blurrier version of the image.