It will get there eventually. A lot of it is hardware / deployment constrained.
Search by image, object detection, and computer vision in general are cool and potentially useful, but right now it's cumbersome as fuck to pull out your phone, find the Lens application, take a picture, etc. It needs to be baked into a wearable / Neuralink-type setup.
Self-driving applications of CV work because the cameras are always deployed and running, but the hardware is expensive.
The models I have used seem to have their usefulness greatly outweighed by their performance demands.
Scaling and economics are another question entirely.
Perhaps we were spoiled by democratized web tech, and it's wishful thinking to want everything to be like that.