That's a great idea - especially because photogrammetry with high quality cameras can capture way more detail than most of the common 3D sensors (RealSense, Luxonis, etc). The big problems there are the computation cost and/or setup complexity of photogrammetry. You either need to do a lot of computation (a couple minutes on my RTX 4090 last time I did it for a medium sized object) to estimate keypoints and disparities, or you need a really well calibrated ring of cameras and some way to feed parts through it at line rate - though that second route gets away with much less compute.
A laser scanner would probably make the mesh comparison approach easier, but it's still incredibly hard to get a really accurate and high resolution depth map in a short time span - especially if the parts are actively moving.
Steph here - each image takes about 250ms on a small single-board computer like an Nvidia Orin Nano. On something larger like an RTX 4080 GPU it's less than 100ms. Because we're running big models we can't really just spin out more threads ourselves; we throw the work over to the GPU (or deep learning accelerator, depending on the platform) and the driver's internal scheduler decides how to get it done.
In a robotic packaging scenario most of the time is spent by the robot picking up the objects and moving them, so for a 30 second cycle we usually get less than a second to take multiple pictures and make a decision about the part. For a smaller number of images - like 4 - it's pretty easy to handle with cheap hardware like an Orin Nano or Orin NX. If we've got more images (like 8) and a tight time budget (like less than 2 seconds) we'd usually just bump up the hardware - going to a higher tier of Nvidia's Orin line, or to a machine with an RTX 4080 GPU or equivalent in it.
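The budget math above is simple enough to sketch. This is just a back-of-envelope check using the ballpark latencies I quoted (the `fits_budget` helper and the numbers table are mine, not part of any real product API):

```python
# Approximate per-image inference latency, in milliseconds (my rough numbers).
PER_IMAGE_MS = {"orin_nano": 250, "rtx_4080": 100}

def fits_budget(platform: str, num_images: int, budget_ms: float) -> bool:
    """True if sequentially processing num_images comes in under budget_ms."""
    return PER_IMAGE_MS[platform] * num_images < budget_ms

fits_budget("orin_nano", 4, 2000)   # 4 * 250 ms = 1 s, comfortably inside 2 s
fits_budget("orin_nano", 8, 2000)   # 8 * 250 ms = 2 s, right at the limit -> bump the hardware
fits_budget("rtx_4080", 8, 2000)    # 8 * 100 ms = 0.8 s, plenty of headroom
```

In practice some of that work overlaps with robot motion, so treat this as a worst-case serial estimate.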
Steph here (the guy from the video) - yep, we can handle custom tooling pretty easily in general. Usually what we're simulating is robot arms (think SCARAs with vacuum grippers). Acrylic might be a bit fiddly and take some tuning, but I bet we can handle it just fine.
Nothing special required from the camera - but it's nice if we know the camera parameters beforehand (sensor size, focal length) so we can make sure we generate images matching what that camera spits out.
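To give a concrete sense of why those two parameters matter: under a simple pinhole model they pin down the field of view, which is what the rendered images have to match. A minimal sketch (the helper name and example numbers are mine):

```python
import math

def horizontal_fov_deg(sensor_width_mm: float, focal_length_mm: float) -> float:
    """Horizontal field of view (degrees) under a pinhole camera model."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))

# e.g. a ~6.17 mm wide sensor behind a 4 mm lens -> roughly a 75 degree FOV,
# while a full-frame 36 mm sensor behind a 50 mm lens -> roughly 40 degrees.
horizontal_fov_deg(6.17, 4.0)
horizontal_fov_deg(36.0, 50.0)
```

Same idea for the vertical axis with the sensor height; with those matched, the synthetic images frame the scene the way the real camera would.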
Right now all the part-based communication is just "email us / jump on a quick call" - in the future I want to make it a self-service UI where customers can mark things out themselves.
The primary difference is that Argo AI was a separate company working on a fully driverless (L4 autonomy) system for both Ford and VW, whereas Latitude AI is a wholly owned subsidiary of Ford working on hands-off/eyes-off driver assistance (L3) instead. So: mostly the same people, but a different focus.