Shameless plug here... Some collaborators and I just released a first version of a benchmark that we think highlights a critical gap in recent models: understanding causality in the real world, beyond a narrow physics focus.
Everyday environments are rich in tangible control interfaces (TCIs), such as light switches, appliance panels, and embedded GUIs. They are designed for humans and demand not only commonsense and physics reasoning but also causal prediction and outcome verification across time and space (e.g., delayed heating, remote lights).
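To make the "delayed heating" case concrete, here is a toy Python sketch. It is not taken from the benchmark: the `DelayedHeater` class, the `verify_outcome` function, and all the numbers are hypothetical and only illustrate why checking the outcome right after pressing a control is not enough.

```python
from dataclasses import dataclass


@dataclass
class DelayedHeater:
    """Hypothetical toy TCI: a switch whose effect shows up only gradually."""
    on: bool = False
    temperature: float = 20.0  # degrees C, starts at room temperature
    target: float = 60.0       # temperature the heater settles at
    rate: float = 10.0         # degrees gained per time step while on

    def press(self) -> None:
        """Toggle the switch; the temperature does not change yet."""
        self.on = not self.on

    def step(self) -> None:
        """Advance one time step; heat up only while the switch is on."""
        if self.on:
            self.temperature = min(self.target, self.temperature + self.rate)


def verify_outcome(device: DelayedHeater, goal_temp: float, max_steps: int):
    """Wait until the goal is observed or the step budget runs out.

    Returns (success, steps_waited).
    """
    waited = 0
    while device.temperature < goal_temp and waited < max_steps:
        device.step()
        waited += 1
    return device.temperature >= goal_temp, waited


heater = DelayedHeater()
heater.press()

# Checking immediately after the action misses the delayed effect.
immediate_check = heater.temperature >= 50.0

# Verifying over time catches it.
ok, waited = verify_outcome(heater, goal_temp=50.0, max_steps=5)
print(immediate_check, ok, waited)
```

In this toy setup the immediate check fails even though the action was correct, while the time-aware check succeeds after a few steps. A "remote lights" case would be the spatial analogue: the effect appears somewhere other than where the control was pressed.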
For a while I shared an old apartment with 3 others. Old building, 10th floor, cloth-wrapped wiring (no modern plastic/rubber/etc.), windows that didn't close, a condemned fenced-off balcony, and occasional rat visitors that could reach the 10th floor.
My team and I work on embodied AI, focusing specifically on legged humanoid robots for long-horizon tasks that combine navigation and manipulation/interaction.
Not to mention that clarifying this can increase motivation to learn and discover things: kids won’t assume that they shouldn’t even try to create something new just because they aren’t geniuses.
SWITCH: Benchmarking Modeling and Handling of Tangible Interfaces in Long-horizon Embodied Scenarios (https://huggingface.co/papers/2511.17649)
Feedback, suggestions, and collaborators are very welcome!