15.04.2026
17:00

Google’s latest AI lets robots understand, plan, and act in real environments

Google has introduced a new AI model designed to help robots better understand and interact with the physical world, addressing one of the biggest challenges in robotics: reasoning beyond explicit instructions, El.kz reports, citing Interesting Engineering.

The model, Gemini Robotics-ER 1.6, focuses on “embodied reasoning,” enabling robots to interpret visual inputs, plan tasks, and determine when a task is complete.

This marks a shift from command-following machines to systems capable of making context-aware decisions.

The update builds on earlier versions by improving spatial reasoning and multi-view understanding, allowing robots to process information from multiple camera feeds and dynamic environments more effectively.

It also introduces new capabilities such as instrument reading, enabling robots to interpret gauges and indicators commonly found in industrial settings.

Bridging the digital-physical gap

A key improvement lies in how the model handles spatial reasoning tasks. Gemini Robotics-ER 1.6 can identify objects, count them, and determine relationships between them with greater accuracy. It can also point to objects as part of its reasoning process, helping it break down complex tasks into smaller steps.
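
The article gives no developer-facing detail, but models in this family are served through the ordinary Gemini API, so a pointing query amounts to a single image-plus-text request. The sketch below is illustrative only, assuming the google-genai Python SDK; the model identifier is a placeholder (the publicly documented one at the time of writing is the 1.5 preview), and the JSON point format requested in the prompt is an assumption rather than something the article specifies.

```python
# Illustrative sketch only: asking an embodied-reasoning Gemini model to
# "point" at objects in one camera frame. The model id and output schema are
# assumptions, not details from the article.
import json
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

with open("workbench.jpg", "rb") as f:
    frame = f.read()

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # placeholder; substitute the current ER model id
    contents=[
        types.Part.from_bytes(data=frame, mime_type="image/jpeg"),
        "Point to every screwdriver in the image. Reply only with a JSON list "
        'of objects like {"point": [y, x], "label": "..."}, with coordinates '
        "normalized to the range 0-1000.",
    ],
)

# The reply is plain text; strip any code fencing before parsing in practice.
for item in json.loads(response.text):
    print(item["label"], item["point"])
```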

This capability is critical in real-world environments where robots must interact with objects, navigate cluttered spaces, and make decisions based on incomplete or changing information.

The model also improves success detection, allowing robots to assess whether a task has been completed correctly. This is particularly important in automation workflows, where systems must decide whether to retry an action or move forward.
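
As a rough illustration of that retry-or-advance decision (not a workflow described in the article), a controller could ask the same model to verify completion from a fresh camera frame. The function below assumes the google-genai SDK, a placeholder model identifier, and a simple YES/NO prompt convention introduced purely for the sketch.

```python
# Hypothetical success-detection check inside an automation loop. Model id,
# prompt wording, and the YES/NO convention are illustrative assumptions.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

def task_succeeded(frame: bytes, goal: str) -> bool:
    """Ask the model whether the current camera frame shows the goal achieved."""
    response = client.models.generate_content(
        model="gemini-robotics-er-1.5-preview",  # placeholder model id
        contents=[
            types.Part.from_bytes(data=frame, mime_type="image/jpeg"),
            f"Goal: {goal}. Does this image show the goal fully completed? "
            "Answer with exactly one word: YES or NO.",
        ],
    )
    return response.text.strip().upper().startswith("YES")

# A caller would retry the action when this returns False,
# or move on to the next step when it returns True.
```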

Multi-view reasoning is another area of advancement. Robots often rely on multiple camera inputs, such as overhead and wrist-mounted views. The model can combine these perspectives to form a more complete understanding of the environment, even in cases of occlusion or poor visibility.
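
Because multi-view reasoning amounts to handing the model more than one image in the same request, it can be sketched the same way. The example below reuses the assumptions above (google-genai SDK, placeholder model identifier, invented file names) and sends an overhead frame and a wrist-camera frame together.

```python
# Illustrative multi-view query: overhead and wrist-camera frames in a single
# request. Model id, file names, and prompt are assumptions for the sketch.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

def read_image(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # placeholder model id
    contents=[
        "Overhead camera view:",
        types.Part.from_bytes(data=read_image("overhead.jpg"), mime_type="image/jpeg"),
        "Wrist camera view (may be partially occluded):",
        types.Part.from_bytes(data=read_image("wrist.jpg"), mime_type="image/jpeg"),
        "Using both views, describe where the target part sits relative to the "
        "gripper and whether it is reachable without repositioning the arm.",
    ],
)
print(response.text)
```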