ShengShu Technology has unveiled Motubrain, a unified AI model designed to act as a general-purpose brain for robots, combining perception, reasoning, prediction, and action into a single system.
The company says the model replaces fragmented, task-specific architectures typically used in robotics with a single framework capable of handling multiple tasks and environments. The approach aims to reduce dependence on separate modules for sensing, planning, and execution.
Motubrain has already shown strong benchmark performance, achieving a score of 63.77 on WorldArena and an average of 96.0 across 50 tasks on RoboTwin 2.0. It is also reported to be the only model to exceed 95.0 in randomized environments.
The system builds on ShengShu’s earlier work in generative video through its Vidu platform, using large-scale video data to train robots to understand and interact with real-world environments.
One brain, many tasks
Motubrain is designed as a unified multimodal model that learns from video, language, and action simultaneously. This allows robots to process their surroundings, predict outcomes, and act in real time without switching between separate systems.
“A true world model must be able to build a unified representation of the real world and predict how it evolves,” said Jun Zhu, Founder of ShengShu Technology.
The model uses a three-stream Mixture-of-Transformers architecture to integrate inputs from different modalities. This setup enables robots to understand instructions, anticipate environmental changes, and generate appropriate actions in one continuous loop.
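ShengShu has not released implementation details, but a minimal sketch can show what a modality-routed Mixture-of-Transformers layer generally looks like: all tokens share one self-attention step, while video, language, and action tokens are routed to separate feed-forward "experts." The PyTorch code below is an illustrative toy, with assumed names and dimensions, not Motubrain's actual architecture.

```python
# Illustrative sketch only; not ShengShu's code. Dimensions and module names are assumptions.
import torch
import torch.nn as nn

class ThreeStreamMoTLayer(nn.Module):
    """One layer: self-attention shared across modalities,
    separate feed-forward experts for video, language, and action tokens."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # One feed-forward expert per modality stream.
        self.experts = nn.ModuleDict({
            m: nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                             nn.Linear(4 * d_model, d_model))
            for m in ("video", "language", "action")
        })
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, tokens, modality_ids):
        # tokens: (batch, seq, d_model); modality_ids: (batch, seq) with values 0/1/2.
        h = self.norm1(tokens)
        attn_out, _ = self.attn(h, h, h)          # every modality attends to every other
        x = tokens + attn_out
        out = torch.zeros_like(x)
        for idx, name in enumerate(("video", "language", "action")):
            mask = (modality_ids == idx).unsqueeze(-1).to(x.dtype)
            out = out + mask * self.experts[name](self.norm2(x))  # route tokens to their expert
        return x + out

layer = ThreeStreamMoTLayer()
tokens = torch.randn(2, 6, 512)
modality_ids = torch.tensor([[0, 0, 1, 1, 2, 2], [0, 1, 1, 2, 2, 2]])
out = layer(tokens, modality_ids)   # shape (2, 6, 512)
```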
Unlike conventional systems that rely heavily on labeled datasets, Motubrain is trained on a broader mix of unlabeled video, simulation data, and multi-robot task recordings. A latent action framework extracts motion patterns directly from these inputs, reducing the need for manual annotation.
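The general idea behind latent action learning can be illustrated with a toy model: a network is asked to infer whatever compact "action" explains the change between two consecutive video frames, so no action labels are needed. The sketch below is a simplified illustration under those assumptions, not ShengShu's actual framework.

```python
# Toy latent action model trained on unlabeled frame pairs; illustrative only.
import torch
import torch.nn as nn

class LatentActionModel(nn.Module):
    def __init__(self, frame_dim=1024, latent_dim=16):
        super().__init__()
        # Encoder sees frame_t and frame_{t+1}; what it must encode to explain
        # the transition becomes the latent action.
        self.encoder = nn.Sequential(
            nn.Linear(2 * frame_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        # Decoder predicts frame_{t+1} from frame_t plus the latent action.
        self.decoder = nn.Sequential(
            nn.Linear(frame_dim + latent_dim, 256), nn.ReLU(), nn.Linear(256, frame_dim))

    def forward(self, frame_t, frame_next):
        action = self.encoder(torch.cat([frame_t, frame_next], dim=-1))
        pred_next = self.decoder(torch.cat([frame_t, action], dim=-1))
        return pred_next, action

model = LatentActionModel()
f_t, f_next = torch.randn(8, 1024), torch.randn(8, 1024)
pred, latent = model(f_t, f_next)
loss = nn.functional.mse_loss(pred, f_next)   # trained from raw video, no action labels
```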
This training approach allows the model to scale more efficiently. In internal evaluations, Motubrain maintained higher success rates than competing systems as both task complexity and training data increased.
From data to action
Motubrain can execute multi-step tasks involving up to 10 atomic actions, significantly more than the typical 2–3 handled by many current robotic systems. This enables robots to complete more complex, real-world activities in a single sequence.
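To make the scale of such a sequence concrete, a hypothetical 10-step plan and a simple executor might look like the following. The task, skill names, and robot.run API are invented for illustration; ShengShu has not described its action interface.

```python
# Hypothetical example of one instruction decomposed into 10 atomic actions.
plan = [
    ("move_to", "counter"), ("detect", "mug"), ("grasp", "mug"),
    ("lift", "mug"), ("move_to", "sink"), ("place", "mug"),
    ("detect", "faucet"), ("turn_on", "faucet"), ("wait", 3), ("turn_off", "faucet"),
]

def execute(plan, robot):
    for skill, arg in plan:          # each tuple is one atomic action
        ok = robot.run(skill, arg)   # hypothetical robot API
        if not ok:
            return False             # hand the failure back to the planner
    return True
```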
“We believe general world models should not be built as stitched-together modules, but as a unified architecture that brings together perception, reasoning, prediction, generation, and action in a single system.”
In real-world tests, robots trained with Motubrain demonstrated the ability to adapt during execution. For example, when a step failed partway through a task, such as a missed grasp, the system could recognize the failure and retry without having been trained on that specific scenario.
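In rough terms, this recover-and-retry behavior amounts to a closed perception-action loop: act, re-check the world, and re-plan the step if its intended effect did not occur. The sketch below uses hypothetical robot.execute, check_success, and robot.replan calls to convey the idea; it is not ShengShu's interface.

```python
# Minimal closed-loop retry sketch; all robot calls are hypothetical placeholders.
def run_with_retry(robot, action, check_success, max_retries=2):
    for attempt in range(max_retries + 1):
        robot.execute(action)           # e.g. attempt a grasp
        if check_success():             # re-perceive: did the object end up where expected?
            return True
        # Failure detected mid-task: re-plan the same step from the new state.
        action = robot.replan(action)
    return False
```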
The company says the model is already being used by robotics firms in active training programs across industrial, commercial, and home environments. Partnerships with companies including Astribot, SimpleAI, and Anyverse Dynamics aim to further expand deployment.