TLDR
We present a successful approach to learning humanoid dexterous manipulation using sim-to-real reinforcement learning, achieving robust generalization and high performance without the need for human demonstrations.
Learning generalizable robot manipulation policies, especially for complex multi-fingered humanoids, remains a significant challenge. Existing approaches primarily rely on extensive data collection and imitation learning, which are expensive, labor-intensive, and difficult to scale. Sim-to-real reinforcement learning (RL) offers a promising alternative, but has mostly succeeded in simpler state-based or single-hand setups. How to effectively extend this to vision-based, contact-rich bimanual manipulation tasks remains an open question. In this paper, we introduce a practical sim-to-real RL recipe that trains a humanoid robot to perform three challenging dexterous manipulation tasks: grasp-and-reach, box lift, and bimanual handover. Our method features an automated real-to-sim tuning module, a generalized reward formulation based on contact and object goals, a divide-and-conquer policy distillation framework, and a hybrid object representation strategy with modality-specific augmentation. We demonstrate high success rates on unseen objects and robust, adaptive policy behaviors, highlighting that vision-based dexterous manipulation via sim-to-real RL is not only viable but also scalable and broadly applicable to real-world humanoid manipulation tasks.
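To make the contact-and-object-goal reward idea concrete, below is a minimal sketch of how such a generalized reward could be computed. The function names, weights, and distance shaping here are our illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Minimal sketch of a contact-and-object-goal reward (illustrative only;
# names, weights, and shaping are assumptions, not the paper's exact terms).
def manipulation_reward(fingertip_pos, contact_goals, object_pos, object_goal,
                        w_contact=1.0, w_object=2.0):
    """fingertip_pos, contact_goals: (num_fingertips, 3);
    object_pos, object_goal: (3,)."""
    # Contact-goal term: pull each fingertip toward its assigned contact
    # point on the object; each term is bounded in (0, 1].
    fingertip_dist = np.linalg.norm(fingertip_pos - contact_goals, axis=-1)
    r_contact = float(np.mean(1.0 / (1.0 + fingertip_dist)))

    # Object-goal term: pull the object toward its target position
    # (e.g., a reach target, a lift height, or a handover point).
    r_object = 1.0 / (1.0 + np.linalg.norm(object_pos - object_goal))

    return w_contact * r_contact + w_object * r_object
```

Under this reading, the same two terms could cover grasp-and-reach, lift, and handover: only the contact and object goals change per task, not the reward code.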
We train a humanoid robot with two multi-fingered hands to perform a range of contact-rich dexterous manipulation tasks on various objects. Observations are obtained from a third-person-view camera, an egocentric camera, and robot proprioception. The deployed reinforcement learning policies can adapt to a variety of unseen real-world objects with varying physical properties (e.g., shape, size, color, material, mass) and remain robust against force disturbances.
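As a concrete illustration of this observation setup, the policy input can be bundled as below. The stream names and shapes are our assumptions for illustration; the paper's exact interface may differ.

```python
import numpy as np

# Hypothetical observation bundle for the deployed policy; names and shapes
# are illustrative assumptions based on the setup described above.
def build_observation(third_person_rgb, egocentric_rgb, joint_pos, joint_vel):
    return {
        # Third-person-view camera: global view of the robot and workspace.
        "third_person_rgb": np.asarray(third_person_rgb, dtype=np.uint8),  # (H, W, 3)
        # Egocentric camera: close-up view of the hands and object.
        "egocentric_rgb": np.asarray(egocentric_rgb, dtype=np.uint8),      # (H, W, 3)
        # Proprioception: joint positions and velocities of the arms and
        # multi-fingered hands.
        "proprio": np.concatenate([joint_pos, joint_vel]).astype(np.float32),
    }
```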
Our contributions include an automated real-to-sim tuning module, a generalized reward design scheme, a divide-and-conquer distillation process, and a mixture of sparse and dense object representations. These techniques collectively enable the training of robust, generalizable, and dexterous manipulation policies that can be successfully transferred to real-world humanoid robots.
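As one way to picture the divide-and-conquer distillation step, the sketch below trains a single vision-based student to imitate several state-based specialist teachers, DAgger-style. The environment and policy interfaces are hypothetical placeholders, not the released code.

```python
import torch
import torch.nn.functional as F

# Sketch of divide-and-conquer distillation: specialist teachers, trained
# with state-based RL on subsets of tasks/objects, supervise one generalist
# vision-based student. All interfaces here are hypothetical placeholders.
def distill(teachers, student, env, optimizer, num_steps=100_000):
    for _ in range(num_steps):
        state = env.get_privileged_state()   # simulator state (teachers only)
        vision_obs = env.get_vision_obs()    # images + proprioception (student)

        # Pick the specialist responsible for the current task/object subset.
        with torch.no_grad():
            target_action = teachers[env.current_task_id](state)

        # The student regresses onto the teacher's action from deployable
        # observations only.
        pred_action = student(vision_obs)
        loss = F.mse_loss(pred_action, target_action)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Roll out the student's own action so it learns on states it will
        # actually visit at deployment time (DAgger-style).
        env.step(pred_action.detach())
```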
Our policy is capable of dexterous grasping on a diverse range of objects, including ones that lie outside the training distribution. The emergent dexterity also enables our policy to solve hard grasping tasks that require precise finger motions, such as grasping small and slippery objects.
We observe diverse grasp patterns emerging from the same policy, even for the same object. The grasp patterns adapt to variations in both object properties and object states.
During policy deployment, we perturb objects at random times by poking, pulling, and pushing them along random directions using a picker tool or by hand. Our policy is robust against these random external forces and adapts quickly to sustain continuous policy execution. Left video shows grasp policies; right video shows lift and handover policies.
Our policy also exhibits interesting emergent failure recovery behaviors that maintain its robustness even when the force disturbances are so strong that the object is dropped. We observe that our policy can quickly adjust the finger motions and perform regrasping to continue policy execution. Left video shows recovery of a grasp policy; right video shows recovery of a lift policy.
During training, we sometimes observe RL policies that develop remarkably dynamic and creative motions. The left video demonstrates a "standard" handover policy, while the right video showcases a highly dynamic variant. Although these fascinating behaviors often emerge from exploiting simulator dynamics and do not transfer well to the real world, we find them intriguing and want to share these entertaining examples with the community :)
@article{lin2025sim,
  author  = {Lin, Toru and Sachdev, Kartik and Fan, Linxi and Malik, Jitendra and Zhu, Yuke},
  title   = {Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids},
  journal = {arXiv preprint arXiv:2502.20396},
  year    = {2025}
}
We thank members of the NVIDIA GEAR lab for help with hardware infrastructure, in particular Zhenjia Xu, Yizhou Zhao, and Zu Wang. This work was partially conducted during TL's internship at NVIDIA. TL is supported by NVIDIA and a National Science Foundation fellowship.