Sim-to-Real Reinforcement Learning for
Vision-Based Dexterous Manipulation on Humanoids

Toru Lin1,2
Kartik Sachdev2
Linxi "Jim" Fan2
Jitendra Malik1
Yuke Zhu2,3
UC Berkeley1
NVIDIA2
UT Austin3

TLDR
We present a successful approach to learning humanoid dexterous manipulation using sim-to-real reinforcement learning, achieving robust generalization and high performance without the need for human demonstrations.

Reinforcement learning has delivered promising results in achieving human- or even superhuman-level capabilities across diverse problem domains, but success in dexterous robot manipulation remains limited. This work investigates the key challenges in applying reinforcement learning to solve a collection of contact-rich manipulation tasks on a humanoid embodiment. We introduce novel techniques to overcome the identified challenges and validate each of them empirically. Our main contributions include an automated real-to-sim tuning module that brings the simulated environment closer to the real world, a generalized reward design scheme that simplifies reward engineering for long-horizon contact-rich manipulation tasks, a divide-and-conquer distillation process that improves the sample efficiency of hard-exploration problems while maintaining sim-to-real performance, and a mixture of sparse and dense object representations to bridge the sim-to-real perception gap. We show promising results on three humanoid dexterous manipulation tasks, with ablation studies on each technique. Our work presents a successful approach to learning humanoid dexterous manipulation using sim-to-real reinforcement learning, achieving robust generalization and high performance without the need for human demonstrations.

Overview

We train a humanoid robot with two multi-fingered hands to perform a range of contact-rich dexterous manipulation tasks on various objects. Observations are obtained from a third-view camera, an egocentric camera, and robot proprioception. The deployed reinforcement learning policies can adapt to a variety of unseen real-world objects that have varying physical properties (e.g., shape, size, color, material, mass) and remain robust against force disturbances.
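The exact observation layout is not detailed on this page; as a rough illustration, the sketch below combines a sparse object-position estimate (e.g., derived from the third-view camera), a dense egocentric depth image, and robot proprioception into a single flat policy input. All names, shapes, and the downsampling factor are our own assumptions, not the actual interface:

    import numpy as np

    def build_observation(object_pos, ego_depth, joint_pos, joint_vel):
        """Concatenate one control step's inputs into a flat policy observation.

        object_pos: sparse cue, a 3D object centroid (hypothetically estimated
            from the third-view camera).
        ego_depth:  dense cue, an egocentric depth image.
        joint_pos, joint_vel: robot proprioception (arm + finger joints).
        """
        depth_feat = ego_depth[::8, ::8].astype(np.float32).ravel()   # coarse downsample
        proprio = np.concatenate([joint_pos, joint_vel]).astype(np.float32)
        return np.concatenate([object_pos.astype(np.float32), depth_feat, proprio])

    # Example with placeholder shapes: a 480x640 depth image and 25 joints.
    obs = build_observation(np.zeros(3), np.zeros((480, 640)), np.zeros(25), np.zeros(25))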

Method

Our contributions include an automated real-to-sim tuning module, a generalized reward design scheme, a divide-and-conquer distillation process, and a mixture of sparse and dense object representations. These techniques collectively enable the training of robust, generalizable, and dexterous manipulation policies that can be successfully transferred to real-world humanoid robots.
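As a hedged illustration of the real-to-sim tuning idea, the sketch below grid-searches simulated actuator parameters to minimize the mismatch between a recorded real-robot trajectory and its simulated replay. The simulate callable, the field names, and the parameters kp and kd are assumptions for illustration, not the actual module:

    import numpy as np
    from itertools import product

    def tune_sim_params(real_traj, simulate, kp_grid, kd_grid):
        """Pick actuator parameters whose simulated replay best matches reality.

        real_traj: dict holding the commands sent on the real robot and the
            joint positions that resulted (hypothetical field names).
        simulate:  replays the same commands in simulation under candidate
            gains and returns the simulated joint-position trajectory.
        """
        best_params, best_err = None, np.inf
        for kp, kd in product(kp_grid, kd_grid):
            sim_traj = simulate(real_traj["commands"], kp=kp, kd=kd)
            err = float(np.mean((sim_traj - real_traj["joint_pos"]) ** 2))
            if err < best_err:
                best_params, best_err = (kp, kd), err
        return best_params, best_err

A gradient-free search like this keeps the tuning loop independent of the simulator's internals, which is why we sketch it that way here.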
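The generalized reward design can be pictured as matching contact goals first and object goals second. The toy example below gates the object-goal term by contact progress, so the policy is only rewarded for moving the object once its fingers are near the desired contact points; all names and scales are illustrative assumptions rather than the paper's exact terms:

    import numpy as np

    def staged_reward(fingertip_pos, contact_points, object_pos, object_goal, sigma=0.05):
        """Two-stage reward: reach the contact goal, then move the object.

        fingertip_pos, contact_points: (n_fingers, 3) current and desired
            fingertip positions (hypothetical contact-goal encoding).
        object_pos, object_goal: (3,) current and desired object positions.
        """
        # Contact goal: bring each fingertip close to its desired contact point.
        contact_err = np.linalg.norm(fingertip_pos - contact_points, axis=-1).mean()
        r_contact = np.exp(-contact_err / sigma)
        # Object goal: move the object to its target, gated by contact progress.
        object_err = np.linalg.norm(object_pos - object_goal)
        r_object = r_contact * np.exp(-object_err / sigma)
        return float(r_contact + r_object)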
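For the divide-and-conquer distillation process, one common realization, sketched here as a DAgger-style student-teacher scheme under assumed interfaces, trains specialist policies on easier subsets of the task and then supervises a single generalist with their actions on states the generalist itself visits:

    def distill_generalist(specialists, envs, generalist, n_rounds=100, horizon=64):
        """DAgger-style distillation of per-subset teachers into one student.

        specialists: list of teacher policies, one per task/object subset.
        envs:        matching simulation environments.
        generalist:  the single student policy being trained.
        All interfaces (observe/step/act/fit) are hypothetical.
        """
        dataset = []
        for _ in range(n_rounds):
            for teacher, env in zip(specialists, envs):
                for _ in range(horizon):
                    obs = env.observe()
                    env.step(generalist.act(obs))            # student drives the state distribution
                    dataset.append((obs, teacher.act(obs)))  # teacher provides the label
            generalist.fit(dataset)  # supervised regression onto teacher actions
        return generalist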

Dexterity and Generalization

Our policy is capable of performing dexterous grasping on a diverse range of objects, including ones outside the training distribution. The emergent dexterity also enables our policy to solve hard grasping tasks that require precise finger motions, such as grasping small and slippery objects.

We observe the emergence of diverse grasp patterns from the same policy, even on the same object. The grasp patterns adapt to variations in both object properties and object states.

Robustness and Recovery

During policy deployment, we perturb objects at random times by poking, pulling, and pushing them along random directions using a picker tool or our hands. Our policy is robust against these random external forces and can adapt quickly to sustain continuous execution. Left video shows grasp policies; right video shows lift and handover policies.

Our policy also exhibits interesting emergent failure recovery behaviors that maintain its robustness even when the force disturbances are so strong that the object is dropped. We observe that our policy can quickly adjust its finger motions and regrasp the object to continue execution. Left video shows recovery of a grasp policy; right video shows recovery of a lift policy.

Simulation Results

During training, we sometimes observe RL policies that develop remarkably dynamic and creative motions. The left video demonstrates a "standard" handover policy, while the right video showcases a highly dynamic variant. Although these fascinating behaviors often emerge from exploiting simulator dynamics and do not transfer well to the real world, we find them intriguing and want to share these entertaining examples with the community :)

Bibtex


        @article{lin2025sim,
          author={Lin, Toru and Sachdev, Kartik and Fan, Linxi and Malik, Jitendra and Zhu, Yuke},
          title={Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids},
          journal={arXiv preprint arXiv:2502.20396},
          year={2025}
        }
        

Acknowledgements

We thank members of the NVIDIA GEAR lab for help with hardware infrastructure, in particular Zhenjia Xu, Yizhou Zhao, and Zu Wang. This work was partially conducted during TL's internship at NVIDIA. TL is supported by NVIDIA and a National Science Foundation fellowship.

Website template adapted from HATO.