TLDR
We repurpose two touch-sensing prosthetic hands for research use,
develop a teleoperation system for bimanual multifingered hands to collect visuotactile data,
and learn cool policies.
Aiming to replicate human-like dexterity, perceptual experiences, and motion patterns, we explore learning from human demonstrations using a bimanual system with multifingered hands and visuotactile data. Two significant challenges exist: the lack of an affordable and accessible teleoperation system suitable for a dual-arm setup with multifingered hands, and the scarcity of multifingered hand hardware equipped with touch sensing. To tackle the first challenge, we develop HATO ("dove" 🕊️ in Japanese), a low-cost hands-arms teleoperation system that leverages off-the-shelf electronics, complemented by a software suite that supports efficient data collection, multimodal data processing, scalable policy learning, and smooth policy deployment. To tackle the second challenge, we introduce a novel hardware adaptation: repurposing two prosthetic hands equipped with touch sensors for research use. Using visuotactile data collected with our system, we learn skills to complete long-horizon, high-precision tasks that are difficult to achieve without multifingered dexterity and touch feedback. Furthermore, we empirically investigate the effects of dataset size, sensing modality, and visual input preprocessing on policy learning. Our results mark a promising step forward in bimanual multifingered manipulation from visuotactile data.
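To make the pipeline concrete, here is a minimal Python sketch of one episode of leader-follower teleoperation with synchronized visuotactile logging. All names and interfaces here (leader, follower, cameras, read_touch, etc.) are illustrative placeholders under assumed hardware wrappers, not the actual HATO API.

# A minimal sketch of a visuotactile teleoperation data-collection loop.
# Assumes hypothetical leader/follower/camera interfaces; not the HATO API.
import time
from dataclasses import dataclass

import numpy as np


@dataclass
class VisuotactileFrame:
    """One synchronized sample: images, touch readings, and robot state."""
    timestamp: float
    rgb: dict          # camera name -> HxWx3 uint8 image
    touch: dict        # hand name -> per-fingertip pressure array
    qpos: np.ndarray   # follower arm + hand joint positions
    action: np.ndarray # teleop command sent at this step


def collect_episode(leader, follower, cameras, hz=20.0):
    """Mirror the leader's joints onto the follower and log each frame."""
    frames = []
    dt = 1.0 / hz
    while leader.is_active():
        t0 = time.time()
        action = leader.read_joints()       # human-driven target joints
        follower.command_joints(action)     # stream to arms + hands
        frames.append(VisuotactileFrame(
            timestamp=t0,
            rgb={name: cam.read() for name, cam in cameras.items()},
            touch=follower.read_touch(),    # fingertip pressure readings
            qpos=follower.read_joints(),
            action=np.asarray(action),
        ))
        # Hold a fixed control rate so observations stay evenly spaced.
        time.sleep(max(0.0, dt - (time.time() - t0)))
    return frames

The key design point this sketch illustrates is that images, touch readings, and proprioception are timestamped and logged together at every control step, so the resulting episodes can feed directly into visuotactile policy learning.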
Our system demonstrates precise motion control, enabling the robot to perform tasks like uncapping bottles, pouring wine, and manipulating tea bags with dexterity comparable to that of a human. You can even teleoperate the robot to play your favorite video games on a joystick! Our favorite is Hollow Knight : )
(1) Play Hollow Knight
(2) Pull Out a Tea Bag
(3) Uncap and Pour from a Wine Bottle
Compared to multifingered hands, parallel grippers commonly suffer from failure modes such as objects slipping and unstable grasps. Teleoperators often need to align the gripper with the object very precisely to minimize grasping failures.
@article{lin2024learning,
  author  = {Lin, Toru and Zhang, Yu and Li, Qiyang and Qi, Haozhi and Yi, Brent and Levine, Sergey and Malik, Jitendra},
  title   = {Learning Visuotactile Skills with Two Multifingered Hands},
  journal = {arXiv:2404.16823},
  year    = {2024}
}
We thank Jesse Cornman from PSYONIC for help with setting up the Ability Hands, and Philipp Wu for help with setting up the UR5e robot arms and GELLO. TL is supported by fellowships from the National Science Foundation and UC Berkeley. QL is supported by ONR under N00014-20-1-2383 and NSF IIS-2150826. HQ is supported by the DARPA Machine Common Sense program and ONR MURI N00014-21-1-2801. This research was also partly supported by the Savio computational cluster provided by the Berkeley Research Computing program.