NeuralFeels with neural fields: Visuotactile perception for in-hand manipulation

Sudharshan Suresh; Haozhi Qi; Tingfan Wu; Taosha Fan; Luis Pineda; Mike Lambeta; Jitendra Malik; Mrinal Kalakrishnan; Roberto Calandra; Michael Kaess; Joseph Ortiz; Mustafa Mukadam

doi:10.1126/scirobotics.adl0628

Research output: Contribution to journal › Research article › Contributed › peer-review

Contributors

Sudharshan Suresh - Carnegie Mellon University, Meta Platforms, Inc. (Author)
Haozhi Qi - Meta Platforms, Inc., University of California at Berkeley (Author)
Tingfan Wu - Meta Platforms, Inc. (Author)
Taosha Fan - Meta Platforms, Inc. (Author)
Luis Pineda - Meta Platforms, Inc. (Author)
Mike Lambeta - Meta Platforms, Inc. (Author)
Jitendra Malik - Meta Platforms, Inc., University of California at Berkeley (Author)
Mrinal Kalakrishnan - Meta Platforms, Inc. (Author)
Roberto Calandra - Clusters of Excellence CeTI: Centre for Tactile Internet, Chair of Machine Learning for Robotics (CeTi) (Author)
Michael Kaess - Carnegie Mellon University (Author)
Joseph Ortiz - Meta Platforms, Inc. (Author)
Mustafa Mukadam - Meta Platforms, Inc. (Author)

Abstract

To achieve human-level dexterity, robots must infer spatial awareness from multimodal sensing to reason over contact interactions. During in-hand manipulation of novel objects, such spatial awareness involves estimating the object’s pose and shape. The status quo for in-hand perception primarily uses vision and is restricted to tracking a priori known objects. Moreover, visual occlusion of objects in hand is imminent during manipulation, preventing current systems from pushing beyond tasks without occlusion. We combined vision and touch sensing on a multifingered hand to estimate an object’s pose and shape during in-hand manipulation. Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem. We studied multimodal in-hand perception in simulation and the real world, interacting with different objects via a proprioception-driven policy. Our experiments showed final reconstruction F scores of 81% and average pose drifts of 4.7 millimeters, which was further reduced to 2.3 millimeters with known object models. In addition, we observed that, under heavy visual occlusion, we could achieve improvements in tracking up to 94% compared with vision-only methods. Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation. We release our evaluation dataset of 70 experiments, FeelSight, as a step toward benchmarking in this domain. Our neural representation driven by multimodal sensing can serve as a perception backbone toward advancing robot dexterity.

Details

Original language	English
Article number	eadl0628
Number of pages	16
Journal	Science Robotics
Volume	9
Issue number	96
Publication status	Published - Nov 2024
Peer-reviewed	Yes

External IDs

PubMed	39536124
ORCID	/0000-0001-9430-8433/work/173989269

Keywords

ASJC Scopus subject areas

General Medicine