RFusion: Robotic Grasping via RF-Visual Sensing and Learning



Signal Kinetics

RFusion is a robotic system that can search for and retrieve items in line-of-sight, non-line-of-sight, and fully occluded settings. It consists of a robotic arm with a camera and an antenna mounted on its gripper, and it uses both to find and retrieve target items. The system introduces two new primitives, RF-visual sensing and RF-visual reinforcement learning, to efficiently localize, maneuver toward, and grasp target items.

RFusion localizes target items with centimeter-scale precision and achieves a 96% success rate in retrieving fully occluded objects, even when they are buried under a pile. It thus paves the way for novel robotic retrieval tasks in complex environments such as warehouses, manufacturing plants, and smart homes.

How does RFusion work?

The system relies on off-the-shelf RFID tags attached to target items. It starts by using its wrist-mounted antenna to selectively query the RFID tag on the target item. It then uses the tag's measured response to compute the round-trip distance to the tag, leveraging our state-of-the-art RFID positioning technology. Since a single round-trip distance is not sufficient for localization, the robot fuses RF and visual information to efficiently localize, maneuver toward, and grasp the target object. It operates in three steps.



1) Dense RF-Visual Geometric Fusion:  

Given the round-trip distance to the RFID, the robot maps that distance to a spherical ring centered around the wrist-mounted antenna. Subsequently, it geometrically intersects this spherical ring with the RGB-D data obtained from the wrist-mounted camera, resulting in a list of candidate locations.
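As a rough illustration of this intersection step, the Python sketch below keeps the points of a point cloud whose distance from the antenna matches the RF range. It is not RFusion's implementation. The function name candidate_locations, the tolerance_m parameter, the toy point cloud, and the assumption that the one-way range is half the round-trip distance (that is, the same antenna transmits and receives) are all illustrative.

```python
import math

def candidate_locations(points, antenna_pos, round_trip_m, tolerance_m=0.02):
    """Return the points that lie on the RF range shell around the antenna.

    Assumption of this sketch: the one-way range is half the round-trip
    distance, and tolerance_m models the thickness of the shell.
    """
    one_way_m = round_trip_m / 2.0
    candidates = []
    for p in points:
        # Keep a point only if its distance to the antenna matches the RF range.
        if abs(math.dist(p, antenna_pos) - one_way_m) <= tolerance_m:
            candidates.append(p)
    return candidates

# Toy point cloud in metres, expressed in the antenna's frame (hypothetical data).
cloud = [(0.0, 0.0, 0.5), (0.3, 0.0, 0.4), (0.0, 0.1, 0.9), (0.5, 0.5, 0.5)]
hits = candidate_locations(cloud, (0.0, 0.0, 0.0), round_trip_m=1.0)
print(hits)
```

In this toy example two surface points lie at the measured range, so one measurement leaves the location ambiguous, which is why further vantage points are needed.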

2) RF-Visual Reinforcement Learning:

Next, the robot needs to move its gripper to a new location in order to collect new RF and visual measurements. To do this, we trained a reinforcement learning network that uses the history of RF and visual measurements to determine the optimal next vantage point to which the gripper should move. The robot moves to this new location, takes measurements, and repeats this step until it has sufficient confidence in its estimate of the location of the RFID-tagged item.
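The measure, move, and repeat loop can be sketched as follows. This is a simplified illustration under stated assumptions: the vantage points are supplied as a fixed list that stands in for the choices of the trained network, the RF measurement is simulated from a known tag position, and "sufficient confidence" is approximated by the spread of the remaining candidates. All names and thresholds here are hypothetical.

```python
import math

def filter_by_range(candidates, antenna_pos, one_way_m, tol_m=0.02):
    """Keep candidates whose distance to the antenna matches the RF range."""
    return [p for p in candidates
            if abs(math.dist(p, antenna_pos) - one_way_m) <= tol_m]

def spread(candidates):
    """Largest pairwise distance among the candidates (0.0 if none or one)."""
    return max((math.dist(a, b) for a in candidates for b in candidates),
               default=0.0)

def localize(cloud, tag_pos, vantage_points, max_spread_m=0.05):
    """Narrow the candidate set over successive vantage points."""
    candidates = list(cloud)
    steps = 0
    for antenna_pos in vantage_points:  # stand-in for the learned policy
        one_way_m = math.dist(tag_pos, antenna_pos)  # simulated RF measurement
        candidates = filter_by_range(candidates, antenna_pos, one_way_m)
        steps += 1
        # Stop once the remaining candidates agree to within max_spread_m.
        if candidates and spread(candidates) <= max_spread_m:
            break
    return candidates, steps

cloud = [(0.0, 0.0, 0.5), (0.3, 0.0, 0.4), (0.0, 0.1, 0.9), (0.5, 0.5, 0.5)]
views = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.0, 0.3, 0.0)]
estimate, steps = localize(cloud, tag_pos=(0.3, 0.0, 0.4), vantage_points=views)
print(estimate, steps)
```

In the real system the next vantage point is chosen by the network from the measurement history rather than read from a list. The sketch only shows how each new range measurement prunes the candidates and when the loop stops.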

3) RF-Visual Grasping:

Once a sufficiently accurate location has been determined, RFusion uses the location estimate to grasp the object. After grasping, the wrist-mounted antenna can make an additional RFID measurement to verify that the target item has indeed been grasped. If it has not, the robot attempts more grasps until the item of interest has been picked up.
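A minimal sketch of this grasp-and-verify loop is shown below. The text above does not specify how the verification measurement is interpreted, so the sketch assumes the grasp counts as successful when the measured one-way range to the tag falls below a threshold. The callbacks attempt_grasp and measure_one_way_m, the in_hand_max_m threshold, and the attempt limit are hypothetical placeholders.

```python
def grasp_until_verified(attempt_grasp, measure_one_way_m,
                         in_hand_max_m=0.15, max_attempts=5):
    """Grasp repeatedly until the RF check says the tag is in the gripper.

    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        attempt_grasp()
        # Assumed check: a tag held in the gripper is very close to the antenna.
        if measure_one_way_m() <= in_hand_max_m:
            return attempt
    return None

# Simulated run: two grasps pick up the wrong item, the third gets the target.
simulated_ranges = iter([0.42, 0.40, 0.08])
grasp_log = []
attempts_used = grasp_until_verified(
    attempt_grasp=lambda: grasp_log.append("grasp"),
    measure_one_way_m=lambda: next(simulated_ranges),
)
print(attempts_used)
```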

How well does RFusion work?

Our evaluation demonstrates that RFusion localizes target items with centimeter-scale accuracy and achieves a 96% success rate in retrieving fully occluded objects, even when they are buried under a pile.

This research is sponsored by an NSF CAREER Award (CNS-1844280), the Sloan Research Fellowship, NTT DATA, Toppan, Toppan Forms, the MIT Media Lab, and the Abdul Latif Jameel Water and Food Systems Lab (J-WAFS) at MIT.