FuseBot: RF-Visual Mechanical Search


Melanie Gonick


FuseBot is a robotic system that efficiently finds and retrieves both RFID-tagged and untagged target objects in line-of-sight, non-line-of-sight, and occluded settings using RF-Visual perception. The robot fuses RF (radio frequency) and visual information from an antenna and a camera, both mounted on its wrist, to locate and retrieve a target item. The system introduces two key innovations, RF-Visual Mapping and RF-Visual Extraction, to accurately localize and efficiently extract the item of interest.

FuseBot achieves a 95% success rate in retrieving untagged items, demonstrating for the first time that the benefits of RF perception extend beyond tagged objects in the mechanical search problem.

Our experimental results demonstrate that FuseBot outperforms a state-of-the-art vision-based system in efficiency by more than 40%, measured by the number of actions required for successful mechanical search.

How does FuseBot work?

FuseBot leverages RF and visual perception to retrieve a target item efficiently. It uses RF signals to locate RFID tags in the environment with centimeter-scale precision. By integrating a camera and an antenna into its robotic arm, FuseBot leverages the robot's movements to locate RFIDs, model unknown or occluded regions of the environment, and efficiently extract target items from under a pile, regardless of whether they are tagged with RFIDs.
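To give a feel for how antenna motion enables tag localization, here is a minimal sketch of a synthetic-aperture-style approach: as the wrist-mounted antenna moves, it records the phase of the tag's backscattered signal at each position, and candidate locations are scored by how coherently the measured phases align with the round-trip distances they predict. The function names, grid setup, and carrier frequency are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

WAVELENGTH = 0.326  # meters, roughly a 915 MHz UHF RFID carrier (assumed)

def localize_tag(antenna_positions, measured_phases, candidate_points):
    """Return the candidate point whose predicted phases best explain the data.

    antenna_positions: (N, 3) antenna locations along the robot's trajectory
    measured_phases:   (N,) channel phases in radians (wrapping mod 2*pi is
                       fine: the complex exponential below is wrap-invariant)
    candidate_points:  (M, 3) grid of hypothesized tag locations
    """
    # Round-trip distance from every antenna position to every candidate.
    d = np.linalg.norm(
        antenna_positions[:, None, :] - candidate_points[None, :, :], axis=2
    )
    predicted = (4 * np.pi / WAVELENGTH) * d  # round-trip phase per candidate
    # Coherently sum the residual phasors; at the true location the residuals
    # align and the magnitude of the sum peaks.
    score = np.abs(np.exp(1j * (measured_phases[:, None] - predicted)).sum(axis=0))
    return candidate_points[np.argmax(score)]
```

In practice the candidate grid would span the pile volume, and the diversity of antenna positions along the trajectory is what sharpens the peak to centimeter scale.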


Signal Kinetics

1. RF-Visual Mapping:

FuseBot first constructs a probabilistic occupancy map of the target item's location in the pile by fusing information from the robot's in-hand camera and RF antenna. This component localizes the RFIDs in the pile and applies a conditional (shape-aware) RF kernel to construct a negative 3D probability mask, as shown in the red regions of Fig. (b).
By combining this information with its visual observation of the 3D pile geometry (shown in Fig. (c)), as well as prior knowledge of the target object's geometry, FuseBot creates a 3D occupancy distribution, shown as a heatmap in Fig. (d), where red indicates high probability and blue indicates low probability for the target item's location.
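The fusion step above can be sketched as a simple voxel-grid computation: start from a prior over the pile volume, zero out voxels that the RF evidence rules out (space occupied by localized tagged items) and voxels the camera observes to be free, then renormalize. This is an illustrative simplification under assumed names; the actual system uses a shape-aware kernel and the target's geometry rather than a uniform prior.

```python
import numpy as np

def rf_visual_map(grid_shape, rfid_occupied, visible_free):
    """Fuse a negative RF mask with visual free-space evidence.

    grid_shape:    shape of the 3D voxel grid over the pile volume
    rfid_occupied: boolean mask of voxels occupied by localized tagged items
                   (the "negative" RF evidence: the target is not here)
    visible_free:  boolean mask of voxels the depth camera sees as empty
    """
    prior = np.ones(grid_shape)       # uniform prior (the paper's is shape-aware)
    prior[rfid_occupied] = 0.0        # negative RF mask removes these voxels
    prior[visible_free] = 0.0         # observed free space cannot hide the target
    total = prior.sum()
    return prior / total if total > 0 else prior
```

Each extraction action reveals new geometry, so in the full system this distribution is recomputed after every grasp.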

2. RF-Visual Extraction:

After computing the 3D occupancy distribution, FuseBot needs an efficient extraction policy to retrieve the target item. Extraction is a multi-step process that involves removing occluding items and iteratively updating the occupancy distribution map. To optimize this process, we formulate extraction as a minimization problem over the expected number of actions that takes into account the expected information gain, the expected grasp success, and the probability distribution map. To efficiently solve this problem, FuseBot performs depth-based instance segmentation, as shown in Fig. (e). The segmentation allows it to integrate the 3D occupancy distribution over each of the object segments and identify the optimal next-best-grasp. FuseBot continues decluttering the environment until the target item is exposed and retrieved.
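The next-best-grasp selection can be illustrated with a small scoring routine: for each segmented object, integrate the occupancy distribution over the voxels that removing it would reveal, and combine that with the segment's grasp-success probability and expected information gain. The linear weighting and all names here are assumptions for illustration; the paper derives the objective from the expected number of actions rather than a hand-tuned combination.

```python
import numpy as np

def next_best_grasp(segment_masks, occupancy, grasp_success, info_gain, alpha=0.1):
    """Pick the segment index with the highest expected utility.

    segment_masks: list of boolean masks, one per segmented object, covering
                   the voxels that removing the object would reveal
    occupancy:     occupancy distribution over the same voxel grid
    grasp_success: per-segment probability that the grasp succeeds
    info_gain:     per-segment expected information gain from removal
    alpha:         illustrative weight trading off exploration vs. exploitation
    """
    scores = []
    for mask, p_grasp, gain in zip(segment_masks, grasp_success, info_gain):
        p_target = occupancy[mask].sum()  # chance the target lies under this segment
        scores.append(p_grasp * (p_target + alpha * gain))
    return int(np.argmax(scores))
```

A segment that likely hides the target, is easy to grasp, and would reveal much of the pile scores highest; after each removal the distribution is updated and the scores are recomputed.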

This research is sponsored by an NSF CAREER Award (CNS-1844280), the Sloan Research Fellowship, NTT DATA, Toppan, Toppan Forms, and the MIT Media Lab.