Dissertation Title: RF-Visual Perception with Applications to Mobile Sensing, Robotics, & Augmented Reality
Abstract:
Mobile and wireless sensing technologies play a critical role in the Internet of Things (IoT), Augmented Reality (AR), robotics, environmental monitoring, and human health monitoring. Despite their ubiquity, each of today's mobile sensing technologies is limited in a fundamental way: cameras capture high-resolution images but are restricted to the line of sight and cannot sense behind occlusions; wireless sensing (e.g., using WiFi or Bluetooth) can traverse occlusions but has limited sensing resolution; and inertial sensors can track with high accuracy and speed but suffer from spatiotemporal drift.
This thesis introduces algorithms, learning models, and systems that fuse different sensing modalities, particularly radio frequency (RF) and cameras. RF and vision, however, are inherently different sensing modalities, which makes their fusion a complex task. To address this challenge, the thesis develops and leverages advanced signal processing techniques and mathematical models that combine the two modalities. As a result, it unlocks new capabilities for IoT-connected and mobile systems, such as AR headsets and robots, and enables new perception, interaction, and manipulation tasks.
The thesis has three core components. The first focuses on RF-visual perception for cyber-physical systems. We present the design, implementation, and evaluation of RFusion, a robotic system that can search for and retrieve RFID-tagged items in line-of-sight, non-line-of-sight, and fully-occluded settings. We then introduce FuseBot, a robotic system that, rather than requiring all target items in a pile to be RFID-tagged, leverages the mere existence of an RFID-tagged item in the pile to improve the retrieval of both tagged and untagged items.
The second component focuses on RF-visual perception for cyber-human systems. We present the design, implementation, and evaluation of X-AR, an AR system with non-line-of-sight perception. X-AR augments AR headsets with RF sensing, enabling users to see things that are otherwise invisible to the human eye or to state-of-the-art AR systems. We then explore how to exploit synergies between AR headsets and RFID localization to improve both user experience and localization accuracy: using fundamental mathematical formulations for RFID localization, we derive confidence metrics and display guidance that improve the user experience and enable users to retrieve items faster.
The final component of this thesis answers the question of how to perform non-line-of-sight perception without any RF tags at all. To achieve this tag-free perception, we use millimeter-wave (mmWave) signals and off-the-shelf mmWave radars. We introduce a real-world dataset of mmWave images of everyday objects, along with an open-source simulation tool that can generate synthetic mmWave images for any 3D triangle mesh. We demonstrate the usefulness of this dataset and simulation tool on multiple non-line-of-sight perception tasks.
Together, these three components introduce new sensing modalities that open up important capabilities for a variety of applications in manufacturing, warehousing, logistics, and supply chain management, as well as in Human-Computer Interaction (HCI) and Human-Robot Interaction (HRI).
Prof. Fadel Adib, Associate Professor, The MIT Media Lab and EECS
Prof. Haitham Hassanieh, Associate Professor, School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne (EPFL)
Dr. Ranveer Chandra, Vice President, M365 Copilot, and Chief Technology Officer of Agri-Food, Microsoft