Interaction Harvesting for Document Retrieval

Despite advances in search technology, few software systems accurately categorize multimedia files. The most successful systems for searching images, sounds, or movies rely on keyword annotation to provide meaningful search terms for non-text documents. Unfortunately, such systems usually require the author to enter the keywords manually, a task that is commonly neglected or poorly executed. This thesis proposes an approach to document categorization called Interaction Harvesting, in which systems establish document relationships from organizational and curatorial cues harvested from the mouse and click gestures of an online community. Specifically, the spatial and temporal proximity and placement of documents are taken as indicators of document similarity. We propose an expansion technique whereby such proximal documents exert weighted keyword influences on each other. We hypothesize that these approaches will form a document classification framework that relieves some of the difficulty of the annotation process while providing retrieval performance equivalent to that of manual keyword annotation.
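
To make the expansion idea concrete, the following is a minimal sketch of how proximal documents might exert weighted keyword influences on each other. It is an illustration under stated assumptions, not the method developed in this thesis: the exponential decay, the scale parameters, the cutoff threshold, the record layout (x, y, t, keywords), and the function names proximity_weight and expand_keywords are all hypothetical.

```python
import math
from collections import defaultdict


def proximity_weight(a, b, space_scale=100.0, time_scale=60.0):
    """Hypothetical weight in (0, 1] that decays with spatial and temporal distance."""
    dist = math.hypot(a["x"] - b["x"], a["y"] - b["y"])
    dt = abs(a["t"] - b["t"])
    return math.exp(-dist / space_scale) * math.exp(-dt / time_scale)


def expand_keywords(docs, min_weight=0.05):
    """Return per-document keyword scores after one round of neighbor influence."""
    expanded = {}
    for doc_id, doc in docs.items():
        scores = defaultdict(float)
        for kw in doc["keywords"]:
            scores[kw] += 1.0  # a document's own annotations count fully
        for other_id, other in docs.items():
            if other_id == doc_id:
                continue
            w = proximity_weight(doc, other)
            if w < min_weight:
                continue  # assumed cutoff: distant documents exert no influence
            for kw in other["keywords"]:
                scores[kw] += w  # neighbor keywords arrive scaled by proximity
        expanded[doc_id] = dict(scores)
    return expanded


# Illustrative data: positions in pixels, times in seconds (made-up values).
docs = {
    "beach.jpg": {"x": 0, "y": 0, "t": 0, "keywords": ["beach", "summer"]},
    "waves.wav": {"x": 10, "y": 0, "t": 5, "keywords": []},
    "budget.xls": {"x": 900, "y": 700, "t": 4000, "keywords": ["finance"]},
}
expanded = expand_keywords(docs)
```

In this toy data, the unannotated sound file placed next to the annotated image inherits that image's keywords at a reduced weight, while the distant spreadsheet contributes nothing. A real system would need to choose the decay function and scales empirically.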

