Language is grounded in experience. Unlike dictionaries, which define words in terms of other words, humans understand many basic words through associations with sensory-motor experiences. People must interact physically with their world to grasp the essence of words like "red," "heavy," and "above." Abstract words are acquired only in relation to more concretely grounded terms. Grounding is thus a fundamental aspect of spoken language: it enables humans to acquire and use words and sentences in context. We are developing an interactive robot that learns and understands spoken language via multisensory grounding and robotic embodiment. The robot has six degrees of freedom and is equipped with auditory, visual, proprioceptive, tactile, and balance sensors. This system will serve as a test bed for experiments in acquiring and understanding the elementary semantics and syntax of spoken language. Our goals are two-fold. First, we are interested in using computational models to gain insight into how humans process language. By building and testing models with realistic data, we can test theories that are difficult to assess using traditional methods based on observation and analysis. Second, we hope to build a new generation of spoken language interfaces with richer semantic representations, leading to more intelligent machine behavior.
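To illustrate the kind of cross-modal association that grounding involves, the following is a minimal sketch: a learner that links words to co-occurring sensory observations across episodes. The percept labels and the `GroundingModel` class are invented for illustration; they are not the representation used by the system described here.

```python
from collections import Counter, defaultdict

class GroundingModel:
    """Toy cross-modal learner: accumulates co-occurrence counts
    between spoken words and sensory percepts observed in the
    same episode (hypothetical labels, for illustration only)."""

    def __init__(self):
        # word -> Counter of percept labels seen alongside it
        self.cooccur = defaultdict(Counter)

    def observe(self, utterance, percepts):
        # Credit every word in the utterance with every percept
        # present in the same episode.
        for word in utterance.split():
            self.cooccur[word].update(percepts)

    def meaning(self, word):
        # The word's strongest sensory association so far.
        counts = self.cooccur[word]
        return counts.most_common(1)[0][0] if counts else None

model = GroundingModel()
model.observe("red ball", ["color:red", "shape:round"])
model.observe("red cup", ["color:red", "shape:concave"])
print(model.meaning("red"))  # prints "color:red"
```

Because "red" co-occurs with the color percept in both episodes but with each shape percept only once, the association for "red" converges on the color feature rather than any particular object.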