How can we enable embodied agents to hold multimodal conversations with individual and group users that are not only natural, but also relational, empathetic, commonsensical, contextual, and meaningfully continuous?