Project

A Conversational Agent for Dynamic Procedural Interactions

Instructed learning is ever-present throughout our lives. One element of it, the How-To question (e.g., “How do I cook rice?”, “How do I write a check?”, or “How do I send pictures to my family from my iPhone?”), is among the most common types of query submitted to search engines [1] and presumably to conversational agents as well. Answers to How-To questions generally take the form of a procedure: step-by-step instructions that users perform in sequence. However, people find reading instructions cognitively demanding and often prefer that another person guide them through a procedure [2]. Prior work on automating procedural guidance concentrates either on how to communicate instructions or on how to reason about procedural knowledge to extract the states of entities. To the best of my knowledge, no prior research has produced an end-to-end procedural voice guidance system capable of automatically understanding, generating, and presenting a procedure through a conversational agent. To implement such an agent, I believe that three large gaps need to be overcome: generating a contextual knowledge graph (KG) of a procedure, reasoning on that KG to extract and order the necessary information, and constructing a system that converts the ordered information into conversational form and delivers it in a way that an end user can easily follow. A system like the one I propose could enhance conversational agents’ ability to respond interactively to How-To questions. This approach would improve upon the current state of affairs, in which conversational agents hand off the interaction to a web search. Furthermore, for procedures involving smart devices, this kind of mechanism could show inexperienced users how to use their devices and could even enable end-user voice programming.
Lastly, the intermediate contextual KG representation that this work generates from a text could enable conversational agents to take part in explainable, informed conversations. Building on my prior and current work, along with surveys and analyses in this area, I believe that I can achieve the proposed system and bridge the gap between modern conversational agents and modern procedural understanding systems to build dynamic conversational agents for procedures.
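
As a rough illustration of the three stages named above (a graph of a procedure, reasoning over it to order the steps, and conversational delivery), the following is a minimal Python sketch, not the proposed system. It assumes the KG can be reduced to a toy dependency graph between steps; the `PROCEDURE` data and the helpers `order_steps` and `to_turns` are hypothetical names introduced only for this example, and the ordering uses a plain topological sort from the standard library (Python 3.9+) in place of any real procedural reasoning.

```python
from graphlib import TopologicalSorter

# Hypothetical toy "knowledge graph": each step maps to the set of steps
# it depends on. The steps and dependencies are illustrative assumptions.
PROCEDURE = {
    "rinse the rice": set(),
    "boil the water": set(),
    "add the rice to the water": {"rinse the rice", "boil the water"},
    "simmer for 15 minutes": {"add the rice to the water"},
}


def order_steps(graph):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


def to_turns(steps):
    """Turn ordered steps into one spoken-style prompt per step."""
    turns = []
    last = len(steps) - 1
    for i, step in enumerate(steps):
        if i == 0:
            opener = "First"
        elif i == last:
            opener = "Finally"
        else:
            opener = "Next"
        # Every turn except the last asks the user to confirm before moving on.
        suffix = "" if i == last else " Say 'done' when you are ready for the next step."
        turns.append(f"{opener}, {step}.{suffix}")
    return turns


if __name__ == "__main__":
    for turn in to_turns(order_steps(PROCEDURE)):
        print(turn)
```

Steps with no dependency between them (here, rinsing the rice and boiling the water) may come out in either order; a real system would need richer context in the KG to choose between them and to answer follow-up questions mid-procedure.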