Building AI systems that can process user input, understand it, and generate an engaging, contextually relevant response has been one of the longest-running goals in AI. Humans use a variety of modalities, such as language and visual cues, to communicate. A major trigger for our meaningful communications is "events" and how they cause or enable future events. In this talk, I will describe my research on modeling human-machine communications around events, with a major focus on commonsense reasoning, world knowledge, and context modeling. I will mainly focus on my ongoing work on context modeling across three different settings: narrative context for collaborative story generation, visual context for question generation on an eventful image, and combined visual and textual contexts for image-grounded conversations.
See more about this video at www.microsoft.com/en-us/research/video/modeling-context-eventful-human-machine-communications/