{"id":717757,"date":"2021-01-18T01:51:52","date_gmt":"2021-01-18T09:51:52","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&#038;p=717757"},"modified":"2021-01-18T01:51:52","modified_gmt":"2021-01-18T09:51:52","slug":"interacting-with-instructional-videos-using-knowledge-supervision","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/interacting-with-instructional-videos-using-knowledge-supervision\/","title":{"rendered":"Interacting with Instructional Videos using Knowledge Supervision"},"content":{"rendered":"<p>More and more people are relying on instructional videos, such as cooking videos, to teach themselves skills with \u201chow to\u201d instructions. An important research question is then, in place of watching the entire video, can we provide a type of interactive access to make learning more effective? Figure 1 illustrates how videos are divided into segments, allowing users to skip to the answer segment for the given question. 
This question is becoming more important, as people are relying more heavily on videos to teach themselves skills during the COVID-19 pandemic.<\/p>\n<div id=\"attachment_717763\" style=\"width: 610px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-717763\" class=\"wp-image-717763\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-1.png\" alt=\"\" width=\"600\" height=\"317\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-1.png 339w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-1-300x158.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-1-16x8.png 16w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><p id=\"caption-attachment-717763\" class=\"wp-caption-text\">Figure 1. This figure demonstrates a scenario where a question is asked on an instructional video, and the video is skipped directly to a related segment for effective teaching. 
Currently, YouTube relies on manual annotation for this feature.<\/p><\/div>\n<div id=\"attachment_717766\" style=\"width: 2485px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-717766\" class=\"size-full wp-image-717766\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-2.png\" alt=\"\" width=\"2475\" height=\"498\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-2.png 2475w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-2-300x60.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-2-1024x206.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-2-768x155.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-2-1536x309.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-2-2048x412.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-2-16x3.png 16w\" sizes=\"auto, (max-width: 2475px) 100vw, 2475px\" \/><p id=\"caption-attachment-717766\" class=\"wp-caption-text\">Figure 2. 
Research challenges in answering \u201chow to\u201d questions, which are non-factoid questions with long answers, for the text and video modalities.<\/p><\/div>\n<p>Professor Seung-won Hwang from Yonsei University in South Korea and his collaborators, Nan Duan and Lei Ji of Microsoft Research Asia and Yonsei PhD students Kyungjae Lee, Hojae Han, and Seungtaek Choi, have been investigating this topic. Despite active research on Question Answering (QA), answering \u201chow to\u201d questions remains under-studied, and existing solutions targeting factoid QA cannot satisfactorily answer \u201chow to\u201d questions with long answers, especially in multimodal QA. Figure 2 summarizes the research challenges in supporting non-factoid QA for the text and video modalities. For the text modality, the researchers found that the length difference between short questions and long answers hinders neural matching between the two representations. They addressed this issue with the MICRON framework (published at EMNLP 2019), which combines the advantages of representation- and interaction-based matching: the model combines contextual representations and also considers interactions between n-grams of multiple granularities.<\/p>\n<p>This finding is extended to video QA, which poses two additional challenges (marked in red in Figure 2). First, whereas documents are often semantically pre-segmented into paragraphs, segmenting a video into answer units is itself a challenge. Second, videos are multimodal, so semantics represented in both the text and video modalities need to be aggregated. The researchers proposed a two-stage method in which a Segmenter uses multimodal-BERT to generate segmentation candidates by predicting likely start\/end positions, and a Ranker then prioritizes the candidates using length-adaptive gating to overcome the length gap discussed above. 
This idea is described in an AAAI 2020 paper entitled \u201cSegment-then-Rank: Non-factoid Question Answering on Instructional Videos.\u201d See Figure 3.<\/p>\n<div id=\"attachment_717769\" style=\"width: 1617px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-717769\" class=\"wp-image-717769 size-full\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-3.png\" alt=\"diagram\" width=\"1607\" height=\"616\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-3.png 1607w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-3-300x115.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-3-1024x393.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-3-768x294.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-3-1536x589.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-3-16x6.png 16w\" sizes=\"auto, (max-width: 1607px) 100vw, 1607px\" \/><p id=\"caption-attachment-717769\" class=\"wp-caption-text\">Figure 3. Segment-then-rank framework for non-factoid video QA<\/p><\/div>\n<p>The team is continuing efforts to extract key segments from a full video, as Figure 4 illustrates. Existing video summarization extracts highlights but cannot support chaptering. 
The current research differs in that it seeks to leverage external knowledge, such as recipes, to annotate key steps, which would help students discover key questions to ask. A demo was published at ECML 2020 and is available at: http:\/\/pcdeepred.yonsei.ac.kr\/IVSKS\/<\/p>\n<div id=\"attachment_717772\" style=\"width: 610px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-717772\" class=\"wp-image-717772\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-4.png\" alt=\"graphical user interface, application\" width=\"600\" height=\"598\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-4.png 622w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-4-300x300.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-4-150x150.png 150w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-4-12x12.png 12w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-4-180x180.png 180w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/interacting-with-instructional-videos-using-knowledge-supervision-4-360x360.png 360w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><p id=\"caption-attachment-717772\" class=\"wp-caption-text\">Figure 4. 
Grounding video with procedural knowledge for automated chaptering.<\/p><\/div>\n<p>[1] <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/support.google.com\/youtube\/answer\/9884579?hl=en\">https:\/\/support.google.com\/youtube\/answer\/9884579?hl=en<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (credit: Figure 1 and 4)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>More and more people are relying on instructional videos, such as cooking videos, to teach themselves skills with \u201chow to\u201d instructions. An important research question is then, in place of watching the entire video, can we provide a type of interactive access to make learning more effective? Figure 1 illustrates how videos are divided into [&hellip;]<\/p>\n","protected":false},"author":34512,"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-content-parent":199560,"msr_hide_image_in_river":0,"footnotes":""},"research-area":[],"msr-locale":[268875],"msr-post-option":[],"class_list":["post-717757","msr-blog-post","type-msr-blog-post","status-publish","hentry","msr-locale-en_us"],"msr_assoc_parent":{"id":199560,"type":"lab"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/717757","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-blog-post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/34512"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/717757\/revisions"}],"predecessor-v
ersion":[{"id":717778,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/717757\/revisions\/717778"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=717757"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=717757"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=717757"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=717757"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}