{"id":821740,"date":"2022-02-23T12:23:47","date_gmt":"2022-02-23T20:23:47","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=821740"},"modified":"2022-02-23T12:23:47","modified_gmt":"2022-02-23T20:23:47","slug":"a-multimodal-learning-from-observation-towards-all-at-once-robot-teaching-using-task-cohesion","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/a-multimodal-learning-from-observation-towards-all-at-once-robot-teaching-using-task-cohesion\/","title":{"rendered":"A Multimodal Learning-from-Observation Towards All-at-once Robot Teaching using Task Cohesion"},"content":{"rendered":"<p>Multimodal Learning-from-Observation (LfO) is a promising robot teaching solution that enables teaching sequential operations by extracting what-to-do from language and how-to-do from demonstrations. While previous studies have focused on step-by-step instructions, all-at-once teaching allows users to teach the behavior more naturally. However, all-at-once teaching needs to bridge the gap between verbal instruction and robot execution in order to determine which instruction corresponds to which section of the demonstration. To this end, we introduce the notion of task cohesion, which connects verbal instructions to robot execution through the concept of physical\/semantic state transitions. We solve the problems of grounding and of over-\/under-segmentation of language and demonstration with a recursive dynamic programming formulation whose cost function segments the demonstration and grounds each segment to the language. The what-to-do is obtained from the language, and the how-to-do is obtained by extracting, from the demonstration segments to which the language is grounded, the parameters required for robot execution based on the task-cohesion information. 
The contributions of this study are threefold: (1) introducing the notion of task cohesion, (2) proposing a recursive dynamic programming approach that aligns verbal instructions with human demonstrations, and (3) demonstrating the effectiveness of multimodal all-at-once teaching by integrating them.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Multimodal Learning-from-Observation (LfO) is a promising robot teaching solution that enables teaching sequential operations by extracting what-to-do from language and how-to-do from demonstrations. While previous studies have focused on step-by-step instructions, all-at-once teaching allows users to teach the behavior more naturally. However, all-at-once teaching needs to bridge the gap between verbal instruction and robot execution [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"IEEE","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"367","msr_page_range_end":"374","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2022-1-1","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journa
l_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13556,13562,13554],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[263296],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-821740","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-computer-vision","msr-research-area-human-computer-interaction","msr-locale-en_us"],"msr_publishername":"IEEE","msr_edition":"","msr_affiliation":"","msr_published_date":"2022-1-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/doi.org\/10.1109\/SII52469.2022.9708836","label_id":"243106","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/ieeexplore.ieee.org\/abstract\/document\/9708836","label_id":"243109","label":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id"
:"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[],"msr-author-ordering":[{"type":"guest","value":"iori-yanokura","user_id":821599,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=iori-yanokura"},{"type":"user_nicename","value":"Naoki Wake","user_id":39916,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Naoki Wake"},{"type":"guest","value":"kazuhiro-sasabuchi","user_id":821605,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=kazuhiro-sasabuchi"},{"type":"guest","value":"riku-arakawa","user_id":821596,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=riku-arakawa"},{"type":"text","value":"Kei Okada","user_id":0,"rest_url":false},{"type":"guest","value":"jun-takamatsu","user_id":821608,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=jun-takamatsu"},{"type":"guest","value":"masayuki-inaba","user_id":821731,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=masayuki-inaba"},{"type":"user_nicename","value":"Katsushi Ikeuchi","user_id":32500,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Katsushi Ikeuchi"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[1057371],"msr_project":[821527],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":821527,"post_title":"Interactive Learning-from-Observation","post_name":"interactive-learning-from-observation","post_type":"msr-project","post_date":"2022-02-24 20:57:14","post_modified":"2025-01-29 
14:49:10","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/interactive-learning-from-observation\/","post_excerpt":"Service-robot solutions to empower senior citizens \u200b to achieve more and to enhance their lives The goal of this project is to develop an interactive learning-from-observation (LfO) system in the service-robot domain so as to empower senior citizens to achieve more and enhance their lives. Currently, many seniors in assisted living facilities would have preferred to remain at their homes. If we can use service robots to assist them, they can stay at home, conduct&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/821527"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/821740","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/821740\/revisions"}],"predecessor-version":[{"id":821752,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/821740\/revisions\/821752"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=821740"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=821740"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=821740"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type
?post=821740"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=821740"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=821740"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=821740"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=821740"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=821740"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=821740"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=821740"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=821740"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=821740"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}