{"id":927303,"date":"2023-03-16T20:16:37","date_gmt":"2023-03-17T03:16:37","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&#038;p=927303"},"modified":"2024-03-20T08:14:15","modified_gmt":"2024-03-20T15:14:15","slug":"gpt-models-meet-robotic-applications-co-speech-gesturing-chat-system","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/gpt-models-meet-robotic-applications-co-speech-gesturing-chat-system\/","title":{"rendered":"GPT Models Meet Robotic Applications: Co-Speech Gesturing Chat System"},"content":{"rendered":"\n<figure class=\"wp-block-image aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"720\" height=\"550\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/chatbot.gif\" alt=\"Robot is chatting with a user with movements.\" class=\"wp-image-927501\" \/><figcaption class=\"wp-element-caption\">Our robotic gesture engine and DIY robot, MSRAbot, are integrated with a GPT-based chat system.<\/figcaption><\/figure>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button is-style-fill-download\"><a data-bi-type=\"button\" class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/GPT_gesture.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a><\/div>\n\n\n\n<div class=\"wp-block-button is-style-fill-github\"><a data-bi-type=\"button\" class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/github.com\/microsoft\/LabanotationSuite\/tree\/master\/MSRAbotSimulation\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub (for MSRAbot)<\/a><\/div>\n\n\n\n<div class=\"wp-block-button is-style-fill-github\"><a data-bi-type=\"button\" class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/github.com\/microsoft\/GPT-Enabled-HSR-CoSpeechGestures\" 
target=\"_blank\" rel=\"noreferrer noopener\">GitHub (for Toyota HSR)<\/a><\/div>\n<\/div>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"introduction\">Introduction<\/h3>\n\n\n\n<p>Large-scale language models have revolutionized natural language processing tasks, and researchers are exploring their potential for enhancing human-robot interaction and communication. In this post, we will present our co-speech gesturing chat system, which integrates GPT-3\/ChatGPT with a gesture engine to provide users with a more flexible and natural chat experience. We will explain how the system works and discuss the synergistic effects of integrating robotic systems and language models.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"co-speech-gesturing-chat-system-how-it-works\">Co-Speech Gesturing Chat System: How it works<\/h3>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"583\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/overview-1024x583.png\" alt=\"diagram\" class=\"wp-image-940248\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/overview-1024x583.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/overview-300x171.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/overview-768x437.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/overview-1536x874.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/overview-2048x1165.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/overview-240x137.png 240w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">The pipeline of the co-speech gesture generation system.<\/figcaption><\/figure>\n\n\n\n<p>Our co-speech gesturing chat system operates within a browser. 
When a user inputs a message, GPT-3\/ChatGPT generates the robot&#8217;s textual response based on a prompt carefully crafted to create a chat-like experience. The system then uses a gesture engine to analyze the text and select an appropriate gesture from a library associated with the conceptual meaning of the speech. A speech generator converts the text into speech, while a gesture generator executes co-speech gestures, providing audio-visual feedback expressed through a CG robot. The system leverages several Azure services, including Azure Speech Service for speech-to-text conversion, Azure OpenAI Service for GPT-3-based response generation, and Azure Language Understanding service for concept estimation. The source code of the system is available on <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/microsoft\/LabanotationSuite\/tree\/master\/MSRAbotChatSimulation\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"msrabot-diykit\">MSRAbot DIYKit<\/h3>\n\n\n\n<p>In this post, we use our in-house robot, MSRAbot, originally designed as a platform for human-robot interaction research. As an additional resource for readers interested in the robot, we have developed and open-sourced a <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/microsoft\/gestureBotDesignKit\" target=\"_blank\" rel=\"noopener noreferrer\">DIYKit for MSRAbot<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. This DIYKit includes 3D models of the parts and step-by-step assembly instructions, enabling users to build the robot&#8217;s hardware from commercially available items. 
The software needed to operate the robot is also available on the same page.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"320\" height=\"320\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/03\/MSRAbot_sample.gif\" alt=\"MSRAbot hardware is moving.\" class=\"wp-image-927954\" \/><figcaption class=\"wp-element-caption\">MSRAbot hardware. Visit our GitHub page for more information.<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"the-benefits-of-integrating-robotic-systems-and-language-models\">The Benefits of Integrating Robotic Systems and Language Models<\/h3>\n\n\n\n<p>The fusion of existing robot gesture systems with large-scale language models benefits both components. Traditionally, robot gesture systems have been evaluated with predetermined phrases; integration with language models enables evaluation under more natural conversational conditions, which promotes the development of better gesture generation algorithms. Conversely, large-scale language models can expand their range of expression by pairing their strong language responses with speech and gestures. By integrating these two technologies, we can develop more flexible and natural chat systems that enhance human-robot interaction and communication.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"challenges-and-limitations\">Challenges and Limitations<\/h3>\n\n\n\n<p>While our co-speech gesturing chat system appears straightforward and promising, it also faces limitations and challenges. For example, the use of language models carries well-known risks, such as generating biased or inappropriate responses. Additionally, the gesture engine and concept estimation must be reliable and accurate to ensure the overall effectiveness and usability of the system. 
Further research and development are needed to make the system more robust, reliable, and user-friendly.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"conclusion\">Conclusion<\/h3>\n\n\n\n<p>In conclusion, our co-speech gesturing chat system represents an exciting advance in the integration of robotic systems and language models. By using a gesture engine to analyze speech text and integrating GPT-3 for response generation, we have created a chat system that offers users a more flexible and natural conversational experience. As we continue to refine and develop this technology, we believe that the fusion of robotic systems and language models will lead to more sophisticated and beneficial systems for users, such as virtual assistants and tutors.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"about-our-research-group\">About our research group<\/h3>\n\n\n\n<p>Visit our homepage: <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/applied-robotics-research\/\">Applied Robotics Research<\/a><\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"learn-more-about-this-project\">Learn more about this project<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/gesture-generation-for-service-robots\/\">[Project page] Gesture Generation for Service Robots<\/a><\/li>\n\n\n\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/ieeexplore.ieee.org\/document\/9708605\" target=\"_blank\" rel=\"noopener noreferrer\">[Paper] Labeling the Phrases of a Conversational Agent with a Unique Personalized Vocabulary<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n\n\n\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/ieeexplore.ieee.org\/abstract\/document\/9708837\">[Paper] Integration of Gesture Generation System Using Gesture Library with DIY Robot Design Kit<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n\n\n\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/abs\/1905.08702\" target=\"_blank\" rel=\"noopener noreferrer\">[Paper] Design of conversational humanoid robot based on hardware independent gesture generation<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n\n\n\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/GPT-Enabled-HSR-CoSpeechGestures\">[GitHub] Sample code to test co-speech gestures using Toyota HSR robot<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n<\/ul>\n\n\n<p><!-- \/wp:post-content --><\/p>","protected":false},"excerpt":{"rendered":"<p>Large-scale language models have revolutionized natural language processing tasks, and researchers are exploring their potential for enhancing human-robot interaction and communication. In this post, we will present our co-speech gesturing chat system, which integrates GPT-3\/ChatGPT with a gesture engine to provide users with a more flexible and natural chat experience. 
We will explain how the [&hellip;]<\/p>\n","protected":false},"author":39916,"featured_media":927597,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-content-parent":668253,"msr_hide_image_in_river":0,"footnotes":""},"research-area":[],"msr-locale":[268875],"msr-post-option":[],"class_list":["post-927303","msr-blog-post","type-msr-blog-post","status-publish","has-post-thumbnail","hentry","msr-locale-en_us"],"msr_assoc_parent":{"id":668253,"type":"group"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/927303","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-blog-post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/39916"}],"version-history":[{"count":39,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/927303\/revisions"}],"predecessor-version":[{"id":1016820,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/927303\/revisions\/1016820"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/927597"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=927303"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=927303"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=927303"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\
/v2\/msr-post-option?post=927303"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}