{"id":966927,"date":"2023-09-07T17:49:22","date_gmt":"2023-09-08T00:49:22","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=966927"},"modified":"2024-03-25T04:54:43","modified_gmt":"2024-03-25T11:54:43","slug":"mindagent-emergent-gaming-interaction","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/mindagent-emergent-gaming-interaction\/","title":{"rendered":"MindAgent: Emergent Gaming Interaction"},"content":{"rendered":"<p><span dir=\"ltr\" role=\"presentation\">Large Language Models (LLMs) can perform complex scheduling in a multi-agent system and can coordinate the agents to complete sophisticated tasks that require extensive collaboration. However, despite the introduction of numerous gaming frameworks, the community lacks sufficient benchmarks for building general multi-agent collaboration infrastructure that encompasses both LLM and human-NPC communication. In this work, we propose a novel infrastructure, MindAgent, to evaluate emergent planning and coordination capabilities for gaming interaction. 
In particular, our infrastructure leverages existing gaming frameworks to require understanding of the coordinator for a considerable number of agents, collaborate with human players via proper instructions without fine-tuning, and establish in-context learning from few-shot prompts with feedback. Furthermore, we introduce CuisineWorld, a new gaming scenario and related benchmark that assesses multi-agent collaboration efficiency and supervises multiple agents playing the game simultaneously. We conduct comprehensive evaluations with a new automatic metric, CoS, for calculating collaboration efficiency. Finally, our infrastructure can be deployed into real-world gaming scenarios in a customized VR game, \u201cCuisineWorld\u201d, and adapted to the existing broader \u201cMinecraft\u201d gaming domain. 
We hope our findings on LLMs and the new infrastructure for general-purpose scheduling and coordination can help shed light on how such skills can be obtained by learning from large text corpora.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Large Language Models (LLMs) can perform complex scheduling in a multi-agent system and can coordinate the agents to complete sophisticated tasks that require extensive collaboration. However, despite the introduction of numerous gaming frameworks, the community lacks sufficient benchmarks for building general multi-agent collaboration infrastructure that encompasses both LLM and human-NPC communication. [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"Generated the model and infrastructure with Microsoft Gaming US.","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"Proceedings of NAACL 
2024","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2024-1-9","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13556,13545],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-966927","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2024-1-9","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"Generated the model and infrastructure with Microsoft Gaming 
US.","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/MindAgent_final.pdf","id":"980802","title":"mindagent_final","label_id":"243109","label":0}],"msr_related_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/arxiv.org\/pdf\/2309.09971.pdf","label_id":"243112","label":0},{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/MindAgent_final.pdf","id":"980802","title":"mindagent_final","label_id":"243118","label":0}],"msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":980802,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/MindAgent_final.pdf"},{"id":969909,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/MindAgent-650dee0f7fd72.pdf"},{"id":967923,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/GamingInteraction.pdf"},{"id":967017,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/MindAgent-64ff093a498c7.pdf"},{"id":966945,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/MindAgent-64fd883912140.pdf"},{"id":966942,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/MindAgent-64fd2b9066ac8.pdf"},{"id":966930,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/MindAgent.pdf"}],"msr
-author-ordering":[{"type":"text","value":"Steven Gong","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Qiuyuan Huang","user_id":36356,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Qiuyuan Huang"},{"type":"text","value":"Xiaojian Ma","user_id":0,"rest_url":false},{"type":"text","value":"Hoi Vo","user_id":0,"rest_url":false},{"type":"text","value":"Zane Durante","user_id":0,"rest_url":false},{"type":"text","value":"Yusuke Noda","user_id":0,"rest_url":false},{"type":"text","value":"Zilong Zheng","user_id":0,"rest_url":false},{"type":"text","value":"Song-chun Zhu","user_id":0,"rest_url":false},{"type":"text","value":"Demetri Terzopoulos","user_id":0,"rest_url":false},{"type":"text","value":"Feifei Li","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Jianfeng Gao","user_id":32246,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Jianfeng Gao"}],"msr_impact_theme":[],"msr_research_lab":[199565],"msr_event":[],"msr_group":[144931],"msr_project":[788159,965577],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":788159,"post_title":"Agent AI","post_name":"agent-ai","post_type":"msr-project","post_date":"2023-09-25 21:53:00","post_modified":"2024-02-28 07:03:22","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/agent-ai\/","post_excerpt":"Agent-based multimodal AI systems are becoming a ubiquitous presence in our everyday lives. A promising direction for making these systems more interactive is to embody them as agents within specific environments. The grounding of large foundation models to act as agents within specific environments can provide a way of incorporating visual and contextual information into an embodied system. 
For example, a system that can perceive user actions, human behavior, environment objects, audio expressions, and the&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/788159"}]}},{"ID":965577,"post_title":"Emergent Interaction Agent","post_name":"gaming-interaction","post_type":"msr-project","post_date":"2023-05-22 22:38:00","post_modified":"2023-12-17 10:09:27","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/gaming-interaction\/","post_excerpt":"We collaborated with the Xbox and Mesh teams to explore a new gaming infrastructure and design a dynamic real-time system for human players and NPCs with GPT-X on a multi-agent platform. GitHub: MindAgent ArXiv: https:\/\/arxiv.org\/abs\/2309.09971 Demo: MindAgent.mp4 Gaming Interaction Infrastructure: We are very excited to share the good news. Our project \u201cMindAgent: Emergent Gaming Interaction\u201d was recently made public. 
We seek to develop&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/965577"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/966927","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":9,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/966927\/revisions"}],"predecessor-version":[{"id":1015521,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/966927\/revisions\/1015521"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=966927"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=966927"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=966927"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=966927"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=966927"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=966927"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=966927"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=966927"},{"taxonomy":"msr
-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=966927"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=966927"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=966927"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=966927"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=966927"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}