{"id":1007064,"date":"2024-02-12T10:46:26","date_gmt":"2024-02-12T18:46:26","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=1007064"},"modified":"2024-03-08T11:53:07","modified_gmt":"2024-03-08T19:53:07","slug":"embodied-agent-ai","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/embodied-agent-ai\/","title":{"rendered":"Agent AI Towards a Holistic Intelligence"},"content":{"rendered":"<p><span dir=\"ltr\" role=\"presentation\">Recent advancements in large foundational mod<\/span><span dir=\"ltr\" role=\"presentation\">els have remarkably enhanced our understanding <\/span><span dir=\"ltr\" role=\"presentation\">of sensory information in open-world environ<\/span><span dir=\"ltr\" role=\"presentation\">ments. At this pivotal moment, it is crucial to reconsider the <\/span><span dir=\"ltr\" role=\"presentation\">AI research trend toward excessive reductionism <\/span><span dir=\"ltr\" role=\"presentation\">and return to the AI principles inspired by the <\/span><span dir=\"ltr\" role=\"presentation\">holistic philosophy of Aristotle. Specifically, we <\/span><span dir=\"ltr\" role=\"presentation\">emphasize developing \u201cAgent AI\u201d, an embodied <\/span><span dir=\"ltr\" role=\"presentation\">system that integrates large foundation models <\/span><span dir=\"ltr\" role=\"presentation\">into agent actions. The emerging field of Agent <\/span><span dir=\"ltr\" role=\"presentation\">AI spans a wide range of existing embodied and <\/span><span dir=\"ltr\" role=\"presentation\">agent-based multimodal interactions, including <\/span><span dir=\"ltr\" role=\"presentation\">robotics, gaming, and diagnostic systems. 
We em<\/span><span dir=\"ltr\" role=\"presentation\">phasize the importance of integrating recent large <\/span><span dir=\"ltr\" role=\"presentation\">foundational models to enhance intelligence and <\/span><span dir=\"ltr\" role=\"presentation\">interaction capabilities. Furthermore, we discuss <\/span><span dir=\"ltr\" role=\"presentation\">how agents exhibit remarkable capabilities across <\/span><span dir=\"ltr\" role=\"presentation\">a variety of domains and tasks, challenging our understanding of learning and cognition. In this paper, we aim to broaden the research community\u2019s perspective on achieving holistic intelligence, while highlighting the need for an integrated approach that considers the agent\u2019s purpose, functionality, and interaction. Finally, we reflect on a deeper discussion of these Agent AI topics from a mainstream and interdisciplinary perspective. This discussion illustrates AI cognition and consciousness within the scope of scientific discourse, and may serve as a basis for future research directions and social influences.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recent advancements in large foundational models have remarkably enhanced our understanding of sensory information in open-world environments. At this pivotal moment, it is crucial to reconsider the AI research trend toward excessive reductionism and return to the AI principles inspired by the holistic philosophy of Aristotle. 
Specifically, we emphasize developing \u201cAgent AI\u201d, an embodied system that [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"arXiv: 2403.00833","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2024-2-12","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13556,13562,13554],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-1007064","msr-research-item","type-msr-research-item","status-publish","hentry","msr-resear
ch-area-artificial-intelligence","msr-research-area-computer-vision","msr-research-area-human-computer-interaction","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2024-2-12","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/arxiv.org\/abs\/2403.00833","label_id":"243109","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/arxiv.org\/pdf\/2403.00833.pdf","label_id":"243109","label":0}],"msr_related_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/arxiv.org\/abs\/2403.00833","label_id":"243118","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/arxiv.org\/pdf\/2403.00833.pdf","label_id":"243118","label":0}],"msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":1010664,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/02\/AgentAIposition.pdf"},{"id":1009638,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/02\/AgentAI_p.pdf"},{"id":1008462,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/02\/Position_AgentAI.pdf"},{"id":1008459,"url
":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/02\/AgentAI_position-65d4cb0de80b2.pdf"},{"id":1008420,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/02\/AgentAI_position.pdf"},{"id":1008417,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/02\/Agent_AI_position-65d4b84493079.pdf"},{"id":1007076,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/02\/Agent_AI_position.pdf"}],"msr-author-ordering":[{"type":"user_nicename","value":"Qiuyuan Huang","user_id":36356,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Qiuyuan Huang"},{"type":"user_nicename","value":"Naoki Wake","user_id":39916,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Naoki Wake"},{"type":"text","value":"Bidipta Sarkar","user_id":0,"rest_url":false},{"type":"text","value":"Zane Durante","user_id":0,"rest_url":false},{"type":"text","value":"Ran Gong","user_id":0,"rest_url":false},{"type":"text","value":"Rohan Taori","user_id":0,"rest_url":false},{"type":"guest","value":"yusuke-noda","user_id":969939,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=yusuke-noda"},{"type":"guest","value":"demetri-terzopoulos","user_id":969951,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=demetri-terzopoulos"},{"type":"text","value":"Noboru Kuno","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Ade Famoti","user_id":43005,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Ade Famoti"},{"type":"user_nicename","value":"Ashley Llorens","user_id":39964,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Ashley 
Llorens"},{"type":"user_nicename","value":"John Langford","user_id":32204,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=John Langford"},{"type":"guest","value":"hoi-vo","user_id":969933,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=hoi-vo"},{"type":"guest","value":"fei-fei-li","user_id":969957,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=fei-fei-li"},{"type":"user_nicename","value":"Katsushi Ikeuchi","user_id":32500,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Katsushi Ikeuchi"},{"type":"user_nicename","value":"Jianfeng Gao","user_id":32246,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Jianfeng Gao"}],"msr_impact_theme":[],"msr_research_lab":[199565],"msr_event":[],"msr_group":[144931,668253],"msr_project":[788159],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":788159,"post_title":"Agent AI","post_name":"agent-ai","post_type":"msr-project","post_date":"2023-09-25 21:53:00","post_modified":"2024-02-28 07:03:22","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/agent-ai\/","post_excerpt":"Agent-based multimodal AI systems are becoming a ubiquitous presence in our everyday lives. A promising direction for making these systems more interactive is to embody them as agents within specific environments. The grounding of large foundation models to act as agents within specific environments can provide a way of incorporating visual and contextual information into an embodied system. 
For example, a system that can perceive user actions, human behavior, environment objects, audio expressions, and the&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/788159"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1007064","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":7,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1007064\/revisions"}],"predecessor-version":[{"id":1009701,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1007064\/revisions\/1009701"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1007064"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=1007064"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1007064"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=1007064"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=1007064"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=1007064"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1007064"},{"taxonomy":"msr-post-option","embeddable":true,"href":"
https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1007064"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=1007064"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=1007064"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=1007064"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1007064"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=1007064"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}