{"id":1113300,"date":"2024-12-17T18:50:08","date_gmt":"2024-12-18T02:50:08","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&#038;p=1113300"},"modified":"2025-10-16T23:47:05","modified_gmt":"2025-10-17T06:47:05","slug":"redefining-robot-intelligence-2024-microsoft-research-asia-startrack-scholars-program-accelerates-embodied-ai-and-large-robotics-models","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/redefining-robot-intelligence-2024-microsoft-research-asia-startrack-scholars-program-accelerates-embodied-ai-and-large-robotics-models\/","title":{"rendered":"Redefining robot intelligence: 2025 Microsoft Research Asia StarTrack Scholars Program accelerates embodied\u202fAI\u202fand large robotics models"},"content":{"rendered":"\n<p>With the rapid advancements in AI and robotics, the development of highly intelligent robots capable of seamlessly interacting with the physical environment is becoming increasingly achievable. As the next AI wave, embodied AI innovations promise to revolutionize various industries and significantly impact human life.<\/p>\n\n\n\n<p>Although promising progress has been made, generalist robots and embodied AI are still in their infancy. The team is thrilled to envision this future and is committed to developing cutting-edge foundational robotics models to accelerate its arrival.<\/p>\n\n\n\n<p>Microsoft Research Asia would like to invest in a collaborative effort to explore embodied AI and large action models. We believe that research on embodied AI will lay a solid foundation for robot intelligence, open promising development prospects, and ultimately benefit human society. If you are an aspiring researcher with a passion for exploring embodied AI and large action models, we invite you to apply to the Microsoft Research Asia StarTrack Scholars Program. Applications are now open for the 2025 program. 
For more details and to submit your registration, visit our official website:&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/academic-program\/microsoft-research-asia-startrack-program\/\">Microsoft Research Asia StarTrack Scholars Program \u2013 Microsoft Research<\/a>&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"build-foundation-action-model-for-general-robots\">Build foundation action model for general robots<\/h2>\n\n\n\n<p>Embodied AI is more than a simple fusion of robots with LLMs or VLMs. Beyond language intelligence and cognitive abilities, action intelligence is essential for executing plans and engaging with the physical world. This form of intelligence diverges significantly from language intelligence. For instance, it necessitates dense, dexterous actions and demands high levels of spatial and physical awareness. Our research is focused on creating a new generation of foundational action models that enhance spatial and physical proficiencies in perception, reasoning, and action.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"631\" height=\"330\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/image001-67623786b4527.png\" alt=\"Cognition and action intelligence for Embodied AI\" class=\"wp-image-1113303\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/image001-67623786b4527.png 631w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/image001-67623786b4527-300x157.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/image001-67623786b4527-240x126.png 240w\" sizes=\"auto, (max-width: 631px) 100vw, 631px\" \/><figcaption class=\"wp-element-caption\">Cognition and action intelligence for Embodied AI<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"spatial-intelligence-as-the-key-to-action-capabilities\">Spatial intelligence as 
the key to action capabilities<\/h2>\n\n\n\n<p>The fact that robots operate in a 3D physical world poses unique requirements and challenges. The team believes that spatial intelligence is crucial for developing robust action capabilities. By leveraging advanced 3D computer vision techniques, we aim to provide robots with a deep understanding of their environments. Our work includes pioneering methods for 3D human-object interaction reconstruction, enabling robots to learn from humans and to navigate and manipulate objects with unprecedented generalization and precision.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"crafting-optimal-model-architectures\">Crafting optimal model architectures<\/h2>\n\n\n\n<p>Model architecture design is central to advancing the capabilities of embodied AI. The modality of action differs fundamentally from language and vision. Actions are continuous, dense, and require precision in both time and space. Moreover, robotic actions frequently require timely execution to interact effectively with dynamic environments. Effective model architectures must accommodate these demands by providing robust frameworks that integrate real-time sensory feedback with decision-making processes. 
This integration ensures that robots can adapt their actions swiftly to changes in their surroundings, maintaining elevated levels of performance in complex and unpredictable settings.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"642\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/image003-676237d84e697-1024x642.png\" alt=\"Foundation models seamlessly integrating vision, language, and action capabilities\" class=\"wp-image-1113306\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/image003-676237d84e697-1024x642.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/image003-676237d84e697-300x188.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/image003-676237d84e697-768x481.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/image003-676237d84e697-240x150.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/12\/image003-676237d84e697.png 1444w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Foundation models seamlessly integrating vision, language, and action capabilities<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"from-spatial-reasoning-to-physical-mastery\">From spatial reasoning to physical mastery<\/h2>\n\n\n\n<p>As robots transition from static observers to dynamic participants in the physical world, achieving physical mastery becomes a pivotal goal. This transformation requires moving beyond mere spatial reasoning to encompass the nuanced skills necessary for interacting with complex environments. Our journey from spatial reasoning to physical mastery involves integrating human-like multimodal sensor fusion into our models. 
For example, tactile and force sensors are crucial for tasks requiring delicate manipulation and feedback. By embedding these capabilities, we aim to enable robots to perform complex tasks with human-like dexterity and adaptability.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"potential-research-topics-for-startrack-scholars-program\">Potential research topics for StarTrack Scholars Program<\/h2>\n\n\n\n<p>The team invites scholars to explore a range of exciting research topics within the 2025 StarTrack Scholars program, including but not limited to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Vision-Language-Action model architecture design<\/strong><\/li>\n\n\n\n<li><strong>3D human-object-environment reconstruction and understanding<\/strong><\/li>\n\n\n\n<li><strong>Multimodal-sensory intelligence<\/strong><\/li>\n\n\n\n<li><strong>World models and neural robot simulators<\/strong><\/li>\n\n\n\n<li><strong>Dexterous hand manipulation and reinforcement learning<\/strong><\/li>\n\n\n\n<li><strong>Object manipulation benchmarks<\/strong><\/li>\n<\/ul>\n\n\n\n<p>The Microsoft Research Asia StarTrack Scholars Program embraces an open, collaborative approach, encouraging dialogue and joint experimentation with researchers from various disciplines to discover viable solutions.&nbsp;Visit our official website to learn more:&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/academic-program\/microsoft-research-asia-startrack-program\/\">Microsoft Research Asia StarTrack Scholars Program \u2013 Microsoft Research<\/a><\/p>\n\n\n\n<p><strong>Theme Team:<\/strong><\/p>\n\n\n\n<p><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/bainguo\/\">Baining Guo<\/a>, Distinguished Scientist with Microsoft Research<\/p>\n\n\n\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/yangjiaolong.github.io\">Jiaolong Yang<span class=\"sr-only\"> (opens in new 
tab)<\/span><\/a>, Principal Research Manager, Microsoft Research Asia<\/p>\n\n\n\n<p><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/lisu\/\">Lily Sun<\/a>, Director, Accelerator, Microsoft Research Asia<\/p>\n\n\n\n<p><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yalia\/\">Yaobo Liang<\/a>, Senior Researcher, Microsoft Research Asia<\/p>\n\n\n\n<p><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/libei\/\">Bei Liu<\/a>, Senior Researcher, Microsoft Research Asia<\/p>\n\n\n\n<p><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jianf\/\">Jianlong Fu<\/a>, Senior Research Manager, Microsoft Research Asia<\/p>\n\n\n\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/yudeng.github.io\/\">Yu Deng<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, Senior Researcher, Microsoft Research Asia<\/p>\n\n\n\n<p><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/fawe\/\">Fangyun Wei<\/a>, Senior Research SDE, Microsoft Research Asia<\/p>\n\n\n\n<p>Lin Luo, Research SDE 2, Microsoft Research Asia<\/p>\n\n\n\n<p>Xi Chen, Research SDE 2, Microsoft Research Asia<\/p>\n\n\n\n<p><strong>References:<\/strong><\/p>\n\n\n\n<p>1. Li, Q., L, Y., et al. (2024). \u201cCogAct-VLA: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation.\u201d arXiv.\u00a0<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/cogact.github.io\/\">View project webpage<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<p>2. W, W., W, F., et al. (2024). 
\u201cUniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping.\u201d arXiv.&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/dexhand.github.io\/UniGraspTransformer\/\">View project webpage<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<p>3. W. R., X. S., et al. (2024). \u201cMoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision.\u201d arXiv. <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/wangrc.site\/MoGePage\/\">View project webpage<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<p><em>If you have any questions, please email Ms. Yanxuan Wu, program manager of the Microsoft Research Asia StarTrack Scholars Program, at&nbsp;<\/em><a href=\"mailto:v-yanxuanwu@microsoft.com\"><em>v-yanxuanwu@microsoft.com<\/em><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>With the rapid advancements in AI and robotics, the development of highly intelligent robots capable of seamlessly interacting with the physical environment is becoming increasingly achievable. As the next AI wave, embodied AI innovations promise to revolutionize various industries and significantly impact human life. 
Although promising progress has been made, generalist robots and embodied AI [&hellip;]<\/p>\n","protected":false},"author":34512,"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-content-parent":970518,"msr_hide_image_in_river":null,"footnotes":""},"research-area":[13556],"msr-locale":[268875],"msr-post-option":[],"class_list":["post-1113300","msr-blog-post","type-msr-blog-post","status-publish","hentry","msr-research-area-artificial-intelligence","msr-locale-en_us"],"msr_assoc_parent":{"id":970518,"type":"academic-program"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/1113300","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-blog-post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/34512"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/1113300\/revisions"}],"predecessor-version":[{"id":1152474,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/1113300\/revisions\/1152474"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1113300"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1113300"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1113300"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=11133
00"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}