Research challenges
Research objectives and eligibility requirements vary by challenge. Some challenges accept proposals from PhD students, others from faculty, and some from both. Review the descriptions below to understand each challenge’s research goals and specific eligibility criteria.
AI for global and societal impact
-
Principal Investigator(s): Cecily Morrison (Microsoft Research Cambridge, UK), Katja Hofmann (Microsoft Research Cambridge, UK)
Additional Microsoft collaborators: Nancy Baym (Microsoft Research New England)
Summary:
This challenge explores the model capabilities and human scaffolds needed to enable creative professionals across geographies to meaningfully use and adopt interactive generative AI (interactive GenAI), including world models and multimodal models.
Description:
Interactive GenAI models generate novel virtual experiences, enabling users to move around and interact with spaces and artefacts that are generated in real time. It is now possible for media creatives to shape these experiences through the provision and curation of datasets for training, fine-tuning or prompting such models. Yet, we do not know what kinds of model capabilities and human scaffolds are needed to make novel interactive AI technologies useful for people in the creative media industry in ways that are equitable from the beginning. This project will explore this question through the concrete design brief of creating ‘day-in-the-life’ interactive GenAI experiences alongside disability advocacy organizations and the technical team developing interactive GenAI technologies. The resulting experiences will be released for public use, illustrating how people can shape AI.
Ideal collaborator:
We are looking for a creative artist with the technical skills to work with early-stage, pre-consumer technologies and to create a compelling demonstration of what is possible for the general public. The candidate should have experience working with marginalized communities as well as enough technical expertise to interact fluidly with the technical team. Experience training and/or fine-tuning generative AI models is ideal.
Eligible candidates:
PhD student, faculty, and non-academic proposals
(Note: This research challenge is uniquely open to proposals from non-academic candidates. To submit, select “Faculty” in the portal in response to the profession field. Enter N/A where a student name is requested. We’ll recategorize your proposal on our end.)
-
Principal Investigator(s): Eleanor Dillon (Microsoft Research New England), Lindsey Raymond (Microsoft Research New England)
Additional Microsoft collaborators: Eric Horvitz (Office of the Chief Scientific Officer)
Summary:
Combine firm-level data on GitHub Copilot adoption with detailed external data on firms’ employment, hiring, and performance to understand the impact of generative AI adoption on firm composition and employee outcomes, with a particular focus on off-shoring of SDE work and differential impacts across countries.
Description:
India produces about 1.5 million engineers every year, but survey and anecdotal evidence shows that many of them lack employable skills, including writing error-free code. For such workers, AI can potentially provide a powerful productivity increase, but it could also substitute for some of their skills, altering the task composition of their work and consequently impacting wages. These impacts would likely vary by worker tenure as well as firm organizational structure. Furthermore, AI could help close the productivity gap between developed- and developing-country firms in certain sectors, but capital or managerial constraints may impede adoption of AI technologies. This project would use GitHub and other data to conduct exploratory research into potential interventions, quantifying how to optimize the introduction of AI tools into firms in India. We believe findings from this project could provide useful insights into firm decisions in developing countries broadly, given similar questions regarding employability of workers engaged in such occupations.
Ideal collaborator:
This challenge is looking for a mid-career faculty member with expertise in the economics of organizations, technology adoption, and personnel management to collaborate on this research.
Eligible candidates:
Faculty proposals
-
Principal Investigator(s): Jacki O’Neill (Microsoft Research Africa), Tanuja Ganu (Microsoft Research India), Xing Xie (Microsoft Research Asia)
Additional Microsoft collaborators: Ogbemi Ekwejunor-Etchie (Microsoft Research Accelerator)
Summary:
The Global South AI Grand Challenge invites researchers to shape the future of AI from the Global South outward by defining new benchmarks, modalities, and solutions that drive progress on a global scale.
Description:
Artificial intelligence is reshaping how humanity learns, creates, and connects, yet its foundations remain incomplete, built on limited data, languages, and cultural perspectives. The Global South AI Grand Challenge invites researchers to expand the frontiers of AI by building from the majority of humanity outward.
In partnership with Microsoft Research Africa, Microsoft Research India, Microsoft Research Asia and the Microsoft Research Accelerator, this challenge invites bold exploration in foundation and generative AI through novel datasets, architectures, training approaches, or model optimization for long-tailed data, all rooted in the cultural and local contexts of the Global South. Proposals may explore multilingual and multimodal intelligence, community-scale datasets, or novel evaluation frameworks that make AI more inclusive, adaptable, and trustworthy.
Selected teams will collaborate with Microsoft researchers and engineers to test, refine, and scale their work on global platforms, advancing the shared goal of building AI systems that reflect and serve all of humanity.
Ideal collaborator:
We seek faculty with a strong track record of innovation in foundation models and generative AI, particularly those advancing video generation or model optimization for long-tailed or low-resource data. Preference will be given to researchers with a solid AI or ML background and demonstrated experience in cross-disciplinary collaboration that bridges technical and societal impact.
Eligible candidates:
Faculty proposals
-
Principal Investigator(s): Abi Sellen (Microsoft Research Cambridge UK), Eric Horvitz (Office of the Chief Scientific Officer)
Additional Microsoft collaborators: Andrew Jenks (Microsoft), Jessica Young (Microsoft), and Sam Vaughn (Microsoft)
Summary:
This challenge explores how we can use human-centric design to put the latest provenance tools (such as fingerprinting, watermarking, and cryptographic provenance technology) into the hands of users to allow them to understand and explore the source, history, and veracity of the online content they are interacting with.
Description:
Tackling disinformation and misinformation is a growing and critical challenge. The rise of the generative capabilities of AI technologies to create and manipulate content threatens to usher in a “post-epistemic world,” where fiction cannot be distinguished from reality. Microsoft has been a leader in the development of technical provenance tools. However, neither we nor others have invested enough in improving end-user experiences and understanding, or in assessing the effectiveness of designs for deploying these technical solutions. We seek a fellow to coordinate and focus cross-company work, while building connections to the broader ecosystem of organizations, on iterative human-centric efforts to build and evaluate methods for communicating provenance of content, with a specific focus on their effectiveness in improving end-user understanding, insight, and interpretation.
The weakest link in helping end users understand whether the content they are viewing was captured by cameras and microphones, produced through human effort, or generated wholly or partially by AI technologies is people’s grasp of three things: AI methods and their capabilities to create and manipulate content, the intentions of content creators in different contexts, and the technical methods for marking content and encoding metadata.
Thus, research directions on managing disinformation and misinformation must be explicitly human-centered. A sociotechnical approach to the problem is critical.
Such an approach starts with the assumption that users need access to provenance information about digital content in order to make sound judgements about what they are engaging with online. Simple labels attached to content (whether images, video, audio, or documents) will not suffice: the decision about whether something has been altered in ways that distort, deceive, or undermine its use is personal, complex, and highly contextual. We need social science research to understand the range of circumstances in which provenance information is important, the ways in which it is currently perceived or misperceived, and the factors that come into play when making these decisions. We also need design research to experiment with different ways that provenance information might be surfaced, and this effort will require iterative work on designing and testing different approaches.
The aspiration is to design end user tools that would create new conventions across media to empower users to query and understand the origin of content and any alterations that have occurred, whether human or machine generated.
Ideal collaborator:
This challenge is open to proposals from faculty, postdocs, and PhD students. We’re looking for individuals with interdisciplinary experience in one or more of human-computer interaction, design, media and communications studies, security, and policy.
Eligible candidates:
PhD student, postdoc, and faculty proposals
AI fundamentals: scalable reasoning, model adaptation and evaluation
-
Principal Investigator(s): Alex Chouldechova (Microsoft Research NYC), Xiaoyuan Yi (Microsoft Research Asia)
Additional Microsoft collaborators: Miro Dudik (Microsoft Research NYC), Xing Xie (Microsoft Research Asia), Sociotechnical Alignment Center (Microsoft Research NYC), Societal AI group (Microsoft Research Asia)
Summary:
This project aims to further bridge psychometrics and AI to develop a new science of Generative AI evaluation that moves beyond benchmarks toward interpretable, generalizable measures of model behavior.
Description:
Generative AI (GenAI) evaluation today relies heavily on benchmarks and leaderboards. Yet despite the proliferation of benchmarks purporting to capture diverse model capabilities and safety risks, it remains unclear what—if anything—benchmark scores individually or collectively tell us that generalizes beyond the specific tests on which they are reported. This research challenge seeks to catalyze a paradigm shift for GenAI evaluation by adapting and extending modern psychometric methods to meet the demands of this new domain. Through this challenge we aim to reconceptualize GenAI model capabilities, safety risks, and values as latent attributes of models that drive—and hence are discoverable through—observed AI behavior across diverse settings and use cases. This research challenge goes beyond simply applying existing psychometric frameworks and methods. Whereas data traditionally studied in psychometrics involves many test takers and a relatively small number of questions (a “low-dimensional” regime), in GenAI evaluation we have relatively few models which we subject to a battery of hundreds of thousands of questions (a “high-dimensional” regime). The challenge aims to enhance existing psychometric methods through modern statistical machine learning to develop methods that are performant in the high-dimensional regime.
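The latent-attribute framing can be made concrete with a toy item-response model. The sketch below is our own minimal illustration, not the project’s method: it fits a Rasch-style model in which each evaluated model m has a latent ability theta_m and each benchmark item i a difficulty b_i, with P(correct) = sigmoid(theta_m - b_i), estimated by gradient ascent on the joint log-likelihood over a synthetic pass/fail matrix.

```python
# Toy Rasch-style item-response sketch (illustrative only): latent model
# abilities are recovered from observed per-item pass/fail behavior.
import numpy as np

rng = np.random.default_rng(0)
true_theta = np.array([-1.0, 0.0, 1.5])   # latent ability of 3 models
true_b = rng.normal(0, 1, size=200)       # difficulty of 200 benchmark items
sigmoid = lambda z: 1 / (1 + np.exp(-z))

# Simulated pass/fail outcomes for each (model, item) pair
Y = rng.random((3, 200)) < sigmoid(true_theta[:, None] - true_b[None, :])

theta = np.zeros(3)
b = np.zeros(200)
for _ in range(2000):
    P = sigmoid(theta[:, None] - b[None, :])
    resid = Y - P                          # gradient of the Rasch log-likelihood
    theta += 0.05 * resid.sum(axis=1) / 200
    b -= 0.05 * resid.sum(axis=0) / 3

# Center to fix the scale's origin, then compare relative abilities
print(np.round(theta - theta.mean(), 2))   # recovers the models' ability ordering
```

Even this low-dimensional toy shows the shift the description argues for: scores on individual items are noisy, but a latent trait estimated across all items is interpretable and comparable across models.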
Ideal collaborator:
The ideal collaborators for this project would be a faculty member and PhD student with deep expertise in psychometrics, measurement, generative AI evaluation, and associated methods and theory from statistics and machine learning.
Eligible candidates:
Faculty and postdoc proposals
-
Principal Investigator(s): Gaurav Sinha (Microsoft Research India), Kiran Shiragur (Microsoft Research India), and Shivam Garg (Microsoft Research AI Frontiers)
Additional Microsoft collaborators: Arun Iyer (Microsoft Research India), Sonu Mehta (Microsoft Research India)
Summary:
This challenge aims to develop novel mechanisms for post-training retrieval models using high quality feedback from reward models (such as LLM based cross encoders) to optimize downstream retrieval and/or generation (RAG) performance.
Description:
Post-training has emerged as a powerful technique for steering language models toward maximizing desired rewards. This approach presents a significant opportunity for retrieval, a domain often constrained by sparse training data (i.e., lacking relevance signals for most query-document pairs). The availability of high-quality reward models, such as LLM-based cross-encoders or human feedback, makes this avenue particularly promising.
We intend to develop and adapt post-training techniques specifically for retrieval models, targeting applications like Search, Advertising, and Retrieval-Augmented Generation (RAG).
Some of our immediate research questions include:
(1) Data Selection: Which queries and corresponding documents should be selected for scoring by reward models during the post-training stage to maximize downstream retrieval and RAG performance?
(2) Loss Formulation: What is the optimal post-training loss function given a specific retrieval architecture, reward feedback design (e.g., pointwise, pairwise, listwise), and application scenario?
(3) Computational Efficiency: How can we efficiently execute multiple post-training iterations, especially when the reward models are large and computationally expensive?
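As a toy illustration of the loss-formulation question in (2), and not a method proposed by the challenge, a listwise objective can distill a reward model’s scores into a retriever by matching softmax distributions over a query’s candidate list. All scores below are made-up numbers.

```python
# Illustrative listwise distillation loss (our own sketch): minimize the KL
# divergence between the reward model's and the retriever's softmax
# distributions over one query's candidate documents.
import numpy as np

def softmax(x, tau=1.0):
    z = np.exp((x - x.max()) / tau)
    return z / z.sum()

def listwise_kl_loss(retriever_scores, reward_scores, tau=1.0):
    """KL(p_reward || p_retriever) over a single candidate list."""
    p = softmax(np.asarray(reward_scores), tau)   # target from the reward model
    q = softmax(np.asarray(retriever_scores), tau)
    return float(np.sum(p * (np.log(p) - np.log(q))))

reward     = np.array([3.0, 1.0, -2.0])   # e.g. LLM cross-encoder scores
aligned    = np.array([2.5, 0.8, -1.5])   # retriever agrees with the ranking
misaligned = np.array([-1.5, 0.8, 2.5])   # retriever inverts the ranking

print(listwise_kl_loss(aligned, reward) < listwise_kl_loss(misaligned, reward))  # True
```

Pointwise (per-document regression) and pairwise (preference) variants of the same idea trade off supervision granularity against the number of reward-model calls, which is exactly the tension questions (1) and (3) raise.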
Ideal collaborator:
Ideal collaborators for this project include PhD students (and faculty) working in machine learning with a strong focus on reinforcement learning or information retrieval, with exposure to both theoretical and empirical research in these areas. Prior hands-on experience working with large language models, specifically in developing RL-based post-training algorithms, will be extremely valuable.
Eligible candidates:
PhD student and faculty proposals
-
Principal Investigator(s): Amit Sharma (Microsoft Research India), Nagarajan Natarajan (Microsoft Research India), Niranjani Prasad (Microsoft Research Cambridge UK), Sushrut Karmalkar (Microsoft Research Cambridge UK)
Additional Microsoft collaborators: Alicia Curth (Microsoft), Vineeth Balasubramanian (Microsoft Research India)
Summary:
This challenge aims to advance knowledge-guided inference—where external verifiers dynamically steer model generation at test time—to unlock scalable, reliable reasoning in structured and high-stakes domains.
Description:
Scaling test-time compute has emerged as a key paradigm for reasoning in structured domains like math and code. This challenge explores new scaling dimensions through knowledge-guided inference—a new paradigm where external verifiers (e.g., neuro-symbolic) and structured knowledge (e.g., domain-specific validators) actively steer model tokens during generation, going beyond the standard post-hoc verification paradigm. By scaling test-time compute through dynamic guidance, we aim to unify statistical learning with symbolic verification, enabling branching and backtracking correction during inference. A key sub-challenge is designing verifiers that can operate on ambiguous natural language, extending beyond formal domains like mathematics and code. This capability is essential for deploying reasoning models in high-stakes domains such as law and healthcare, where interpretability and reliability are critical.
The collaboration aims to propose new verifier-guided architectures, training strategies for building verifiers in ambiguous scenarios, and practical mechanisms for deciding when to invoke external tools versus rely on internal reasoning. Ultimately, this work aims to redefine how reasoning models are built and deployed—making them more adaptable, efficient, and trustworthy.
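The verifier-in-the-loop decoding pattern the description outlines can be sketched with a toy example. The code below is our own illustration, not the challenge’s architecture: a stub verifier (standing in for a neuro-symbolic or domain validator) prunes candidate continuations during generation, and the decoder backtracks when every continuation of a prefix is rejected.

```python
# Toy verifier-guided decoding with backtracking (illustrative only).
# The "model" here proposes tokens in a fixed preference order; a real
# system would rank continuations with an LLM's token probabilities.

def verifier_ok(prefix):
    # Stand-in for an external verifier: forbid two adjacent '1' tokens.
    return "11" not in "".join(prefix)

def guided_decode(length, vocab=("1", "0")):
    """Depth-first decoding with verifier-guided pruning and backtracking."""
    def extend(prefix):
        if len(prefix) == length:
            return prefix
        for tok in vocab:                 # tokens in model-preference order
            cand = prefix + [tok]
            if verifier_ok(cand):         # steer generation, not post-hoc check
                out = extend(cand)
                if out is not None:
                    return out
        return None                       # dead end: caller backtracks

    return "".join(extend([]))

print(guided_decode(6))  # → "101010"
```

The interesting open problems the description names start exactly where this toy stops: verifiers that score ambiguous natural language rather than crisp constraints, and policies for when guidance is worth its computational cost.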
Ideal collaborator:
Ideal collaborators for this project include PhD students (or faculty) in machine learning, natural language processing, or symbolic AI, with a strong focus on reasoning, verification, or neuro-symbolic methods. Expertise in reinforcement learning, building efficient AI inference systems, or program synthesis would be especially valuable.
Eligible candidates:
PhD student and faculty proposals
Biological and scientific modeling
-
Principal Investigator(s): Alex Lu (Microsoft Research New England), Kevin Yang (Microsoft Research New England), Lorin Crawford (Microsoft Research New England)
Additional Microsoft collaborators: Sarah Alamdari (Microsoft), Carles Domingo-Enrich (Microsoft), and Ashley Conard (Health Futures)
Summary: Regulatory elements in non-coding DNA are critical to human health and genetic variation: can we use generative AI to understand and design them?
Description:
While generative models for biological sequence modalities (proteins, DNA) have exploded, the design of regulatory DNA remains elusive, due to the complex ways these sequences interact in biological systems and their still poorly understood nature. Unlocking this understanding would greatly advance human health, synthetic biology, and other applications, as it would allow fine-grained control and understanding of the cell contexts in which genes are expressed. We wish to investigate if generative models of non-coding regulatory DNA can help us understand and design these elements. Themes of interest for us would include but not be limited to controlling cell type and state dependent regulation, applications to comparative genomics and bioinformatics, and design of de novo elements with properties not seen before in nature.
Ideal collaborator:
The ideal collaborator would be a current faculty member who would be interested in using generative models to design and understand regulatory elements. We are seeking collaborators with a background in or related to regulatory genomics. Access to and experience in wet lab experimentation is considered a major asset.
Eligible candidates:
Faculty proposals
Foundational systems & infrastructure for AI
-
Principal Investigator(s): Karin Strauss (Microsoft Research Redmond), Kate Lytvynets (Microsoft Research Redmond)
Additional Microsoft collaborators: Bichlien Nguyen (Microsoft Research Redmond), Jake Smith (Microsoft Research Redmond), Danrong Zhang (Microsoft Research Redmond)
Summary:
This challenge is focused on developing and/or evaluating the use of AI to address one or more of the many challenges in electricity planning and deployment.
Description:
Clean and affordable electricity is of critical importance for datacenters, their suppliers and the world. Deploying such electricity infrastructure is challenging, with a variety of chokepoints, including transmission bottlenecks, long interconnection and permitting queues, materials and siting challenges, policy challenges, supply constraints, and talent shortages. AI has recently emerged as a powerful tool to help with productivity, so this challenge involves developing new AI or using it to navigate these barriers—accelerating grid modeling, optimizing siting decisions, forecasting renewable availability, and enabling new forms of coordination across the datacenter supply chain and energy ecosystem.
Ideal collaborator:
We are looking for interdisciplinary researchers, spanning areas such as computer science, electrical/civil/environmental engineering, policy, economics, etc.
Eligible candidates:
PhD student, postdoc, and faculty proposals
(Note: If you are a postdoc submitting a proposal for this challenge, please have a faculty member submit the proposal, and enter your name and email in the “Student Name” and “Student Email” fields).
-
Principal Investigator(s): Hitesh Ballani (Microsoft Research Cambridge UK), Madan Musuvathi (Microsoft Research Redmond)
Additional Microsoft collaborators: Aashaka Shah (Microsoft Research Redmond), Anand Bonde (Microsoft Research Redmond), Ishai Menache (Microsoft Research Redmond), Konstantina Mellou (Microsoft Research Redmond), Marco Molinaro (Microsoft Azure), Nikhil Swamy (Microsoft Research Redmond), Roshan Dathathri (Microsoft Research Redmond), Saikat Chakraborty (Microsoft Research Redmond), Sarah Fakhoury (Microsoft Research Redmond), Shraddha Barke (Microsoft Research Redmond), Sirui Li (Microsoft Research Redmond)
Summary:
This challenge seeks to build a scalable, open-source ecosystem for reinforcement learning post-training through foundational advances in systems and algorithms research, unlocking powerful reasoning capabilities for program intelligence, decision intelligence, and other rigorous, next-generation applications.
Description:
Reinforcement Learning (RL) based approaches have emerged as a critical component for post-training LLMs to enhance their reasoning abilities. Our goal is to build an open-source ecosystem to improve the efficiency and scalability of RL post-training, AI inference, and training by several orders of magnitude. This will allow efficient scaling of the post-training process to bigger models and larger datasets in the push towards achieving strong reasoning capabilities in program intelligence, optimization and decision intelligence, and other rigorous next-generation applications. To that end, we aim to:
1) Exploit workload characteristics to develop system and algorithmic innovations across the stack, such as inference, training, network communication, and memory usage to improve scale by orders of magnitude.
2) Explore existing algorithms and develop new algorithmic techniques to post-train MSR-specific models for cutting-edge applications.
3) Leverage our symbolic reasoning expertise and large repositories of first-party and third-party software to unlock new program intelligence reasoning capabilities and power next-generation software engineering agents.
4) Integrate optimization and generative AI tools to translate natural language descriptions into mathematical optimization problems, increase interpretability, and democratize access to advanced analytics tools.
Ideal collaborator:
(1) Expertise: ML Systems, Systems, High-Performance Computing, GPU Programming and Optimizations, Program Reasoning, Compilers, Operations Research
(2) Strong programming and experimentation skills, especially with LLM frameworks, and comfortable working with and debugging large-scale distributed ML systems
(3) Currently pursuing a PhD in Computer Science or related fields
Eligible candidates:
PhD students and postdocs
Note: Postdocs are invited to submit a proposal under the student track for this challenge. You may bypass the letter of recommendation upload process, as it is not required for postdoc proposals. For questions, reach out to msfellow@microsoft.com.
Human-AI collaboration and interaction
-
Principal Investigator(s): Shannon Monroe (Microsoft Research Accelerator), Matt Corwine (Microsoft Research Accelerator)
Additional Microsoft Collaborators: Richard Banks (Microsoft Research Cambridge UK), Sean Rintel (Microsoft Research Cambridge UK), Neeltje Berger (Microsoft Research Accelerator)
Summary:
How can AI models better support creativity and innovation?
Description:
While AI models can help with idea generation and innovation, research shows that they tend to be homogeneous in nature – pushing creatives towards a common set of ideas and concepts. Innovation often comes about through the juxtaposition of two concepts that don’t look like they belong together, but when compared suggest new ideas and directions. Generative AI systems struggle with this context since their purpose is to weight items that belong together, rather than bring together those that don’t. How might we approach the design of new AI models whose goal is to operate right at the boundary of concepts that are connected and those that aren’t, in order to foster more radical forms of innovation?
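One naive way to operationalize “the boundary of concepts that are connected and those that aren’t”, offered purely as an illustration rather than the challenge’s proposal, is to rank concept pairs by how close their embedding similarity falls to an intermediate target, instead of maximizing similarity as retrieval-style systems do. The embeddings below are made-up toy vectors; a real system would use a learned embedding model.

```python
# Illustrative sketch: prefer concept pairs at intermediate similarity,
# i.e. neither near-duplicates nor completely unrelated.
import itertools
import numpy as np

concepts = {
    "violin":  np.array([0.9, 0.1, 0.0]),
    "cello":   np.array([0.8, 0.2, 0.1]),
    "circuit": np.array([0.1, 0.9, 0.2]),
    "garden":  np.array([0.2, 0.3, 0.9]),
}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def boundary_pairs(target=0.5):
    """Rank pairs by distance of their cosine similarity from a midpoint."""
    pairs = itertools.combinations(concepts, 2)
    return sorted(pairs, key=lambda p: abs(cos(concepts[p[0]], concepts[p[1]]) - target))

print(boundary_pairs()[0])  # the pair nearest the "productive boundary"
```

Under this scoring, near-synonyms (violin/cello) and unrelated pairs rank low, while moderately related pairs surface first, a crude proxy for the juxtaposition the description argues drives innovation.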
Ideal collaborator:
We’re seeking a collaborator with expertise in computational creativity, cognitive science, and/or generative AI systems, especially as they relate to human innovation. Ideal candidates will have a strong interdisciplinary orientation, a critical perspective on mainstream model design, and hands-on experience building or training AI models to explore conceptual boundaries and foster originality. They should be comfortable publishing in AI, HCI, or creativity-related venues and eager to challenge assumptions about how machines can support radical forms of innovation.
Eligible candidates:
PhD student and faculty proposals
-
Principal Investigator(s): Siân Lindley (Microsoft Research Cambridge UK), Nathalie Riche (Microsoft Research Redmond), Jack Williams (Microsoft Research Cambridge UK), Nic Marquardt (Microsoft Research Redmond), Hugo Romat (Microsoft Research Redmond)
Summary:
This fellowship explores new abstractions and interaction principles for AI-powered collaborative dynamic experiences that adapt and morph to users’ tasks and context.
Description:
We aim to unlock a future where collaborative workflows are unconstrained by traditional software boundaries, and interfaces adapt and morph to users’ tasks and context. Realizing this vision will require new computing abstractions, principles, and core interaction patterns, which empower users to work with these capabilities rather than be overwhelmed by them.
This fellowship will focus on how dynamic experiences intersect with collaborative work. Research into generative UI (or malleable interfaces) tends to focus on individual experiences. However, modern work is inherently collaborative, and dynamic experiences will need to support collaboration if they are to have relevance. This might include the development of new systems to support AI-powered workflows such as team ideation and vibe coding, and latent collaboration tasks such as shared content creation.
Key challenges in this space include (i) how to support users in collaboratively molding dynamic workspaces tailored to shared goals, (ii) how to support personalized representations for team members, which enable them to successfully collaborate while interacting with a shared underlying piece of work, and (iii) how to represent, integrate, and orchestrate multi-agent and multi-human collaboration across different configurations (human-human, human-agent(s), agent-agents acting on behalf of different humans) and across heterogeneous interfaces.
Working with the fellow, we hope to explore new experiences and articulate core principles for collaborative dynamic UI, enabling the balance of dynamism with consistent and shared mental models.
Ideal collaborator:
We seek to devise new interaction principles and abstractions for dynamic interaction with AI and to develop proofs of concept and prototypes to assess user value.
A fellow should share our passion for Human-Computer Interaction (HCI), possess a solid understanding of fundamental design principles, and be familiar with recent advances in Human-AI Interaction such as generative user interfaces and malleable interaction patterns. Strong AI prototyping skills (including web stack development and experience with generative AI models or pipelines) are essential for demonstrating novel techniques and interaction patterns. Experience in conducting user studies to inform or assess user experiences would also be valuable.
Eligible candidates:
PhD student, postdoc, and faculty proposals
-
Principal Investigator(s): Jennifer Neville (Microsoft Research Redmond), Sid Suri (Microsoft Research Redmond), Kori Inkpen (Microsoft Research Redmond)
Additional Microsoft collaborators: Siân Lindley (Microsoft Research Cambridge UK)
Summary:
This challenge explores how AI can move beyond individual assistance in complex knowledge work to become a true collaborator, teammate, and cognitive partner — through research in adaptive optimization and metrics, coordinated decision-making under uncertainty, preference elicitation and alignment, and information flow for collective intelligence.
Description:
While current AI systems are optimized for individuals performing simple, independent tasks, organizational effectiveness depends on collaboration, coordination, and shared understanding across groups. As such, this challenge focuses on moving AI beyond narrow task assistance to functioning as a longer-horizon collaborator, teammate, and cognitive partner. We frame this problem as a multi-agent, continual-learning setting characterized by shifting goals, partial observability, and non-stationary rewards, where success requires more than the sum of individual contributions. Relevant research includes developing adaptive optimization and metrics that balance divergent stakeholder objectives, designing mechanisms for coordinated decision-making under uncertainty, advancing methods for preference elicitation and alignment, and improving information flows that sustain collective intelligence. We aim to amplify both individual productivity and organizational effectiveness, while deepening connections between AI/ML research and social-organizational theory. Tangible outcomes include new algorithms for adaptive optimization and coordination, benchmarks and evaluation protocols that capture group-level objectives, and prototypes of AI systems capable of enhancing collective reasoning, decision-making, and productivity in real-world knowledge-work settings.
Ideal collaborator:
This research challenge invites applicants from both AI/ML and the social sciences who are interested in advancing AI as a collaborator, teammate, and cognitive partner in complex knowledge work. Ideal collaborators may be AI/ML researchers with expertise in areas such as adaptive optimization, multi-agent systems, or alignment, or computational social scientists (including those in HCI and cognitive science) who study coordination, communication, and collective intelligence. Regardless of background, we are especially seeking researchers who are eager to work across disciplines to connect technical advances with data-driven insights into human and organizational dynamics.
Eligible candidates:
Faculty and postdoc proposals
-
Principal Investigator(s): Jianxun Lian (Microsoft Research Asia), Dongsheng Li (Microsoft Research Asia), Xing Xie (Microsoft Research Asia), Baining Guo (Microsoft Research Asia), Beibei Shi (Microsoft Research Asia)
Summary:
Develop and evaluate socially intelligent AI agents that model human cognition, coordinate with other agents, and collaborate productively with people to solve complex real-world tasks in education, science, and organizational operations.
Description:
We aim to build socially intelligent AI agents that can understand, predict, and respond to human cognition and behavior while coordinating with other agents. Despite recent advances, today’s systems lag behind humans in theory of mind, communication, negotiation, and sustained collaboration, constraining their effectiveness in complex workflows. We invite participants to develop rigorous, interdisciplinary methods — grounded in psychology, cognitive science, HCI, and organizational behavior — for systematically investigating and enhancing social reasoning and multi-agent collaboration. Approaches should incorporate mechanisms for trust, safety, accountability, and aligned goal pursuit in human-AI teams. Target domains include classroom tutoring and group learning, collaborative scientific discovery, and organizational operations such as planning, decision-making, and project execution. Expected outcomes include open-source models and toolkits, curated datasets and benchmarks, empirical studies and research publications, and deployable prototypes integrated into real workflows. The results will also help frontier firms and organizations prepare for the next wave of effective, safe human-AI collaboration at scale.
Ideal collaborator:
We seek collaborators across various research domains (LLMs, multi-agent RL, social sciences, HCI, etc.) who specialize in social reasoning, multi-agent coordination, and human-AI teaming. Ideal partners can design rigorous experiments, run human-subject and field studies, and build deployable prototypes in education, science, or organizational settings. We welcome faculty, PhD students, and postdocs, with preference for interdisciplinary teams and access to user populations or enterprise workflows.
Eligible candidates:
Faculty, PhD student, and postdoc proposals
(Note: If you are a PhD student or postdoc, please have the involved faculty member submit the proposal, and enter your name and email in the “Student Name” and “Student Email” fields).
Multimodal & Embodied Intelligence
-
Principal Investigator(s): Vineeth N Balasubramanian (Microsoft Research India), Tanuja Ganu (Microsoft Research India)
Additional Microsoft collaborators: Mercy Ranjit (Microsoft Research India), Neeraj Kayal (Microsoft Research India), Ogbemi Ekwejunor-Etchie (Microsoft Research Accelerator)
Summary:
This challenge explores foundational multimodal LLM architectures that move beyond tokenization, aligning structurally and functionally with modality-specific characteristics — drawing inspiration from human cognition to enable inclusive, efficient, and robust reasoning across modalities like vision, speech, and action.
Description:
Contemporary large language models have made remarkable progress in text understanding but remain limited in how they process and reason across other modalities such as images, video, and speech. Current multimodal approaches typically extend text-based tokenization pipelines to other modalities, which fails to capture their unique structural and relational properties. This challenge aims to explore foundational architectures for multimodal LLMs that move beyond tokenization toward representations that are structurally and functionally aligned with each modality’s characteristics. Inspired by principles of human cognition, where distinct sensory regions integrate information in complementary ways, we seek to design modular and adaptive components that enable more natural cross-modal understanding and reasoning. The research will investigate how these architectures can interleave attention across modalities, improving both perception and inference capabilities in complex, real-world tasks. A particular focus will be on efficiency, achieving parameter- and sample-efficient learning while maintaining strong generalization across modalities. The outcomes will include prototype architectures, evaluation benchmarks, and multimodal reasoning pipelines applicable to domains such as robotics, copilots, and embodied AI systems. Beyond advancing foundational AI research, this research challenge has the potential to unlock inclusive technologies for the global majority, where much of the world’s knowledge exists in non-textual forms such as video and spoken interaction.
Ideal collaborator:
We seek collaboration with faculty and PhD students in AI, machine learning, or computer vision with expertise in multimodal representation learning, efficient model architectures, and large-scale foundation model training. Ideal collaborators will have a strong research track record in areas such as vision-language models and transformer architectures, as well as experience building and evaluating multimodal systems. Experience in cognitive-inspired AI modeling would be desirable but not mandatory. We aim to jointly explore foundational advances in multimodal LLM architectures that go beyond tokenization, with potential for long-term academic-industry impact.
Eligible candidates:
PhD student, postdoc, and faculty proposals
-
Principal Investigator(s): Jiaolong Yang (Microsoft Research Asia), Li Zhao (Microsoft Research Asia), Jiang Bian (Microsoft Research Asia), Baining Guo (Microsoft Research Asia), Lily Sun (Microsoft Research Accelerator), Jianfeng Gao (Microsoft Research Redmond)
Summary:
Develop and advance foundation models that empower robots to perform a wide variety of tasks with flexibility, reliability, and adaptability across real-world environments.
Description:
Foundation models have shown transformative potential in language and vision domains, and their application in robotics is an exciting frontier for artificial intelligence. This challenge seeks research into designing, training, and evaluating general-purpose robotics foundation models that support perception, reasoning, and action in diverse scenarios. We welcome work on model architectures, scalable training techniques, and approaches that enhance generalization and robustness for robots operating in complex, dynamic environments. Possible directions include leveraging multi-modality inputs and internet-scale data, fast adaptation to varied tasks, safety, and recovery from failure. Successful collaborations might deliver new model designs, open-source foundation models, or robust evaluation benchmarks benefiting the robotics research community. The goal is to facilitate the development of robots that can quickly learn new skills, adapt to new situations, and execute tasks reliably. The long-term impact spans improved workflows and safer, more capable robots for applications across industry, healthcare, and daily life.
Ideal collaborator:
We are seeking faculty collaborators (professors or principal investigators) with strong backgrounds in robotics, computer vision, multimodal AI, and reinforcement learning. Experience with foundation models such as LLM/VLM/VLA/VideoGen and real-world robotics systems is highly desirable. We welcome proposals from faculty who are eager to bridge AI research and practical robotics applications in partnership with Microsoft Research.
Eligible candidates:
Faculty proposals
-
Principal Investigator(s): Michael Murray (Microsoft Research Accelerator), Tess Hellebrekers (Microsoft Research Accelerator), Reuben Tan (Microsoft Research Redmond)
Summary:
This challenge explores how robots can leverage alternative data sources, including human video, simulation, and synthetic augmentation, to scale learning without relying solely on expensive teleoperated demonstrations.
Description:
Current robot learning approaches heavily depend on teleoperated demonstrations, which are expensive to collect and difficult to scale across diverse tasks and environments. This challenge seeks innovative methods to enable robots to learn from alternative data sources that are more readily available or easier to generate at scale. Key data sources of interest include human video demonstrations, physics simulations, procedurally generated synthetic data, and augmented variations of limited real robot data. Successful approaches will need to address fundamental challenges in domain adaptation, including differences in embodiment between humans and robots, the sim-to-real gap, and distribution shift between training and deployment environments. We are particularly interested in methods that can effectively combine multiple data modalities, leverage pretrained vision-language models, and develop robust representations that transfer across domains. The ultimate goal is to achieve robot learning systems that match or exceed the performance of teleoperation-trained policies while requiring orders of magnitude less robot-specific data collection. Solutions should demonstrate generalization to novel objects, tasks, and environments beyond the training distribution. This research has the potential to dramatically accelerate the deployment of capable robots in real-world applications from manufacturing to home assistance.
Ideal collaborator:
We seek researchers with expertise in robot learning, computer vision, domain adaptation, or simulation who are passionate about making robot learning more practical and scalable. Ideal collaborators will have experience with imitation learning, reinforcement learning, or transfer learning approaches, and familiarity with modern deep learning frameworks and robot simulation environments. We particularly value creative problem-solvers who can bridge the gap between different data modalities and have a track record of developing methods that work reliably in real-world robotic systems.
Eligible candidates:
PhD student proposals only