Microsoft Research Blog

Artificial intelligence

ChatBench: From Static Benchmarks to Human-AI Evaluation

April 1, 2025 | Serina Chang, Ashton Anderson, and Jake Hofman

With the rapid adoption of LLM-based chatbots, there is a pressing need to evaluate what humans and LLMs can achieve together. However, standard benchmarks, such as MMLU, measure LLM capabilities in isolation (i.e., "AI-alone"). Here, we design and conduct a user study to convert MMLU…

UFO: A UI-Focused Agent for Windows OS Interaction

April 1, 2025

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision. UFO employs a dual-agent framework to meticulously observe and analyze the graphical user interface (GUI) and control information of Windows applications. This enables…

TeCoFeS: Text Column Featurization using Semantic Analysis

April 1, 2025

Extracting insights from text columns can be challenging and time-intensive. Existing methods for topic modeling and feature extraction are based on syntactic features and often overlook the semantics. We introduce the semantic text column featurization problem, and present a scalable approach for automatically solving it.…

Are We On Track? AI-Assisted Active and Passive Goal Reflection During Meetings

April 1, 2025

Meetings often suffer from a lack of intentionality, such as unclear goals and straying off-topic. Identifying goals and maintaining their clarity throughout a meeting is challenging, as discussions and uncertainties evolve. Yet meeting technologies predominantly fail to support meeting intentionality. AI-assisted reflection is a promising…

OG-RAG: Ontology-Grounded Retrieval-Augmented Generation For Large Language Models

April 1, 2025 | Peeyush Kumar and Kartik Sharma

This paper presents OG-RAG, an Ontology-Grounded Retrieval Augmented Generation method designed to enhance LLM-generated responses by anchoring retrieval processes in domain-specific ontologies. While LLMs are widely used for tasks like question answering and search, they struggle to adapt to specialized knowledge, such as industrial workflows…

Execution-guided within-prompt search for programming-by-example

April 1, 2025

Large language models (LLMs) can generate code from examples without being limited to a DSL, but they lack search, as sampled programs are independent. In this paper, we use an LLM as a policy that generates lines of code and then join these lines of…

Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead

March 31, 2025

Inference-time scaling can enhance the reasoning capabilities of large language models (LLMs) on complex problems that benefit from step-by-step problem solving. Although lengthening generated scratchpads has proven effective for mathematical tasks, the broader impact of this approach on other tasks remains less clear. In this…

Evidence Aggregator: AI reasoning applied to rare disease diagnostics

March 13, 2025

Retrieving, reviewing, and synthesizing technical information can be time-consuming and challenging, particularly when requiring specialized expertise, as is the case of variant assessment for rare disease diagnostics. To address this challenge, we developed the Evidence Aggregator (EvAgg), a generative AI tool designed for rare disease…

Fostering appropriate reliance on GenAI: Lessons learned from early research

March 3, 2025

In this report, we summarize lessons learned from our work on fostering appropriate reliance on AI. We derived three UX goals for fostering appropriate reliance on AI from the barriers to appropriate reliance observed in multiple studies. These three UX goals inform the Overreliance Risk…

Societal AI: Research Challenges and Opportunities

March 1, 2025 | Beibei Shi, Haotian Li, Xing Xie, and Societal AI Team

Artificial intelligence is reshaping society at an unprecedented scale, influencing key domains such as education, labor, governance, and scientific discovery. As AI models, particularly large language models, become more capable and autonomous, their societal impact raises urgent questions regarding fairness, interpretability, alignment with human values,…

What Makes a Good Diffusion Planner for Decision Making?

March 1, 2025 | Haofei Lu, Dongqi Han, Yifei Shen, and Dongsheng Li

Diffusion models have recently shown significant potential in solving decision-making problems, particularly in generating behavior plans, also known as diffusion planning. While numerous studies have demonstrated the impressive performance of diffusion planning, the mechanisms behind the key components of a good diffusion planner remain…

The future of the industrial AI edge is cellular

February 26, 2025 | Xenofon Foukas and Bozidar Radunovic

Reliable, high-bandwidth wireless connectivity and local processing at the edge are crucial enablers for emerging industrial AI applications. In this work, we argue that the recent trends in cellular networking make the technology the ideal connectivity solution for these applications, due to its…
