NeurIPS 2025
Microsoft is pleased to have over 150 accepted papers at NeurIPS 2025, scheduled for December 2–7, 2025, in San Diego, CA, USA.
Reinforcement learning from human feedback (RLHF) has become the standard post-training technique for endowing large language models (LLMs) with helpful, harmless, and intent-consistent behavior. In practice, however, its adoption is hampered by prohibitive memory consumption during the policy-model update phase, especially when training…
Recent advancements, such as DeepSeek-Prover-V2-671B and Kimina-Prover-Preview-72B, demonstrate a prevailing trend in leveraging reinforcement learning (RL)-based large-scale training for automated theorem proving. Surprisingly, we discover that even without any training, careful neuro-symbolic coordination of existing off-the-shelf reasoning models and tactic step provers can achieve comparable…
CompanionX moves AI beyond narrow task solving toward human-like collaboration. While modern models excel at mathematics and coding, qualities such as social intelligence, empathy, and benevolence remain underexplored. CompanionX develops agents that understand, respond to, and cooperate with people authentically and proactively, forming personalized minds and communicating with…
As the Women in Machine Learning Workshop (WiML) marks its 20th annual gathering, cofounders, friends, and collaborators Jenn Wortman Vaughan and Hanna Wallach reflect on WiML’s evolution, navigating the field of ML, and their work in responsible AI.
The emergence of large language models (LLMs) offers great promise for building domain-specific agents, but adapting them for network management remains challenging. To understand why, we conduct a case study on network management tasks and find that state-of-the-art specialization techniques rely heavily on extensive, high-quality…
Image auto-regressive (AR) models have emerged as a powerful paradigm for visual generative modeling. Despite their promising performance, they suffer from slow generation due to the large number of sampling steps required. Although Distilled Decoding 1 (DD1) was recently proposed to enable few-step sampling…
Generative modeling, representation learning, and classification are three core problems in machine learning (ML), yet their state-of-the-art (SoTA) solutions remain largely disjoint. In this paper, we ask: Can a unified principle address all three? Such unification could simplify ML pipelines and foster greater synergy across…
Differentially private (DP) synthetic data generation is a promising technique for utilizing private datasets that otherwise cannot be exposed for model training or other analytics. While much research literature has focused on generating private unstructured text and image data, in enterprise settings, structured data (e.g.,…
Digitized histopathology analysis involves complex, time-intensive workflows and specialized expertise, limiting its accessibility. We introduce NOVA, an agentic framework that translates scientific queries into executable analysis pipelines by iteratively generating and running Python code. NOVA integrates 49 domain-specific tools (e.g., nuclei segmentation, whole-slide encoding) built…
Recent research on fine-tuning large language models (LLMs) through the aggregation of multiple preferences has attracted considerable attention. However, the existing literature predominantly focuses on the empirical performance of aggregation algorithms while neglecting the underlying motivation for agents to misreport their preferences. In this paper,…
Note from Chief Scientist and editor Jaime Teevan: As you sit down to read the 2025 New Future of Work report, it’s worth pausing to consider the thread that ties the past five years of reports together. The inaugural New Future of Work report, published…