Microsoft Research Blog

  1. VoluMe – Authentic 3D Video Calls from Live Gaussian Splat Prediction 

    October 1, 2025

    Virtual 3D meetings offer the potential to enhance copresence, increase engagement, and thus improve the effectiveness of remote meetings compared to standard 2D video calls. However, representing people in 3D meetings remains a challenge; existing solutions achieve high quality by using complex hardware, making use of…

  2. Omnidirectional 3D Scene Reconstruction from Single Image 

    October 1, 2025 | Ren Yang, Jiahao Li, and Yan Lu

    Reconstruction of 3D scenes from a single image is a crucial step towards enabling next-generation AI-powered immersive experiences. However, existing diffusion-based methods often struggle with reconstructing omnidirectional scenes due to geometric distortions and inconsistencies across the generated novel views, hindering accurate 3D recovery. To overcome…

  3. MOF-BFN: Metal-Organic Frameworks Structure Prediction via Bayesian Flow Networks 

    October 1, 2025

    Metal-Organic Frameworks (MOFs) have attracted considerable attention due to their unique properties, including high surface area and tunable porosity, and their promising applications in catalysis, gas storage, and drug delivery. Structure prediction for MOFs is a challenging task, as these frameworks are intrinsically periodic and hierarchically…

  4. Distributed Asynchronous Device Speech Enhancement via Windowed Cross-Attention 

    October 1, 2025 | Gene-Ping Yang and Sebastian Braun

    The increasing number of microphone-equipped personal devices offers great flexibility and potential for using them as ad hoc microphone arrays in dynamic meeting environments. However, most existing approaches are designed for time-synchronized microphone setups, a condition that may not hold in real-world meeting scenarios, where time latency…

  5. Accelerating Block Coordinate Descent for LLM Finetuning via Landscape Expansion 

    October 1, 2025

    Finetuning large language models (LLMs) is a resource-intensive task for researchers in academia, with memory constraints posing a key bottleneck. A classic optimization method, block coordinate descent (BCD), significantly reduces memory cost by segmenting the trainable parameters into multiple blocks and optimizing one active block…
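    The memory saving described above comes from updating only one block of parameters at a time, so gradients and optimizer state are only needed for the active block. A minimal sketch of plain cyclic BCD on a toy separable objective (not the paper's accelerated variant; the function names and block-splitting scheme here are illustrative assumptions):

    ```python
    import numpy as np

    def bcd_minimize(grad_fn, params, n_blocks=4, outer_steps=20,
                     inner_steps=10, lr=0.1):
        """Cyclic block coordinate descent: partition the parameter vector
        into blocks and, in each outer step, run gradient descent on one
        active block while all other blocks stay frozen."""
        blocks = np.array_split(np.arange(params.size), n_blocks)
        for t in range(outer_steps):
            active = blocks[t % n_blocks]        # cycle through the blocks
            for _ in range(inner_steps):
                g = grad_fn(params)
                params[active] -= lr * g[active]  # update active block only
        return params

    # Toy example: minimize ||x - target||^2, whose gradient is 2(x - target).
    target = np.arange(8, dtype=float)
    x = bcd_minimize(lambda x: 2.0 * (x - target), np.zeros(8))
    ```

    Because this toy loss is separable across coordinates, each block converges independently to its slice of `target`; in LLM finetuning the blocks would instead be layers or parameter groups, and only the active block's gradients and Adam state need to live in memory.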

  6. Do LLMs Comply Differently During Tests? And Can We Steer That? 

    October 1, 2025 | Sahar Abdelnabi and Ahmed Salem

    Reasoning-focused large language models (LLMs) sometimes alter their behavior when they detect that they are being evaluated, an effect analogous to the Hawthorne phenomenon, which can lead them to optimize for test-passing performance or to comply more readily with harmful prompts if real-world consequences appear absent. We…