Microsoft Research Blog

  1. Lorentz: Learned SKU Recommendation Using Profile Data 

    May 28, 2024

    In response to diverse demands, cloud operators have significantly expanded the array of service offerings, often referred to as Stock Keeping Units (SKUs), available for computing resource configurations. Such diversity has made it more complex for customers to choose the appropriate SKU. In the analyzed…

  2. Blind Image Restoration via Fast Diffusion Inversion 

    May 28, 2024 | Hamadi Chihaoui, Abdelhak Lemkhenter, and Paolo Favaro

    Recently, various methods have been proposed to solve Image Restoration (IR) tasks using a pre-trained diffusion model leading to state-of-the-art performance. However, most of these methods assume that the degradation operator in the IR task is completely known. Furthermore, a common characteristic among these approaches…

  3. An Outlook into the Future of Egocentric Vision 

    May 28, 2024

    What will the future be? We wonder! In this survey, we explore the gap between current research in egocentric vision and the ever-anticipated future, where wearable computing, with outward-facing cameras and digital overlays, is expected to be integrated into our everyday lives. To…

  4. TransVIP 

    May 27, 2024 | Yao Qian

    Speech to Speech Translation System with Voice and Isochrony Preservation. We introduce a novel model framework TransVIP that leverages diverse datasets in a cascade fashion yet facilitates end-to-end inference through joint probability. Furthermore, we propose two separated encoders to preserve the speaker’s voice characteristics and…

  5. TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation 

    May 27, 2024

    There is rising research interest in directly translating speech from one language to another, known as end-to-end speech-to-speech translation. However, most end-to-end models struggle to outperform cascade models, i.e., a pipeline framework that concatenates speech recognition, machine translation and text-to-speech models.…

  6. PromptWizard: Task-Aware Agent-driven Prompt Optimization Framework 

    May 27, 2024

    Large language models (LLMs) have transformed AI across diverse domains, with prompting being central to their success in guiding model outputs. However, manual prompt engineering is both labor-intensive and domain-specific, necessitating automated solutions. We introduce PromptWizard, a novel, fully automated framework for…

  7. PromptFix: You Prompt and We Fix the Photo 

    May 26, 2024

    Diffusion models equipped with language models demonstrate excellent controllability in image generation tasks, allowing image processing to adhere to human instructions. However, the lack of diverse instruction-following data hampers the development of models that effectively recognize and execute user-customized instructions, particularly in low-level tasks. Moreover,…

  8. MunTTS: A Text-to-Speech System for Mundari 

    May 25, 2024 | Kalika Bali

    We present MunTTS, an end-to-end text-to-speech (TTS) system specifically for Mundari, a low-resource Indian language of the Austro-Asiatic family. Our work addresses the gap in linguistic technology for underrepresented languages by collecting and processing data to build a speech synthesis system. Official Codebase for "MunTTS: A…

  9. Crafting Interpretable Embeddings by Asking LLMs Questions 

    May 25, 2024

    Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks. However, their opaqueness and proliferation into scientific domains such as neuroscience have created a growing need for interpretability. Here, we ask whether we can obtain interpretable embeddings through…