Microsoft Research Blog

Bridging the Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs

May 28, 2024

Large language models (LLMs) are at the forefront of transforming numerous domains globally. However, their inclusivity and effectiveness remain limited for non-Latin scripts and low-resource languages. This paper tackles the imperative challenge of enhancing the multilingual performance of LLMs without extensive training or fine-tuning. Through…
Lorentz: Learned SKU Recommendation Using Profile Data

May 28, 2024

In response to diverse demands, cloud operators have significantly expanded the array of service offerings, often referred to as Stock Keeping Units (SKUs), available for computing resource configurations. Such diversity has made it more complex for customers to choose the appropriate SKU. In the analyzed…
Blind Image Restoration via Fast Diffusion Inversion

May 28, 2024 | Hamadi Chihaoui, Abdelhak Lemkhenter, and Paolo Favaro

Recently, various methods have been proposed to solve Image Restoration (IR) tasks using a pre-trained diffusion model leading to state-of-the-art performance. However, most of these methods assume that the degradation operator in the IR task is completely known. Furthermore, a common characteristic among these approaches…
An Outlook into the Future of Egocentric Vision

May 28, 2024

What will the future be? We wonder! In this survey, we explore the gap between current research in egocentric vision and the ever-anticipated future, where wearable computing, with outward facing cameras and digital overlays, is expected to be integrated in our everyday lives. To…
TransVIP

May 27, 2024 | Yao Qian

Speech to Speech Translation System with Voice and Isochrony Preservation. We introduce a novel model framework TransVIP that leverages diverse datasets in a cascade fashion yet facilitates end-to-end inference through joint probability. Furthermore, we propose two separated encoders to preserve the speaker’s voice characteristics and…
Artificial intelligence and radiomics in the diagnosis of intraosseous lesions of the gnathic bones: A systematic review.

May 27, 2024

BACKGROUND: The purpose of this systematic review (SR) is to gather evidence on the use of machine learning (ML) models in the diagnosis of intraosseous lesions in gnathic bones and to analyze the reliability, impact, and usefulness of such models. This SR was performed in…
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

May 27, 2024

There is rising research interest in directly translating speech from one language to another, known as end-to-end speech-to-speech translation. However, most end-to-end models struggle to outperform cascade models, i.e., pipeline frameworks that concatenate speech recognition, machine translation and text-to-speech models.…
PromptWizard: Task-Aware Agent-driven Prompt Optimization Framework

May 27, 2024

Large language models (LLMs) have transformed AI across diverse domains, with prompting being central to their success in guiding model outputs. However, manual prompt engineering is both labor-intensive and domain-specific, necessitating automated solutions. We introduce PromptWizard, a novel, fully automated framework for…
PromptFix: You Prompt and We Fix the Photo

May 26, 2024

Diffusion models equipped with language models demonstrate excellent controllability in image generation tasks, allowing image processing to adhere to human instructions. However, the lack of diverse instruction-following data hampers the development of models that effectively recognize and execute user-customized instructions, particularly in low-level tasks. Moreover,…
InversionView: A General-Purpose Method for Reading Information from Neural Activations

May 26, 2024 | Xinting Huang, Madhur Panwar, Navin Goyal, and Michael Hahn

The inner workings of neural networks can be better understood if we can fully decipher the information encoded in neural activations. In this paper, we argue that this information is embodied by the subset of inputs that give rise to similar activations. Computing such subsets…
MunTTS: A Text-to-Speech System for Mundari

May 25, 2024 | Kalika Bali

We present MunTTS, an end-to-end text-to-speech (TTS) system specifically for Mundari, a low-resource Indian language of the Austro-Asiatic family. Our work addresses the gap in linguistic technology for underrepresented languages by collecting and processing data to build a speech synthesis system. Official Codebase for "MunTTS: A…
Crafting Interpretable Embeddings by Asking LLMs Questions

May 25, 2024

Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks. However, their opaqueness and proliferation into scientific domains such as neuroscience have created a growing need for interpretability. Here, we ask whether we can obtain interpretable embeddings through…
