Microsoft Research Blog

  1. Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely 

    September 1, 2024

    Large language models (LLMs) augmented with external data have demonstrated remarkable capabilities in completing real-world tasks. Techniques for integrating external data into LLMs, such as Retrieval-Augmented Generation (RAG) and fine-tuning, are gaining increasing attention and widespread application. Nonetheless, the effective deployment of data-augmented LLMs across…

  2. Datacenter power and energy management: past, present, and future 

    September 1, 2024 | Ricardo Bianchini, Christian Belady, and Anand Sivasubramaniam

    This article gives an overview of key past developments in cloud datacenter power and energy management, where we are today, and what the future could be. This topic is gaining enormous, renewed interest in the context of the conflicting needs of the AI revolution and…

  3. COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning 

    September 1, 2024

    We present a cost-effective method to integrate speech into a large language model (LLM), resulting in a Contextual Speech Model with Instruction-following/in-context-learning Capabilities (COSMIC) multi-modal LLM. Using GPT-3.5, we generate Speech Comprehension Test Question-Answer (SQA) pairs from speech transcriptions for supervised instruction tuning. With under…

  4. AI detection of malicious push notifications in augmented reality in the workplace 

    September 1, 2024 | Sarah Katz

    Distraction caused by the visual processing of multiple objects during augmented reality (AR) immersion could make users more susceptible to malicious push notifications, potentially exposing organisations to unwitting insider threats. This case study consulted four experts in the field of AR application development to…

  5. Target conversation extraction: Source separation using turn-taking dynamics 

    September 1, 2024

    Extracting the speech of participants in a conversation amid interfering speakers and noise is a challenging problem. In this paper, we introduce the novel task of target conversation extraction, where the goal is to extract the audio of a target conversation based on the speaker…

  6. Knowledge boosting during low-latency inference 

    September 1, 2024

    Models for low-latency, streaming applications could benefit from the knowledge capacity of larger models, but edge devices cannot run these models due to resource constraints. A possible solution is to transfer hints during inference from a large model running remotely to a small model running…