Microsoft Research Blog

  1. Knowledge boosting during low-latency inference 

    September 1, 2024

    Models for low-latency, streaming applications could benefit from the knowledge capacity of larger models, but edge devices cannot run these models due to resource constraints. A possible solution is to transfer hints during inference from a large model running remotely to a small model running…

  2. Total-Duration-Aware Duration Modeling for Text-to-Speech Systems 

    September 1, 2024

    Accurate control of the total duration of generated speech by adjusting the speech rate is crucial for various text-to-speech (TTS) applications. However, the impact of adjusting the speech rate on speech quality, such as intelligibility and speaker characteristics, has been underexplored. In this work, we…

  3. An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS 

    September 1, 2024

    Recently, zero-shot text-to-speech (TTS) systems, capable of synthesizing any speaker's voice from a short audio prompt, have made rapid advancements. However, the quality of the generated speech significantly deteriorates when the audio prompt contains noise, and limited research has been conducted to address this issue.…

  4. Intention Is All You Need 

    September 1, 2024 | Advait Sarkar

    Among the many narratives of the transformative power of Generative AI is one that sees in the world a latent nation of programmers who need to wield nothing but intentions and natural language to render their ideas in software. In this paper, this outlook is…

  5. The Market Effects of Algorithms 

    September 1, 2024 | Lindsey Raymond

    While there is excitement about the potential of algorithms to optimize individual decision-making, changing individual behavior will, almost inevitably, impact markets. Yet little is known about these effects. In this paper, I study how the availability of algorithmic prediction changes entry, allocation, and prices in…

  6. AI Delegates with a Dual Focus: Ensuring Privacy and Strategic Self-Disclosure 

    September 1, 2024

    Large language model (LLM)-based AI delegates are increasingly utilized to act on behalf of users, assisting them with a wide range of tasks through conversational interfaces. Despite their advantages, concerns arise regarding the potential risk of privacy leaks, particularly in scenarios involving social interactions. While…

  7. Rethinking Node Representation Interpretation through Relation Coherence 

    September 1, 2024

    Understanding node representations in graph-based models is crucial for uncovering biases, diagnosing errors, and building trust in model decisions. However, previous work on explainable AI for node representations has primarily emphasized explanations (reasons for model predictions) rather than interpretations (mapping representations to understandable concepts). Furthermore,…

  8. PAM: Prompting Audio-Language Models for Audio Quality Assessment 

    September 1, 2024

    While audio quality is a key performance metric for various audio processing tasks, including generative modeling, its objective measurement remains a challenge. Audio-Language Models (ALMs) are pre-trained on audio-text pairs that may contain information about audio quality, the presence of artifacts, or noise. Given an…