Microsoft Research Blog

Knowledge boosting during low-latency inference

September 1, 2024

Models for low-latency, streaming applications could benefit from the knowledge capacity of larger models, but edge devices cannot run these models due to resource constraints. A possible solution is to transfer hints during inference from a large model running remotely to a small model running…
Total-Duration-Aware Duration Modeling for Text-to-Speech Systems

September 1, 2024

Accurate control of the total duration of generated speech by adjusting the speech rate is crucial for various text-to-speech (TTS) applications. However, the impact of adjusting the speech rate on speech quality, such as intelligibility and speaker characteristics, has been underexplored. In this work, we…
Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping

September 1, 2024

Given the popularity of generative AI, Large Language Models (LLMs) often consume hundreds or thousands of GPUs for parallelizing and accelerating the training process. Communication overhead becomes more pronounced when training LLMs at scale. To eliminate communication overhead in distributed LLM training, we propose Domino,…
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS

September 1, 2024

Recently, zero-shot text-to-speech (TTS) systems, capable of synthesizing any speaker's voice from a short audio prompt, have made rapid advancements. However, the quality of the generated speech significantly deteriorates when the audio prompt contains noise, and limited research has been conducted to address this issue.…
Uncover Nested Data Parallelism and Data Reuse in DNN Computation with FractalTensor

September 1, 2024 | Ying Cao, Fan Yang, and Mao Yang

Abstract to come...
Intention Is All You Need

September 1, 2024 | Advait Sarkar

Among the many narratives of the transformative power of Generative AI is one that sees in the world a latent nation of programmers who need to wield nothing but intentions and natural language to render their ideas in software. In this paper, this outlook is…
Feeling-the-Beat: Enhancing Empathy and Engagement during Public Speaking through Heart Rate Sharing

September 1, 2024

Public speaking experts intentionally take their audience on an emotional roller coaster, staying attuned to their audience's collective emotional feedback. In this research, we explore how bidirectional sharing of heart rates between a speaker and their audience facilitates this emotional exchange, through empathy, emotional awareness,…
Performance of explainable artificial intelligence in guiding the management of patients with a pancreatic cyst

September 1, 2024

Background/objectives: Pancreatic cyst management can be distilled into three separate pathways (discharge, monitoring, or surgery) based on the risk of malignant transformation. This study compares the performance of artificial intelligence (AI) models to clinical care for this task. Methods: Two explainable boosting machine (EBM)…
The Market Effects of Algorithms

September 1, 2024 | Lindsey Raymond

While there is excitement about the potential of algorithms to optimize individual decision-making, changing individual behavior will, almost inevitably, impact markets. Yet little is known about these effects. In this paper, I study how the availability of algorithmic prediction changes entry, allocation, and prices in…
AI Delegates with a Dual Focus: Ensuring Privacy and Strategic Self-Disclosure

September 1, 2024

Large language model (LLM)-based AI delegates are increasingly utilized to act on behalf of users, assisting them with a wide range of tasks through conversational interfaces. Despite their advantages, concerns arise regarding the potential risk of privacy leaks, particularly in scenarios involving social interactions. While…
Rethinking Node Representation Interpretation through Relation Coherence

September 1, 2024

Understanding node representations in graph-based models is crucial for uncovering biases, diagnosing errors, and building trust in model decisions. However, previous work on explainable AI for node representations has primarily emphasized explanations (reasons for model predictions) rather than interpretations (mapping representations to understandable concepts). Furthermore,…
PAM: Prompting Audio-Language Models for Audio Quality Assessment

September 1, 2024

While audio quality is a key performance metric for various audio processing tasks, including generative modeling, its objective measurement remains a challenge. Audio-Language Models (ALMs) are pre-trained on audio-text pairs that may contain information about audio quality, the presence of artifacts, or noise. Given an…
