Publication
Kosmos-2.5: A Multimodal Literate Model
Microsoft Research Blog
Research Focus: Week of August 14, 2023
In this issue: HyWay enables hybrid mingling; Auto-Tables transforms non-relational tables into standard relational forms; training dense retrievers to identify high-quality in-context examples for LLMs; improving pronunciation assessment in CAPT.
Project
SpeechX
Neural Codec Language Model as a Versatile Speech Transformer
SpeechX is a versatile speech generation model leveraging audio and text prompts, which can deal with both clean and noisy speech inputs and perform zero-shot TTS…