{"id":1160691,"date":"2026-02-04T21:07:55","date_gmt":"2026-02-05T05:07:55","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=1160691"},"modified":"2026-02-05T08:09:45","modified_gmt":"2026-02-05T16:09:45","slug":"paza-introducing-automatic-speech-recognition-benchmarks-and-models-for-low-resource-languages","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/paza-introducing-automatic-speech-recognition-benchmarks-and-models-for-low-resource-languages\/","title":{"rendered":"Paza: Introducing automatic speech recognition benchmarks and models for low resource languages"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"788\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1.jpg\" alt=\"Three white line icons on a blue\u2011to\u2011purple gradient background: a vertical audio waveform on the left, a globe showing Africa and Europe in the center, and a network on the right.\" class=\"wp-image-1160744\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1.jpg 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1-655x368.jpg 655w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1-240x135.jpg 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/figure>\n\n\n\n<div style=\"padding-bottom:0; padding-top:0\" class=\"wp-block-msr-immersive-section alignfull row wp-block-msr-immersive-section\">\n\t\n\t<div class=\"container\">\n\t\t<div class=\"wp-block-msr-immersive-section__inner wp-block-msr-immersive-section__inner--narrow\">\n\t\t\t<div class=\"wp-block-columns mb-10 pb-1 pr-1 is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\" style=\"box-shadow:var(--wp--preset--shadow--outlined)\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h2 class=\"wp-block-heading h3\" id=\"at-a-glance\">At a glance<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Microsoft Research releases PazaBench and Paza automatic speech recognition models<\/strong>, advancing speech technology for low-resource languages.<\/li>\n\n\n\n<li><strong>Human-centered pipeline for low-resource languages: <\/strong>Built for and tested by communities, Paza is an end-to-end, continuous pipeline that elevates historically under-represented languages and makes speech models usable in real-world, low-resource contexts.<\/li>\n\n\n\n<li><strong>First-of-its-kind ASR leaderboard, starting with African languages: <\/strong>PazaBench is the first automatic speech recognition (ASR) leaderboard for low-resource languages. 
Launching with 39 African languages and 51 state-of-the-art models, it tracks three key metrics across leading public and community datasets.<\/li>\n\n\n\n<li><strong>Human-centered&nbsp;Paza&nbsp;ASR&nbsp;models:<\/strong>&nbsp;ASR models&nbsp;fine-tuned&nbsp;with minimal data and grounded in&nbsp;real-world&nbsp;testing&nbsp;with farmers on everyday mobile devices, covering&nbsp;six&nbsp;Kenyan languages:&nbsp;Swahili,&nbsp;Dholuo, Kalenjin, Kikuyu, Maasai,&nbsp;and Somali.<\/li>\n<\/ul>\n<\/div>\n<\/div>\t\t<\/div>\n\t<\/div>\n\n\t<\/div>\n\n\n\n<p>According to the 2025&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Microsoft-AI-Diffusion-Report-2025-H2.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Microsoft AI Diffusion Report<\/a>,&nbsp;approximately one&nbsp;in&nbsp;six&nbsp;people globally had used a generative AI product.&nbsp;Yet for billions of&nbsp;people,&nbsp;the promise of voice interaction still falls short, and&nbsp;while&nbsp;AI is becoming increasingly multilingual, a key question&nbsp;remains:&nbsp;<em><strong>Do&nbsp;these models&nbsp;actually work&nbsp;for all languages and the people who rely on them?<\/strong><\/em>&nbsp;This challenge is one we first confronted through&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/project-gecko\/\" target=\"_blank\" rel=\"noreferrer noopener\">Project Gecko<\/a>\u2014a collaboration between Microsoft Research and&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/digitalgreen.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">Digital&nbsp;Green<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>,&nbsp;where&nbsp;field teams across Africa and India focused on building usable AI tools for farmers.<\/p>\n\n\n\n<p>Gecko revealed how often speech systems fail in real\u2011world, low\u2011resource environments\u2014where many languages go&nbsp;unrecognized&nbsp;and 
non\u2011Western accents are frequently misunderstood. Yet speech remains the primary medium of communication globally. For communities across Kenya, Africa, and beyond, this mismatch creates cascading challenges: without foundational data&nbsp;representing&nbsp;their languages and cultures, innovation stalls, and the digital and AI divides widen.&nbsp;<\/p>\n\n\n\n<p>Paza addresses this with a human-centered speech model pipeline. Through&nbsp;PazaBench, it benchmarks low-resource languages using both public and community-sourced data, and through Paza&nbsp;models, it&nbsp;fine-tunes&nbsp;speech models&nbsp;to deliver outsized gains in mid- and low-resource languages, evaluating with community testers using real devices in real contexts. Upcoming playbooks complement this work by sharing practical guidance on&nbsp;dataset creation,&nbsp;fine-tuning&nbsp;approaches&nbsp;with minimal data,&nbsp;and evaluation considerations, introducing a continuous pipeline that&nbsp;enables&nbsp;researchers&nbsp;and practitioners to build and evaluate systems grounded in real human use.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"how-project-gecko-informed-paza-s-design\">How Project Gecko informed Paza\u2019s design<\/h2>\n\n\n\n<p>In addition to building cost-effective, adaptable AI systems, the extensive&nbsp;fieldwork on&nbsp;Project Gecko highlighted an important lesson:&nbsp;<strong><em>Building usable speech&nbsp;models&nbsp;in low\u2011resource settings is not only a data problem,&nbsp;but also&nbsp;a design and evaluation problem.<\/em><\/strong>&nbsp;For AI systems to be useful, they must work in local languages, support hands\u2011free interaction through voice, text, and video, and deliver information in formats that fit real-world environments, that is, on low-bandwidth&nbsp;mobile devices,&nbsp;in&nbsp;noisy settings, and&nbsp;for&nbsp;varying literacy levels.&nbsp;&nbsp;<\/p>\n\n\n\n<p>These insights shaped the design of Paza, 
from&nbsp;the&nbsp;Swahili&nbsp;phrase&nbsp;<em><strong>paza&nbsp;sauti<\/strong><\/em>,&nbsp;meaning \u201cto project\u201d or \u201cto raise your voice.\u201d&nbsp;The name reflects our intent: rather than simply adding more languages to existing systems,<strong>&nbsp;Paza is about co-creating speech technologies in partnership with the communities who use them.<\/strong>&nbsp;Guided by this principle, Paza puts human use&nbsp;first,&nbsp;which&nbsp;in turn drives&nbsp;model improvement.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"pazabench-the-first-asr-leaderboard-for-low-resource-languages\">PazaBench: The first ASR leaderboard for low-resource languages<\/h2>\n\n\n\n<p><strong>PazaBench<\/strong> is the first automatic speech recognition (ASR) leaderboard dedicated to low\u2011resource languages. It launches&nbsp;with&nbsp;initial&nbsp;coverage&nbsp;of&nbsp;39 African languages and benchmarks&nbsp;52 state\u2011of\u2011the\u2011art ASR and language models, including the newly released Paza ASR models for six Kenyan languages. 
The platform aggregates leading public and community datasets\u2014spanning diverse styles of speech, including conversational, scripted read-aloud, unscripted, broadcast news, and domain-specific data\u2014into one easy\u2011to\u2011explore platform per language.&nbsp;This makes it easier for&nbsp;researchers, developers, and product teams to assess which models perform best across underserved languages and diverse regions, understand trade-offs between speed and accuracy,&nbsp;and identify&nbsp;where gaps persist.&nbsp;<\/p>\n\n\n\n<p><strong>PazaBench tracks three core metrics:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Character Error Rate (CER)<\/strong>, which is important for languages with rich word forms, where meaning is built by combining word parts, so errors at the character level can significantly impact meaning<\/li>\n\n\n\n<li><strong>Word Error Rate (WER)<\/strong> for word-level transcript accuracy<\/li>\n\n\n\n<li><strong>RTFx (Inverse Real\u2011Time Factor)<\/strong>, which measures how fast transcription runs relative to real\u2011time audio duration.<\/li>\n<\/ol>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"PazaBench Walkthrough\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube-nocookie.com\/embed\/jAuuh0saMUI?feature=oembed&rel=0\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em><strong><em>More than scores,&nbsp;PazaBench&nbsp;standardizes evaluation to prioritize dataset gaps,&nbsp;identify&nbsp;underperforming languages, and highlight where localized 
models outperform&nbsp;broader-coverage ASR models\u2014offering early evidence of&nbsp;the value of African\u2011centric innovation.<\/em><\/strong><\/em><\/p>\n<\/blockquote>\n\n\n\n<div class=\"wp-block-buttons is-content-justification-center is-content-justification-center is-layout-flex wp-container-core-buttons-is-layout-16018d1d wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button is-style-outline is-style-outline--1\"><a data-bi-type=\"button\" class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/huggingface.co\/spaces\/microsoft\/paza-bench\" target=\"_blank\" rel=\"noreferrer noopener\">Explore PazaBench<\/a><\/div>\n<\/div>\n\n\n\n<p class=\"has-text-align-center\"><em><sup>To contribute to the benchmark, request additional language evaluation on the leaderboard.<\/sup><\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"paza-asr-models-built-with-and-for-kenyan-languages\">Paza ASR Models: Built with and for Kenyan languages<\/h2>\n\n\n\n<p>The Paza ASR models\u00a0consist of\u00a0three fine-tuned ASR models built on top of state\u2011of\u2011the\u2011art model architectures. 
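Before looking at each model, note that the CER, WER, and RTFx numbers reported throughout reduce to a Levenshtein edit distance plus a simple timing ratio. A minimal, dependency-free sketch (the sample strings are illustrative, not benchmark data):

```python
# Minimal sketch of the three PazaBench metrics; sample inputs are illustrative.

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences via dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character Error Rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

def rtfx(audio_seconds, decode_seconds):
    """Inverse real-time factor: values above 1 mean faster than real time."""
    return audio_seconds / decode_seconds

# One substitution ("ya" -> "za") in a three-word Swahili reference.
print(round(wer("habari ya asubuhi", "habari za asubuhi"), 3))  # 0.333
```

In practice, libraries such as jiwer implement the same computation with text-normalization options layered on top.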
Each model targets\u00a0<em>Swahili<\/em>,\u00a0a mid-resource language, and five low\u2011resource Kenyan languages:\u00a0<em>Dholuo, Kalenjin, Kikuyu, Maasai,\u00a0and Somali<\/em>.\u00a0The models are\u00a0fine-tuned\u00a0on\u00a0public and curated proprietary datasets.\u00a0\u00a0<\/p>\n\n\n\n<p>Fine\u2011tuning the three models allowed us to explore complementary approaches toward a shared goal: building speech recognition systems that are usable in local contexts, starting with the six Kenyan languages, and bridging gaps in multilingual, multimodal video question answering through the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/labs.ai.azure.com\/projects\/mmct-agent\/\" target=\"_blank\" rel=\"noopener noreferrer\">MMCT agent<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<figure class=\"wp-block-embed aligncenter is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Project Gecko: Building globally equitable generative AI\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube-nocookie.com\/embed\/59O8kP8pmtI?feature=oembed&rel=0\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><figcaption class=\"wp-element-caption\">See the MMCT agent in action in the field<\/figcaption><\/figure>\n\n\n\n<p>Early versions of two models in Kikuyu and Swahili were deployed on mobile devices and tested directly with farmers in real\u2011world settings, enabling the team to observe how the models performed with everyday use. 
Farmers provided in\u2011the\u2011moment feedback on accuracy, usability, and relevance, highlighting where transcripts broke down, which errors were most disruptive, and what improvements would make the models more helpful in practice. This feedback loop directly informed subsequent fine\u2011tuning, ensuring model improvements were driven not only by benchmark scores, but by the needs and expectations of the communities they are intended to serve.<\/p>\n\n\n\n<div class=\"wp-block-buttons is-content-justification-center is-content-justification-center is-layout-flex wp-container-core-buttons-is-layout-16018d1d wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button is-style-outline is-style-outline--2\"><a data-bi-type=\"button\" class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/huggingface.co\/collections\/microsoft\/paza\" target=\"_blank\" rel=\"noreferrer noopener\">Explore Paza Collection Here<\/a><\/div>\n<\/div>\n\n\n\n<p>Here is how Paza models compare to&nbsp;three&nbsp;state-of-the-art&nbsp;ASR&nbsp;models&nbsp;today:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"1287\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG1_overall_cer_grouped_sorted_NEW-scaled.png\" alt=\"Figure 1: Character Error Rate (CER) comparison across the Kenyan languages for several state\u2011of\u2011the\u2011art ASR models including the Paza models. 
Lower CER indicates better transcription performance.\" class=\"wp-image-1161323\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG1_overall_cer_grouped_sorted_NEW-scaled.png 2560w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG1_overall_cer_grouped_sorted_NEW-300x151.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG1_overall_cer_grouped_sorted_NEW-1024x515.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG1_overall_cer_grouped_sorted_NEW-768x386.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG1_overall_cer_grouped_sorted_NEW-1536x772.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG1_overall_cer_grouped_sorted_NEW-2048x1029.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG1_overall_cer_grouped_sorted_NEW-240x121.png 240w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 1: Character Error Rate (CER) comparison across the Kenyan languages for several state\u2011of\u2011the\u2011art ASR models including the Paza models. Lower CER indicates better transcription performance.<\/em><\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"1287\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG2_overall_wer_grouped_sorted_NEW-scaled.png\" alt=\"Figure 2: Word Error Rate (WER) comparison across the Kenyan languages for several state\u2011of\u2011the\u2011art ASR models including the Paza models. 
Lower WER indicates better transcription performance.\" class=\"wp-image-1161325\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG2_overall_wer_grouped_sorted_NEW-scaled.png 2560w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG2_overall_wer_grouped_sorted_NEW-300x151.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG2_overall_wer_grouped_sorted_NEW-1024x515.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG2_overall_wer_grouped_sorted_NEW-768x386.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG2_overall_wer_grouped_sorted_NEW-1536x772.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG2_overall_wer_grouped_sorted_NEW-2048x1029.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG2_overall_wer_grouped_sorted_NEW-240x121.png 240w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 2: Word Error Rate (WER) comparison across the Kenyan languages for several state\u2011of\u2011the\u2011art ASR models including the Paza models. Lower WER indicates better transcription performance.<\/em><\/figcaption><\/figure>\n\n\n\n<p><strong>1) Paza\u2011Phi\u20114\u2011Multimodal\u2011Instruct<\/strong><\/p>\n\n\n\n<p>Microsoft\u2019s <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/huggingface.co\/microsoft\/Phi-4-multimodal-instruct\" target=\"_blank\" rel=\"noopener noreferrer\">Phi\u20114 multimodal\u2011instruct<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> is a next\u2011generation small language model built to reason across audio, text, and vision. 
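A common pattern for this kind of adaptation is to train only the model's audio-pathway parameters while freezing everything else. The selection logic can be sketched as follows; the parameter names and the `Param` stand-in are hypothetical, not Phi-4's actual module layout, and with PyTorch the same loop would toggle `requires_grad` on real tensors:

```python
class Param:
    """Stand-in for a framework tensor parameter (illustrative only)."""
    def __init__(self):
        self.requires_grad = True

def freeze_except(named_params, keep_substrings=("audio",)):
    """Mark only parameters whose name mentions the audio pathway as
    trainable; everything else is frozen. Returns the trainable names."""
    trainable = []
    for name, param in named_params:
        param.requires_grad = any(s in name for s in keep_substrings)
        if param.requires_grad:
            trainable.append(name)
    return trainable

# Hypothetical parameter names, loosely modeled on a multimodal checkpoint.
params = {
    "audio_encoder.layers.0.attn.weight": Param(),
    "audio_projector.linear.weight": Param(),
    "vision_encoder.layers.0.attn.weight": Param(),
    "language_model.layers.0.mlp.weight": Param(),
}
trainable = freeze_except(params.items())  # only the two audio parameters
```

Freezing the shared backbone this way is what lets the fine-tuned model keep its general multimodal behavior while specializing the audio path.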
With Paza, we extend its audio capabilities, adapting a powerful multimodal architecture into a high\u2011quality automatic speech recognition (ASR) system for low\u2011resource African languages.<\/p>\n\n\n\n<p>Fine\u2011tuned on unified multilingual speech datasets, the model was optimized specifically for transcription in the six languages. The model preserves its underlying transformer architecture and multimodal capabilities; only the audio\u2011specific components were selectively fine-tuned, enabling strong cross\u2011lingual generalization.<\/p>\n\n\n\n<p>As the results below show, this model delivers consistent improvements in transcription quality across all six languages.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"1063\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG3_phi_cer_comparison_NEW-scaled.png\" alt=\"Figure 3: Character Error Rate (CER) comparison across\u00a0the six\u00a0languages for the base\u00a0model\u00a0versus the fine-tuned Paza model.\u00a0Lower CER\u00a0indicates\u00a0better transcription performance.\" class=\"wp-image-1161376\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG3_phi_cer_comparison_NEW-scaled.png 2560w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG3_phi_cer_comparison_NEW-300x125.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG3_phi_cer_comparison_NEW-1024x425.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG3_phi_cer_comparison_NEW-768x319.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG3_phi_cer_comparison_NEW-1536x638.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG3_phi_cer_comparison_NEW-2048x851.png 2048w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG3_phi_cer_comparison_NEW-240x100.png 240w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 3: Character Error Rate (CER) comparison across&nbsp;the six languages for the base&nbsp;model&nbsp;versus the fine-tuned Paza model.&nbsp;Lower CER&nbsp;indicates&nbsp;better transcription performance.<\/em><\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"1063\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG4_phi_wer_comparison_NEW-scaled.png\" alt=\"Figure 4: Word Error Rate (WER) comparison across the six languages for the base model versus the fine-tuned Paza model. Lower WER indicates better transcription performance.\" class=\"wp-image-1161378\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG4_phi_wer_comparison_NEW-scaled.png 2560w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG4_phi_wer_comparison_NEW-300x125.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG4_phi_wer_comparison_NEW-1024x425.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG4_phi_wer_comparison_NEW-768x319.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG4_phi_wer_comparison_NEW-1536x638.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG4_phi_wer_comparison_NEW-2048x851.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG4_phi_wer_comparison_NEW-240x100.png 240w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 4: Word Error Rate (WER) comparison across the 
six languages for the base model versus the fine-tuned Paza model. Lower WER indicates better transcription performance.<\/em><\/figcaption><\/figure>\n\n\n\n<div class=\"wp-block-buttons is-content-justification-center is-content-justification-center is-layout-flex wp-container-core-buttons-is-layout-16018d1d wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button is-style-outline is-style-outline--3\"><a data-bi-type=\"button\" class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/huggingface.co\/microsoft\/paza-Phi-4-multimodal-instruct\" target=\"_blank\" rel=\"noreferrer noopener\">Test the model here<\/a><\/div>\n<\/div>\n\n\n\n<p><strong>2) Paza\u2011MMS\u20111B\u2011All<\/strong><\/p>\n\n\n\n<p>This model is fine-tuned from Meta\u2019s mms-1b-all model, which employs a large-scale Wav2Vec2.0-style encoder with lightweight language-specific adapters to enable efficient multilingual specialization. For this release, each of the six language adapters was fine\u2011tuned independently on curated low\u2011resource datasets, allowing targeted adaptation while keeping the shared encoder largely frozen.<\/p>\n\n\n\n<p>As shown in the figures below, this model improves transcription accuracy while maintaining strong cross\u2011lingual generalization.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"1160\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG5_mms_cer_comparison_NEW-scaled.png\" alt=\"Figure 5: Character Error Rate (CER)\u00a0comparison across the six\u00a0languages for the base model\u00a0versus\u00a0the fine-tuned Paza model.\u00a0Lower CER\u00a0indicates\u00a0better transcription performance.\" class=\"wp-image-1161380\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG5_mms_cer_comparison_NEW-scaled.png 2560w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG5_mms_cer_comparison_NEW-300x136.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG5_mms_cer_comparison_NEW-1024x464.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG5_mms_cer_comparison_NEW-768x348.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG5_mms_cer_comparison_NEW-1536x696.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG5_mms_cer_comparison_NEW-2048x928.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG5_mms_cer_comparison_NEW-240x109.png 240w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 5: Character Error Rate (CER)&nbsp;comparison across the six languages for the base model&nbsp;versus&nbsp;the fine-tuned Paza model.&nbsp;Lower CER&nbsp;indicates&nbsp;better transcription performance.<\/em><\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"1160\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG6_mms_wer_comparison_NEW-scaled.png\" alt=\"Figure 6: Word Error Rate (WER)\u00a0comparison across the six\u00a0languages for the base model\u00a0versus the fine-tuned Paza model.\u00a0Lower WER indicates better transcription performance.\" class=\"wp-image-1161382\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG6_mms_wer_comparison_NEW-scaled.png 2560w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG6_mms_wer_comparison_NEW-300x136.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG6_mms_wer_comparison_NEW-1024x464.png 1024w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG6_mms_wer_comparison_NEW-768x348.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG6_mms_wer_comparison_NEW-1536x696.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG6_mms_wer_comparison_NEW-2048x928.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG6_mms_wer_comparison_NEW-240x109.png 240w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 6: Word Error Rate (WER)&nbsp;comparison across the six languages for the base model&nbsp;versus the fine-tuned Paza model.&nbsp;Lower WER indicates better transcription performance.<\/em><\/figcaption><\/figure>\n\n\n\n<div class=\"wp-block-buttons is-content-justification-center is-content-justification-center is-layout-flex wp-container-core-buttons-is-layout-16018d1d wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button is-style-outline is-style-outline--4\"><a data-bi-type=\"button\" class=\"wp-block-button__link wp-element-button\" href=\"http:\/\/Aka.ms\/pazareap\" target=\"_blank\" rel=\"noreferrer noopener\">Join the Research Early Access Program<\/a><\/div>\n<\/div>\n\n\n\n<p><strong>3) Paza\u2011Whisper\u2011Large\u2011v3\u2011Turbo<\/strong><\/p>\n\n\n\n<p>This model is fine-tuned from OpenAI\u2019s whisper-large-v3-turbo&nbsp;base model. Whisper is a transformer-based encoder\u2013decoder model that&nbsp;delivers robust automatic speech recognition (ASR)&nbsp;capabilities. 
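Whisper models are known to occasionally hallucinate by looping a phrase near the end of a segment. The exact post-processing used for this release is not specified here; one common heuristic, sketched below purely as an illustration, is to collapse excess runs of a repeated word n-gram:

```python
def collapse_repeats(text, ngram=3, max_repeats=2):
    """Collapse runs of an identical word n-gram so at most `max_repeats`
    consecutive copies survive -- an illustrative heuristic for Whisper's
    looping-hallucination failure mode, not the pipeline's actual step."""
    words = text.split()
    out = []
    reps = 0  # consecutive repetitions of the n-gram currently at the tail
    i = 0
    while i < len(words):
        chunk = words[i:i + ngram]
        if len(chunk) == ngram and chunk == out[-ngram:]:
            reps += 1
            if reps >= max_repeats:
                i += ngram  # drop this extra copy entirely
                continue
            out.extend(chunk)
            i += ngram
        else:
            reps = 0
            out.append(words[i])
            i += 1
    return " ".join(out)
```

For example, `collapse_repeats("a b c a b c a b c a b c d")` keeps only two copies of the looping trigram, yielding `"a b c a b c d"`, while ordinary text passes through unchanged.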
This model was fine\u2011tuned on the entire unified multilingual ASR dataset&nbsp;covering&nbsp;the six languages, to encourage cross-lingual generalization.&nbsp;In addition, an extra post\u2011processing step was applied to address the known Whisper hallucination failure modes, improving transcription reliability.<\/p>\n\n\n\n<p>As shown below, this release achieves improved transcription accuracy while retaining Whisper\u2019s robustness.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"1081\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG7whisper_cer_comparison_NEW-scaled.png\" alt=\"Figure 7: Character Error Rate (CER) comparison across the six\u00a0languages for the base model versus the fine-tuned Paza model.\u00a0Lower CER\u00a0indicates\u00a0better transcription performance.\" class=\"wp-image-1161338\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG7whisper_cer_comparison_NEW-scaled.png 2560w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG7whisper_cer_comparison_NEW-300x127.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG7whisper_cer_comparison_NEW-1024x432.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG7whisper_cer_comparison_NEW-768x324.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG7whisper_cer_comparison_NEW-1536x648.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG7whisper_cer_comparison_NEW-2048x864.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG7whisper_cer_comparison_NEW-240x101.png 240w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 7: <em>Character Error Rate (CER) comparison 
across the six<\/em><strong>&nbsp;<\/strong><em>languages for the base model versus the finetuned Paza model.&nbsp;Lower CER&nbsp;indicates&nbsp;better transcription performance.<\/em><\/em><\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"1081\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG8_whisper_wer_comparison_NEW-scaled.png\" alt=\"Figure 8: Word Error Rate (WER) comparison across the six\u00a0languages for the base model versus the finetuned Paza model.\u00a0Lower WER indicates better transcription performance.\" class=\"wp-image-1161341\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG8_whisper_wer_comparison_NEW-scaled.png 2560w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG8_whisper_wer_comparison_NEW-300x127.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG8_whisper_wer_comparison_NEW-1024x432.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG8_whisper_wer_comparison_NEW-768x324.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG8_whisper_wer_comparison_NEW-1536x648.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG8_whisper_wer_comparison_NEW-2048x864.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/FIG8_whisper_wer_comparison_NEW-240x101.png 240w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 8: <em>Word Error Rate (WER) comparison across the six<\/em><strong>&nbsp;<\/strong><em>languages for the base model versus the finetuned Paza model.&nbsp;Lower WER indicates better transcription performance.<\/em><\/em><\/figcaption><\/figure>\n\n\n\n<div class=\"wp-block-buttons 
is-content-justification-center is-layout-flex wp-container-core-buttons-is-layout-16018d1d wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button is-style-outline is-style-outline--5\"><a data-bi-type=\"button\" class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/huggingface.co\/microsoft\/paza-whisper-large-v3-turbo\" target=\"_blank\" rel=\"noreferrer noopener\">Test the model here<\/a><\/div>\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"where-do-we-go-from-here\">Where do we go from here?<\/h2>\n\n\n\n<p>AI is reshaping how the world communicates. Designing with people, not just for them, means looking beyond the languages that are already well\u2011served. We plan to expand PazaBench beyond African languages and evaluate state\u2011of\u2011the\u2011art ASR models across more low\u2011resource languages globally. The Paza ASR models are an early step; truly supporting small and under\u2011represented languages requires dedicated datasets, strong local partnerships, and rigorous evaluation. Meaningful progress depends on sustained collaboration with the communities who speak these languages, and expanding responsibly means prioritizing depth and quality over broad but shallow coverage.<\/p>\n\n\n\n<p>As we continue this work, we&#8217;re distilling our methods into a forthcoming playbook to help the broader ecosystem curate datasets, fine\u2011tune responsibly, and evaluate models in real\u2011world conditions.
And we\u2019re not stopping at speech\u2014additional&nbsp;playbooks will guide&nbsp;teams&nbsp;building AI tools and applications for multilingual, multicultural contexts, and give them practical recommendations for deploying across diverse communities.&nbsp;<\/p>\n\n\n\n<p>Together, these guides\u2014grounded in technical advances and community\u2011driven design\u2014share our learnings to help researchers, engineers, and designers build more human\u2011centered AI systems.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"acknowledgements\">Acknowledgements<\/h2>\n\n\n\n<p>The following researchers played an integral role in this work: Najeeb Abdulhamid, Felermino Ali, Liz Ankrah, Kevin Chege, Ogbemi Ekwejunor-Etchie, Ignatius Ezeani, Tanuja Ganu, Antonis Krasakis, Mercy Kwambai, Samuel Maina, Muchai Mercy, Danlami Mohammed, Nick Mumero, Martin Mwiti, Stephanie Nyairo, Millicent Ochieng and Jacki O\u2019Neill.<\/p>\n\n\n\n<p>We would like to thank the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/digitalgreen.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">Digital Green<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> team\u2014Rikin Gandhi, Alex Mwaura, Jacqueline Wang\u2019ombe, Kevin Mugambi, Lorraine Nyambura, Juan Pablo, Nereah Okanga, Ramaskanda R.S, Vineet Singh, Nafhtari Wanjiku, Kista Ogot, Samuel Owinya&nbsp;and the community evaluators in Nyeri and Nandi, Kenya \u2014 for their valuable contributions to this work.<\/p>\n\n\n\n<p>We extend our gratitude to the creators, community contributors, and maintainers of <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/huggingface.co\/datasets\/MCAA1-MSU\/anv_data_ke\" target=\"_blank\" rel=\"noopener noreferrer\">African Next Voices Kenya<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab 
glyph-append-xsmall\" href=\"https:\/\/huggingface.co\/datasets\/dsfsi-anv\/za-african-next-voices\" target=\"_blank\" rel=\"noopener noreferrer\">African Next Voices South Africa<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/openslr.org\/25\/\" target=\"_blank\" rel=\"noopener noreferrer\">ALFFA<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/huggingface.co\/datasets\/DigiGreen\/KikuyuASR_trainingdataset\" target=\"_blank\" rel=\"noopener noreferrer\">Digigreen<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/huggingface.co\/datasets\/google\/fleurs\" target=\"_blank\" rel=\"noopener noreferrer\">Google FLEURS<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/commonvoice.mozilla.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">Mozilla Common Voice<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/huggingface.co\/datasets\/naijavoices\/naijavoices-dataset\" target=\"_blank\" rel=\"noopener noreferrer\">Naija Voices<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> whose efforts have been invaluable in advancing African languages speech data.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Microsoft Research unveils Paza, a human-centered speech pipeline, and PazaBench, the first leaderboard for low-resource languages. It covers 39 African languages and 52 models and is tested with communities in real settings. 
<\/p>\n","protected":false},"author":43518,"featured_media":1160744,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_hide_image_in_river":null,"footnotes":""},"categories":[1],"tags":[],"research-area":[13545],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[269148,243984,269142],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-1160691","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-human-language-technologies","msr-locale-en_us","msr-post-option-approved-for-river","msr-post-option-blog-homepage-featured","msr-post-option-include-in-river"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[1021599],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[1119384],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Mercy Muchai","user_id":40846,"display_name":"Mercy Muchai","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/mercymuchai\/\" aria-label=\"Visit the profile page for Mercy Muchai\">Mercy Muchai<\/a>","is_active":false,"last_first":"Muchai, Mercy","people_section":0,"alias":"mercymuchai"},{"type":"guest","value":"kevin-chege","user_id":"1160687","display_name":"Kevin  Chege","author_link":"<a href=\"https:\/\/www.linkedin.com\/in\/kevinchege\/\" aria-label=\"Visit the profile page for Kevin  Chege\">Kevin  Chege<\/a>","is_active":true,"last_first":"Chege, Kevin 
","people_section":0,"alias":"kevin-chege"},{"type":"guest","value":"nick-mumero","user_id":"1160689","display_name":"Nick  Mumero","author_link":"<a href=\"https:\/\/www.linkedin.com\/in\/nick-mumero\/\" aria-label=\"Visit the profile page for Nick  Mumero\">Nick  Mumero<\/a>","is_active":true,"last_first":"Mumero, Nick ","people_section":0,"alias":"nick-mumero"},{"type":"user_nicename","value":"Stephanie Nyairo","user_id":40282,"display_name":"Stephanie Nyairo","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/snyairo\/\" aria-label=\"Visit the profile page for Stephanie Nyairo\">Stephanie Nyairo<\/a>","is_active":false,"last_first":"Nyairo, Stephanie","people_section":0,"alias":"snyairo"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1-960x540.jpg\" class=\"img-object-cover\" alt=\"Three white line icons on a blue\u2011to\u2011purple gradient background: a vertical audio waveform on the left, a globe showing Africa and Europe in the center, and a network on the right.\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1-655x368.jpg 655w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1-240x135.jpg 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/Paza-BlogHeroFeature-1400x788-1.jpg 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/mercymuchai\/\" title=\"Go to researcher profile for Mercy Muchai\" aria-label=\"Go to researcher profile for Mercy Muchai\" data-bi-type=\"byline author\" data-bi-cN=\"Mercy Muchai\">Mercy Muchai<\/a>, <a href=\"https:\/\/www.linkedin.com\/in\/kevinchege\/\" title=\"Go to researcher profile for Kevin  Chege\" aria-label=\"Go to researcher profile for Kevin  Chege\" data-bi-type=\"byline author\" data-bi-cN=\"Kevin  Chege\">Kevin  Chege<\/a>, <a href=\"https:\/\/www.linkedin.com\/in\/nick-mumero\/\" title=\"Go to researcher profile for Nick  Mumero\" aria-label=\"Go to researcher profile for Nick  Mumero\" data-bi-type=\"byline author\" data-bi-cN=\"Nick  Mumero\">Nick  Mumero<\/a>, and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/snyairo\/\" title=\"Go to researcher profile for Stephanie Nyairo\" aria-label=\"Go to researcher profile for Stephanie Nyairo\" data-bi-type=\"byline author\" data-bi-cN=\"Stephanie Nyairo\">Stephanie Nyairo<\/a>","formattedDate":"February 4, 2026","formattedExcerpt":"Microsoft Research unveils Paza, a human-centered speech pipeline, and PazaBench, the first leaderboard for low-resource languages. 
It covers 39 African languages and 52 models and is tested with communities in real settings.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1160691","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/43518"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1160691"}],"version-history":[{"count":105,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1160691\/revisions"}],"predecessor-version":[{"id":1161547,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1160691\/revisions\/1161547"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1160744"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1160691"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1160691"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1160691"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1160691"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1160691"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1160691"},{"taxonomy":"msr-lo
cale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1160691"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1160691"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1160691"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1160691"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1160691"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}