Project Gecko

PazaBench: The first ASR leaderboard for low-resource languages

PazaBench is the first automatic speech recognition (ASR) leaderboard dedicated to low-resource languages. It launches with initial coverage of 39 African languages and benchmarks 52 state-of-the-art ASR and language models, including the newly released Paza ASR models for six Kenyan languages. The platform aggregates leading public and community datasets, spanning conversational, scripted read-aloud, unscripted, broadcast news, and domain-specific speech, into a single easy-to-explore view per language. This makes it easier for researchers, developers, and product teams to assess which models perform best across underserved languages and regions, to understand the trade-offs between speed and accuracy, and to identify where gaps persist.

PazaBench tracks three core metrics (a minimal sketch of each follows the list):

  1. Character Error Rate (CER), which matters for morphologically rich languages, where meaning is built by combining word parts and errors at the character level can significantly change meaning.
  2. Word Error Rate (WER), which measures word-level transcript accuracy.
  3. RTFx (inverse real-time factor), which measures how fast transcription runs relative to the duration of the audio.
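
To make the definitions concrete, here is a minimal Python sketch of the three metrics. It assumes plain reference and hypothesis strings and caller-supplied timings; PazaBench's exact text normalization (casing, punctuation, tokenization) is not reproduced here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution
            ))
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character Error Rate: character-level edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

def rtfx(audio_seconds, transcription_seconds):
    """Inverse real-time factor: values above 1 mean faster than real time."""
    return audio_seconds / transcription_seconds

# Example (Swahili): one substituted word, one substituted character.
print(wer("habari ya asubuhi", "habari za asubuhi"))        # 0.333...
print(cer("habari ya asubuhi", "habari za asubuhi"))        # 0.0588...
print(rtfx(audio_seconds=60.0, transcription_seconds=5.0))  # 12.0
```

Note how the same edit can look very different under the two error rates: a one-word substitution costs a third of the WER here but only one seventeenth of the CER, which is why CER is the more informative metric for languages with rich word forms.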

Paza ASR Models: Built with and for Kenyan languages

The Paza ASR models consist of three fine-tuned ASR models built on top of state-of-the-art model architectures. Each model targets Swahili, a mid-resource language, and five low-resource Kenyan languages: Dholuo, Kalenjin, Kikuyu, Maasai, and Somali. The models are fine-tuned on a mix of public and curated proprietary datasets.
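
As an illustration of what this kind of fine-tuning step can look like, the sketch below adapts a pretrained Whisper checkpoint to Swahili with Hugging Face Transformers. The checkpoint ("openai/whisper-small"), the Common Voice dataset, and all hyperparameters are illustrative assumptions, not the actual Paza models, data, or training recipe.

```python
# Illustrative sketch only: not the actual Paza architecture, data, or recipe.
from dataclasses import dataclass

import torch
from datasets import Audio, load_dataset
from transformers import (
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    WhisperForConditionalGeneration,
    WhisperProcessor,
)

CHECKPOINT = "openai/whisper-small"  # assumed base model, for illustration

processor = WhisperProcessor.from_pretrained(
    CHECKPOINT, language="swahili", task="transcribe"
)
model = WhisperForConditionalGeneration.from_pretrained(CHECKPOINT)

# Common Voice Swahili stands in for the public/proprietary mix described above.
dataset = load_dataset("mozilla-foundation/common_voice_17_0", "sw", split="train")
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))

def preprocess(batch):
    # Raw audio -> log-mel input features; reference text -> label token IDs.
    audio = batch["audio"]
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch

dataset = dataset.map(preprocess, remove_columns=dataset.column_names)

@dataclass
class Collator:
    """Pads audio features and labels independently within each batch."""
    def __call__(self, features):
        inputs = [{"input_features": f["input_features"]} for f in features]
        batch = processor.feature_extractor.pad(inputs, return_tensors="pt")
        labels = [{"input_ids": f["labels"]} for f in features]
        labels = processor.tokenizer.pad(labels, return_tensors="pt")
        # Replace padding with -100 so it is ignored by the loss.
        batch["labels"] = labels["input_ids"].masked_fill(
            labels["attention_mask"].ne(1), -100
        )
        return batch

args = Seq2SeqTrainingArguments(
    output_dir="whisper-sw-finetuned",
    per_device_train_batch_size=16,
    learning_rate=1e-5,          # placeholder hyperparameters
    max_steps=4000,
    fp16=torch.cuda.is_available(),
)
trainer = Seq2SeqTrainer(
    model=model, args=args, train_dataset=dataset, data_collator=Collator()
)
trainer.train()
```

Starting from a strong multilingual checkpoint like this lets a relatively small amount of in-language data shift the model toward local vocabulary and pronunciation, which is the main lever available for low-resource languages.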

Fine-tuning the three models allowed us to explore complementary approaches toward a shared goal: building speech recognition systems that work in local contexts, starting with the six Kenyan languages, and bridging gaps in multilingual, multimodal video question answering through the MMCT agent.

Early versions of two models, in Kikuyu and Swahili, were deployed on mobile devices and tested directly with farmers in real-world settings, enabling the team to observe how the models performed in everyday use. This feedback loop directly informed subsequent fine-tuning, ensuring that model improvements were driven not only by benchmark scores but also by the needs and expectations of the communities they are intended to serve.