Advancing AI to meet needs of the global majority

Generative AI powers apps and tools that boost productivity and knowledge in much of the world.

But these systems don’t work equally well for all communities—especially those under-represented online, where most AI training data originates. As a result, generative AI performs poorly in many languages and does not reflect the social and cultural realities of every population. Infrastructure challenges are partly to blame, but in nations where low-resource languages dominate, adoption of AI is lower, even after adjusting for GDP and internet access.

That’s where Project Gecko comes in. This Microsoft Research-led initiative is designed to close these equity gaps by creating cost-effective, tailorable AI systems that deliver vital expertise to the global majority. It uses local languages, culturally sensitive content, and multimodal engagement through text, voice, and video. It brings together researchers from Microsoft Research Africa, Nairobi, Microsoft Research India, and the Microsoft Research Accelerator in the United States, along with Digital Green (a global development organization that builds community-driven digital infrastructure for agriculture) and several contributors in agri-tech, philanthropy, and academia.

A critical advance is a new AI system called MMCTAgent, which analyzes inputs from speech, images, and videos and provides relevant, context-aware responses. MMCTAgent is now available on Azure AI Foundry Labs, and the code can be downloaded from GitHub.

This work reflects Microsoft’s mission to empower every person and every organization on the planet to achieve more. Developing globally equitable generative AI that reflects the culturally nuanced lived experiences of the communities it serves helps to advance AI in a responsible, inclusive way.

The following researchers played an integral role in this research: Najeeb Abdulhamid, Liz Ankrah, Kalika Bali, Kevin Chege, Arnab Paul Choudhury, Kavyansh Chourasia, Soumya De, Ogbemi Ekwejunor-Etchie, Ignatius Ezeani, Ade Famoti, Tanuja Ganu, Prashant Kodali, Antonis Krasakis, Mercy Kwambai, Samuel Maina, Muchai Mercy, Danlami Mohammed, Nick Mwangi, Martin Mwiti, Akshay Nambi, Stephanie Nyairo, Millicent Ochieng, Jacki O’Neill, Aman Patkar, and Sunayana Sitaram.

“Building AI systems from the ground up, shaped by the knowledge, languages, and modalities of the global majority, yields more innovative, useful solutions for a great number of people. This is a crucial step in our progress toward adapting and deploying AI widely in low-resource settings.”

Ashley Llorens, Corporate Vice President and Managing Director, Microsoft Research Accelerator
Stephanie Nyairo of Microsoft Research (center) collaborates with members of Digital Green to help farmers address the challenges of climate resilience.
Microsoft researcher Stephanie Nyairo (center) works with local collaborators in Kenya to test how accurately speech models recognize farmers’ spoken questions.

There is no shortage of opportunities to extend AI’s benefits to people who cannot fully access them today, and the Project Gecko team plans to expand its work into healthcare, education, and retail in the future. The team began with agriculture because the sector acts as a strategic multiplier: investments there can simultaneously advance climate, health, and education outcomes. The initial focus is on small farms in India and Kenya, where millions of people could benefit from technology that helps boost crop yields and bolster resilience in an increasingly volatile climate.

VeLLM: The foundation

Project Gecko is built on VeLLM (uniVersal Empowerment with LLMs), a platform developed by Microsoft Research India to support AI systems that create multilingual, multimodal content grounded in culturally relevant data. VeLLM uses community-contributed data and principled evaluation to improve LLM performance in non-English languages. For example, Microsoft researchers used VeLLM to develop Shiksha copilot, which helps teachers draft lesson plans faster and improves educational outcomes in rural India. Project Gecko affirms one of VeLLM’s original goals: that AI built for one context can transfer to a different one, such as agricultural information in Kenya.

“If we want to build AI for everyone everywhere, we need to develop new methods of human-centered AI. This involves forging new and deeper connections among disciplines such as machine learning, linguistics, and the social sciences, as well as the communities the AI is to serve. We all must work hand-in-hand to establish new methods for fine-tuning, model optimization, and evaluation so that AI can represent the richness and complexity of a wide range of culturally and linguistically diverse communities. Project Gecko is a great example of how we might begin to do this.”

Jacki O’Neill, Lab Director, Microsoft Research Africa, Nairobi 

AI-powered agriculture in emerging economies

Agriculture accounts for 35% of GDP in Africa. In Kenya, it accounts for 20% of GDP and employs more than 40% of the population. Similarly, in India, agriculture along with forestry and fisheries accounts for one-third of GDP and supports over 70% of rural households. Most of these farms are run by smallholder farmers, families working on less than five acres of land. They are the backbone of rural communities, directly employing millions of people and providing crucial food security.

AI systems that reflect local cultural and agricultural contexts are essential to supporting farmers in their daily work. 

Several digital services and AI-powered tools help farm workers address challenges like weather, pests, and livestock health. But since the underlying large language models (LLMs) are mostly trained on English and other Western languages, farmers struggle to get the right answers using local language and cultural terms, leading to a drop in usage.

“Agriculture has very specific terms, which may change from language to language, and sometimes from district to district. There might be two different words being used for the same thing as location changes. So, all those domain-specific nuances need to be understood,” said Tanuja Ganu, Director of Research Engineering at Microsoft Research India, who leads the Center for Societal Impact through Cloud and Artificial Intelligence.

In Kenya, a farmer tends to her livestock as AI models adapted for local languages make agricultural guidance more accessible.

The local language landscape can be rather complicated. In Kenya, for example, a farmer might write in English, speak in local languages like Kikuyu or Kalenjin, and use spoken Swahili as a common language across communities. Both Kenya and India have strong oral cultures, so voice communication and video answers can help with information sharing, understanding, and recall. Visual representations convey information quickly without relying on text. Meanwhile, limited internet connectivity means that any system must run on low bandwidth and minimal computing power to deliver timely guidance to smallholder farmers.

FarmerChat is a speech-first AI-powered assistant provided by Digital Green, an organization that began as a project within Microsoft Research India. It helps agricultural extension workers advise millions of farmers with trusted agricultural recommendations. For nearly two decades, Digital Green has curated a library of more than 10,000 videos in over 40 languages and dialects, including Kiswahili, Hindi, and Kikuyu. This is significant because, in many developing regions, the knowledge from people working in the field is often shared through audio and video conversations rather than written documents. As a result, multimodal approaches are essential to unlock this vast reservoir of knowledge.

Digital Green’s video library is continuously refreshed with input from farmers, extension workers, and researchers. But technical and linguistic challenges kept the full value of this impressive collection out of reach. The app needed to evolve from a Q&A engine into a trusted farming companion.

“Unlocking this knowledge will support even more farmers to get real-time responses to their queries in their own local language and preferred modality, whenever and wherever they need it. This will boost the effectiveness of public extension and help reach farmers with locally tailored advice.”

Rikin Gandhi, CEO, Digital Green

Microsoft’s Project Gecko team envisioned farmers using speech or text to submit a query, receiving an actionable answer with step-by-step instructions in text, voice, and relevant video—each of these in the farmers’ preferred language. For example, in Nyeri County, Kenya, farmers may type a question in English or ask verbally in Kikuyu and receive the text answer in English and the voice and video answer in Kikuyu. The video would begin playing from the precise spot where a specific solution is presented.

“So, if the video is, let’s say, 30 minutes long, the user does not have to go through the entire video, but we can take the user to, let’s say, 3 minutes 50 seconds, and they can watch it from there for 2 minutes 5 seconds to get the answer. So, it’s efficient. It’s extremely time-effective for the users,” Ganu said.
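The jump Ganu describes amounts to matching a question to a timestamped transcript segment and returning that segment’s start time and length. The Python sketch below is a hypothetical illustration, not FarmerChat’s implementation: `Segment`, `best_segment`, and the sample video are invented for this example, and simple keyword overlap stands in for whatever retrieval the real system uses.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start_s: int  # segment start, in seconds from the beginning of the video
    end_s: int    # segment end, in seconds from the beginning of the video
    text: str     # transcript text for this segment

def fmt(seconds: int) -> str:
    """Format a number of seconds as 'M min S s'."""
    m, s = divmod(seconds, 60)
    return f"{m} min {s} s"

def best_segment(query: str, segments: list[Segment]) -> Segment:
    """Return the segment whose transcript shares the most words with the query.

    Keyword overlap is only a placeholder for a real retriever.
    """
    q = set(query.lower().split())
    return max(segments, key=lambda seg: len(q & set(seg.text.lower().split())))

# A hypothetical 30-minute video split into three transcript segments.
video = [
    Segment(0, 230, "introduction to maize planting and soil preparation"),
    Segment(230, 355, "how to control fall armyworm on maize leaves"),
    Segment(355, 1800, "harvesting drying and storing maize"),
]

seg = best_segment("armyworm on my maize leaves", video)
print(f"start at {fmt(seg.start_s)}, watch for {fmt(seg.end_s - seg.start_s)}")
```

With the segment boundaries assumed here, the script prints `start at 3 min 50 s, watch for 2 min 5 s`, matching Ganu’s example. A production system would need a retriever that handles synonyms and multiple languages, which plain keyword overlap does not.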


MMCTAgent delivers better, more relevant answers

The new multimodal critical thinking agent framework, MMCTAgent, is designed to improve cutting-edge, experimental frontier models by equipping them with domain-specific tools that extend their capabilities. MMCTAgent examines different types of information, such as audio, visual details, and text, and breaks questions down into smaller parts. It devises strategies for answering them and adapts its reasoning as it goes. It also verifies its own answers using a built-in “critic,” helping ensure accuracy and relevance. Drawing on natural language processing (NLP), computer vision techniques, and ethnographic design, it helps FarmerChat better understand the videos and their supporting transcripts, making them more accessible through search and Q&A. The resulting multimodal answers are culturally and linguistically relevant to farmers because they are grounded in videos and information created by people in their own communities.
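The decompose, answer, and critique cycle described above can be sketched as a small loop. This is a hypothetical Python illustration under simplifying assumptions, not MMCTAgent’s code or API: `decompose`, `image_tool`, `transcript_tool`, and `critic` are toy stand-ins, the answers and video IDs are placeholders rather than agronomic advice, and the critic only checks that each sub-answer cites a source. The released MMCTAgent code on GitHub documents the actual design.

```python
from typing import Callable, Optional

# Hypothetical tool type: maps a sub-question to (answer, source), or None.
Tool = Callable[[str], Optional[tuple[str, str]]]

def decompose(question: str) -> list[str]:
    """Hypothetical planner: split a farmer's question into sub-questions."""
    subs = ["identify pest"]
    if "what should i do" in question.lower():
        subs.append("recommend treatment")
    return subs

def image_tool(sub_q: str) -> Optional[tuple[str, str]]:
    """Hypothetical vision tool: describes the photo but cites no source."""
    if sub_q == "identify pest":
        return ("Looks like caterpillar damage.", "")  # empty source string
    return None

def transcript_tool(sub_q: str) -> Optional[tuple[str, str]]:
    """Hypothetical lookup in a timestamped video-transcript index."""
    index = {
        "identify pest": ("Placeholder: damage pattern shown in the video.", "video_12@03:50"),
        "recommend treatment": ("Placeholder: treatment steps shown in the video.", "video_12@04:40"),
    }
    return index.get(sub_q)

def critic(result: Optional[tuple[str, str]]) -> bool:
    """Toy critic: accept only non-empty answers that cite a source."""
    return result is not None and bool(result[0]) and bool(result[1])

def answer(question: str, tools: list[Tool]) -> list[tuple[str, str]]:
    """Plan sub-questions, try tools in order, keep the first answer the critic accepts."""
    final = []
    for sub_q in decompose(question):
        verified = None
        for tool in tools:
            result = tool(sub_q)
            if critic(result):
                verified = result
                break  # critic accepted this answer; stop trying further tools
        final.append(verified or ("No verified answer found.", "none"))
    return final

for text, source in answer("My maize leaves have holes. What should I do?",
                           [image_tool, transcript_tool]):
    print(f"{text} [{source}]")
```

In this run the critic rejects the image tool’s unsourced answer and falls back to the transcript tool, so both printed answers carry a video timestamp. Sub-questions that no tool can answer with a source are reported as unverified rather than guessed.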

Field studies in Kenya and India showed improvements in response quality, usability, and user trust compared with state-of-the-art models, which are powerful and well established but also more generic. This suggests that community-grounded, multilingual, tool-augmented copilots could succeed in other domains as well.

“Before, when we faced issues with insects, crops drying up, or anything else, we used to ask other people, like neighbors, fertilizer dealers, or some experts. We weren’t sure if they were telling us the right thing, but we still had to follow their advice. Now that we have the FarmerChat application, we ask our questions, and what it tells us, we use, and we are seeing better results in our fields.”

Lakshmi Devi, Farmer, Bihar, India

Tailoring small language models for agriculture and local languages

Saiprasad Chirivirala of Digital Green (standing, left) and Arnab Paul Choudhury of Microsoft Research (standing, right) demonstrate FarmerChat during a field visit with farmers.

Human-computer interaction research conducted by Microsoft Research Africa, Nairobi and Microsoft Research India, along with field observations, showed that farmers prefer spoken interactions in their native languages. This requires speech models that translate between spoken and written words, including automatic speech recognition (ASR) and text-to-speech (TTS) models. However, current state-of-the-art models offer almost no support for low-resource local languages, in either text or speech, because their training data contains little or no content in these languages. In addition, the digital data and computational resources needed to train effective machine learning models in these languages are scarce.

To address this, the Project Gecko team began building new models from scratch for ASR, TTS, and machine translation. The process covered training, NLP benchmarking, human-centered evaluation, and deployment. The models were then used to ingest the video library, along with Q&A grounded in local-language content with detailed reasoning.

While low-cost connected devices are available in much of the world, they often lack the computing capacity to run modern tools and services powered by LLMs. To address this, Project Gecko researchers work with small language models (SLMs), which usually contain only a few billion parameters, compared with the 100 billion or more found in LLMs. Larger models tend to be more capable, but they also demand significantly more computing resources and energy. SLMs are easier to fine-tune for targeted domains and languages, and they may even perform better by filling gaps in what LLMs can do.
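A back-of-the-envelope calculation shows why parameter count matters on low-cost hardware: storing a model’s weights takes roughly the number of parameters times the bytes per parameter. The sketch below counts weights only (it ignores activations, caches, and runtime overhead), and the 3-billion-parameter SLM is an illustrative assumption standing in for “a few billion.”

```python
def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Approximate memory needed to store model weights, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9

# The 3B figure is an assumed example size; 100B is the LLM scale cited above.
models = [("SLM, 3B parameters (assumed)", 3e9), ("LLM, 100B parameters", 100e9)]
for name, params in models:
    for bits in (16, 4):  # 16-bit floating point vs. 4-bit quantized weights
        print(f"{name}, {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
```

This prints about 6.0 GB and 1.5 GB for the SLM versus 200.0 GB and 50.0 GB for the LLM. Even aggressively quantized, a 100-billion-parameter model needs far more memory than a low-cost phone has, whereas a model with a few billion parameters can come within reach.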

Project Gecko researchers meet with farmers to test and refine FarmerChat, ensuring the tool reflects real farming practices.

The result is a set of tailored speech models and SLMs that can be continuously improved with user data and locally adapted to support languages such as Kiswahili, Hindi, and Kikuyu in their cultural contexts in India and Kenya. The researchers continue to refine the fine-tuned speech models for Kikuyu and Swahili and are incorporating a dataset of 3,000 hours of crowd-sourced data from Kenyan partners, which expands support to six languages: Swahili, Kikuyu, Kalenjin, Dholuo, Maa, and Somali. They are also working on a public leaderboard that benchmarks model performance across African languages.
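Speech-recognition benchmarks commonly rank models by word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn a model’s transcript into the reference, divided by the number of reference words. The article does not say which metrics the planned leaderboard will use, so treat this as a generic Python sketch of the standard WER calculation; the example sentences are invented.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

ref = "how do i treat armyworm on maize"
hyp = "how do i treat army worm on maize"
print(f"WER = {wer(ref, hyp):.3f}")
```

This prints `WER = 0.286` (2 errors over 7 reference words). Because WER splits on whitespace, segmenting one word as two costs two errors, which is one reason some benchmarks also report character error rate.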

The Project Gecko team continues to develop enhancements for FarmerChat based on studies with more than 130 farmers in Kenya and India. These include asking clarifying questions, providing more actionable responses, nudging users with follow-ups, and supporting sociality through features that foster peer-to-peer sharing and community interactions.


Looking ahead: Expanding impact into additional domains

Project Gecko underscores Microsoft’s commitment to equitable AI and the creation of tailorable AI systems that work for a wide range of communities, businesses, and individuals. But achieving population-scale impact will require a fundamental rethinking of how AI is localized, evaluated, and deployed in a world where the foundations of AI remain highly concentrated. The U.S. and China together host 86% of global datacenter capacity, for example, and nearly 4 billion people lack access to electricity, connectivity, and computing needed to use AI.

In Kenya, a farmer examines her crops, as locally trained AI tools help improve farming decisions.

By analyzing what works in an agricultural context, Microsoft aims to identify generalizable design patterns, tools, and infrastructure that can extend to other domains, including education and health. The team will soon release a multilingual playbook with end-to-end guidance for developers building domain-specific multilingual AI applications, including tips for navigating the opportunities and challenges of designing, deploying, and evaluating AI among the global majority. This cross-cultural playbook will draw on the research studies and experiences of the Microsoft Research teams in India and Kenya to help researchers, designers, and practitioners make informed decisions about what matters most when collaborating with diverse communities.

“Our goal is to ensure that the next generation of AI is not only powerful, but also globally inclusive, culturally relevant, and shaped by the communities it aims to serve.”

Tanuja Ganu, Director of Research Engineering, Microsoft Research India

Story contributors: Najeeb G. Abdulhamid, Kalika Bali, Arnab Paul Choudhury, Kavyansh Chourasia, Saiprasad Chirivirala, David Celis Garcia, Ogbemi Ekwejunor-Etchie, Kate Forster, Rikin Gandhi, Tanuja Ganu, Alyssa Hughes, Vyshak Jain, Lindsay Kalter, Prashant Kodali, Amanda Melfi, Muchai Mercy, Stephanie Nyairo, Jacki O’Neill, Sunayana Sitaram, Chris Stetkiewicz, Amber Tingle, Shauna Whooley