Hi, I’m Gagan Bansal and I’m a researcher at Microsoft. Today I want to talk to you about our recent work on applying societies of agents in markets. And although I’m the one presenting, this work was a collaboration with many amazing colleagues across Microsoft.
Capabilities of AI agents are improving rapidly. We’re quickly moving towards a future where each one of us will have personal agents. In a world where everyone has agents, we believe societies of agents will drive new applications where our agents will have to interact with other agents. But how can we trust agents that we don’t control, agents that might know things ours don’t, or agents that have competing goals?
At Microsoft, we’ve been building on our expertise in multi-agent frameworks like AutoGen and Magentic-One to create useful societies of agents, ones that add value, save time, and don’t cause harm. To enable this future, we need to understand how agents behave when they interact at scale. Recent examples from the open source community, where agents could talk freely on forums, only underscore how timely and important this question is.
Let me show you what we built and what we found. Imagine a marketplace where all the buying and selling is done by agents representing people. We call these settings two-sided agentic markets. This setting is a great testbed for societies of agents, because every agent has access to different information and has competing incentives, which lets us systematically study two-sided markets.
We at Microsoft Research built a new simulation environment called Magentic Marketplace. Here, assistant agents represent customers, service agents represent businesses, and a marketplace sits in the middle handling search, communication, and transactions between agents. Here’s a typical interaction.
Suppose a customer wants to find a restaurant with something specific, like delicious empanadas and outdoor seating. Their assistant can search the marketplace, talk to the service agents, ask about menus, check amenities, and finally make a reservation. This framework allows us to test hundreds of agents buying and selling in parallel.
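The search → chat → transact loop above can be sketched in a few lines. This is a minimal illustrative sketch, not the actual Magentic Marketplace API: the `ServiceAgent` class, `answer` method, and `assist` function are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the search -> chat -> transact loop; class and
# method names are illustrative, not the real Magentic Marketplace API.
@dataclass
class ServiceAgent:
    name: str
    menu: set = field(default_factory=set)
    amenities: set = field(default_factory=set)

    def answer(self, item: str, amenity: str) -> bool:
        # The service agent answers the assistant's questions about
        # its menu and amenities.
        return item in self.menu and amenity in self.amenities

def assist(query_item: str, query_amenity: str, marketplace: list) -> str:
    # 1. Search the marketplace for candidate businesses.
    candidates = marketplace
    # 2. Talk to each service agent to gather missing information.
    for business in candidates:
        if business.answer(query_item, query_amenity):
            # 3. Transact: make a reservation with a matching business.
            return f"Reserved at {business.name}"
    return "No match found"

restaurants = [
    ServiceAgent("Casa Azul", {"tacos"}, {"patio"}),
    ServiceAgent("La Plaza", {"empanadas"}, {"outdoor seating"}),
]
print(assist("empanadas", "outdoor seating", restaurants))  # Reserved at La Plaza
```

In the real environment each of these steps is a natural-language exchange between LLM-backed agents rather than a direct method call, which is exactly what makes the behaviors studied here interesting.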
We used it to systematically ask many research questions. Do these agents even add value for customers and businesses? Does the quality of search results impact their behavior? Are they vulnerable to any biases or manipulation? We built this framework as a general research tool. It can be used to ask many other questions, even for domains beyond markets.
We started by asking whether agents actually add value for consumers. To find out, we implemented agents using frontier and open source models and computed the welfare that they achieve. Here, welfare is the value customers get from their purchase minus the price they paid; higher is better. We observed that when agents have access to high quality search results, frontier models like GPT-5 and Sonnet 4 reached near-optimal welfare.
They talked to business agents, gathered missing information, and made good choices. But we found that agent performance was tied to the quality of search results: when search quality dropped, performance dropped. We also observed that there was still a large gap between the welfare achieved by frontier models and open source models.
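The welfare metric just described is simple to state in code. A minimal sketch, with illustrative offer values (the numbers are made up for this example, not results from the paper):

```python
# Welfare as defined above: the value the customer derives from a
# purchase minus the price they paid. Higher is better.
def welfare(customer_value: float, price_paid: float) -> float:
    return customer_value - price_paid

# An optimal agent would pick the offer maximizing value minus price.
# (value to this customer, asking price) -- illustrative numbers only.
offers = [
    (50.0, 30.0),
    (70.0, 45.0),
    (40.0, 10.0),
]
best = max(welfare(v, p) for v, p in offers)
print(best)  # 30.0
```

Comparing the welfare an agent actually achieves against this maximum is what lets us say a model reached "near-optimal" welfare.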
In addition to the impact of search-result quality, we also wanted to test whether the number of search results impacts welfare. So we gave agents more search results, varying them from 3 to 100, and expected welfare to increase. But the opposite happened, resulting in a surprising paradox of choice.
Welfare dropped for almost every model. This happened because agents didn’t explore enough and contacted only a few businesses. We also conducted experiments that tested whether the order of offers from service agents matters. It did, dramatically: almost 80 to 100% of the agents accepted the first proposal they received.
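The first-proposal bias above is easy to quantify from interaction logs. A hypothetical sketch, assuming we log which proposal (by arrival order, 0 = first) each agent ultimately accepted; the function name and the sample data are illustrative, not from the actual experiments:

```python
# Fraction of agents that accepted the very first proposal they received.
# accepted_indices holds, per agent, the arrival-order index (0 = first)
# of the proposal that agent accepted.
def first_proposal_rate(accepted_indices: list) -> float:
    return sum(1 for i in accepted_indices if i == 0) / len(accepted_indices)

# Hypothetical run: 8 of 10 agents accepted the first offer to arrive.
print(first_proposal_rate([0, 0, 0, 1, 0, 0, 0, 0, 2, 0]))  # 0.8
```

A rate near 1.0, as we observed, means arrival order, not offer quality, is deciding the outcome.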
They never even looked at the alternatives. Think about what this means for a real market: speed beats quality. A business gains more from responding fast than from offering a better deal. That’s not a healthy dynamic. We also tested vulnerability to fake reviews, fake awards, and prompt injection.
Some frontier models resisted everything, but others were completely compromised, with all payments redirected to the attackers. These are early findings, and markets are just the beginning. Societies of agents will emerge anywhere agents represent people with different interests, such as supply chains, hiring, and negotiation.
Magentic Marketplace is open source on GitHub for the community to run experiments, stress-test agents, and help answer the harder questions. What guardrails do we need? How should markets be designed when both sides are AI? What we’ve shown is that simulation matters. Agents can add value, but they also inherit biases, fall for manipulation, and make choices that reward speed over quality.
These are not edge cases. These are behaviors that only emerge when societies of agents are tested at scale. If these agents are going to make high-stakes decisions on our behalf, such as transacting with other agents, we should understand their behaviors and biases before deployment, not after.
Please check out our papers and GitHub repository for more information. And thank you for attending the Microsoft Research Forum.