
Dynamics 365 sets the bar for agentic sales qualification on new benchmark


In October 2025, we announced the general availability of the Sales Qualification Agent (SQA) in Dynamics 365 Sales, a breakthrough in autonomous lead qualification. Sales Qualification Agent empowers sellers by helping them build higher-quality opportunities while eliminating tedious, repetitive work. It autonomously researches every lead, initiates personalized outreach, and engages prospects to understand purchase intent, so that sellers spend their time meeting prospects who are ready to take the next step. With modes enabling both seller-driven and fully autonomous qualification, the agent supports a key goal for sales organizations: increasing revenue per seller.

Customers are using Sales Qualification Agent in two ways: 

  1. Helping boost revenue beyond current sales capacity
    • Responding to inbound leads within minutes instead of days, increasing response rates and, in turn, qualified opportunities.
    • Engaging leads that sellers are unable to follow up on due to capacity constraints, or those deemed economically unviable to pursue.
    • Increasing pipeline quality by focusing the seller’s time on a handful of high-intent, engaged leads recommended by the agent.
  2. Helping reduce sales costs
    • Reducing back-office costs related to lead research and validation by using Sales Qualification Agent in “Research only” mode to hand off only the leads that meet the ideal customer profile criteria.
    • Automatically disqualifying low-quality leads, saving hours of seller time during the week.

Continuing to benchmark the quality of sales AI agents

Microsoft is building the future of agentic sales technology with prebuilt AI agents, such as the Sales Qualification Agent, the Sales Research Agent, and the Sales Close Agent, available in Dynamics 365.

At Microsoft, we’re committed to delivering quality, trust, and transparency with our agents, and that requires rigorous evaluation. As we continue to build new agents and improve existing ones for critical sales workflows, evaluation benchmarks provide a structured and transparent way for our customers to measure quality for the jobs the agent does.

Today, we’re announcing the Microsoft Sales Bench—a new collection of evaluation benchmarks designed to assess the performance of AI-powered sales agents across real-world scenarios. This framework brings together purpose-built metrics, hundreds of sales-specific scenarios, and composite scoring validated by both human and AI judges.

The Sales Bench isn’t starting from scratch. It now formalizes and expands what began with the Sales Research Bench, published on October 21, 2025, which evaluates how AI solutions answer business research questions for sales leaders.

Today, we’re extending the Microsoft Sales Bench with a second benchmark: the Microsoft Sales Qualification Bench, focused on measuring how effectively AI agents qualify leads and generate high-quality pipeline.

Introducing the Sales Qualification Bench for lead qualification

The Microsoft Sales Qualification Bench evolved from rigorous evaluations we have conducted since the Sales Qualification Agent’s public preview in April, in partnership with customers from a diverse set of industries, with the goal of objectively measuring quality as we further developed the agent. Since the preview, we have measured every update against these standards to ensure that improvements are real and repeatable.

We generated a synthetic dataset of 300 leads, modeled after companies from three different industries. Each lead has attributes such as name, company, and email ID, representative of what sales teams typically work with before any enrichment or hygiene is performed. In addition to these typical attributes, we added key knowledge inputs such as the value proposition of the products being sold, customer case studies, and documentation for answering customer questions.

In addition to Sales Qualification Agent, we used the evaluation framework to measure ChatGPT by OpenAI on the same dataset. Since we didn’t have access to an autonomous agent from OpenAI, we mimicked how a human seller would use ChatGPT to recreate the three key jobs SQA performs. We provided each system (Sales Qualification Agent and ChatGPT) with the exact same lead inputs, knowledge sources, and contextual signals under controlled evaluation configurations. We used a ChatGPT Pro license with GPT-4.1. This model is the closest match to (and slightly better than) Sales Qualification Agent’s GPT-4.1 mini, which we intentionally chose to deliver optimal quality at a lower cost per lead than newer models. Additionally, we chose the Pro license to optimize for quality: ChatGPT’s pricing page describes Pro as “full access to the best of ChatGPT.”1

The framework evaluates outputs from the three jobs across Sales Qualification Agent and ChatGPT:

  • Research: Company research for the lead—background, strategic priorities, financial health, and latest news.
  • Outreach: A personalized email generated based on research, to make initial contact with the lead.
  • Engagement: The agent’s conversation with a lead until it’s qualified or dispositioned.

Our scoring metrics span core quality (accuracy, relevance, completeness), trustworthiness (grounding and citations), and business-specific success criteria (e.g., relevancy of company research to highlight interest in the seller’s offerings, personalization of the initial outreach emails sent to catch the lead’s attention, accuracy of responses to the lead’s questions to drive purchase intent, and the timing of handoff to a seller when the lead is ready to engage).

Outputs were scored independently by both human reviewers and an LLM judge built with GPT-5.1, using a 1–10 scale for each metric. These metric-specific scores were then rolled up using a simple average to produce a composite quality score. The result is a rigorous benchmark presenting a composite score and dimension-specific scores to reveal where agents excel or need improvement. Our methodology, metrics, and their definitions are described in this technical blog.
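As a rough illustration of the roll-up described above, the following Python sketch averages per-metric scores on the 1–10 scale into a composite score. The metric names and values are hypothetical placeholders, not results from the Sales Qualification Bench, and the function name is an assumption made for this example.

```python
from typing import Dict


def composite_score(metric_scores: Dict[str, float]) -> float:
    """Roll up per-metric scores (each on a 1-10 scale) with a simple average."""
    if not metric_scores:
        raise ValueError("at least one metric score is required")
    for name, score in metric_scores.items():
        if not 1 <= score <= 10:
            raise ValueError(f"{name} score {score} is outside the 1-10 scale")
    return sum(metric_scores.values()) / len(metric_scores)


# Hypothetical scores for a single output, for illustration only.
example_scores = {"accuracy": 8, "relevance": 7, "completeness": 9, "grounding": 8}
print(composite_score(example_scores))
```

A simple average weights every metric equally, which keeps the composite easy to interpret alongside the dimension-specific scores.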

Results

In evaluations completed on December 4, 2025, using the Sales Qualification Bench, Sales Qualification Agent outperformed ChatGPT on each of the three jobs required for sales qualification:

  1. Research: The Sales Qualification Agent outperformed ChatGPT with 6% higher aggregate scores, leading on relevancy and completeness in research results that highlighted the lead company’s interest in the seller’s offerings.
  2. Outreach: Sales Qualification Agent demonstrated 20% better results compared to ChatGPT, generating email drafts with accurate personalization and mentions of relevant recent events that will resonate with the lead.
  3. Engagement: Sales Qualification Agent’s email responses to engage a lead over a multi-turn conversation scored 16% higher than ChatGPT’s. SQA generated emails that responded to the lead’s questions with accurate answers that develop their purchase interest and with precise discovery questions that qualify the lead before handing off to a seller.
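For readers who want to reproduce this kind of comparison on their own scores, here is a minimal Python sketch of a relative difference between two composite scores. It assumes the percentages above are relative differences rather than absolute points on the 1–10 scale, which this post does not state explicitly. The function name and the input values are hypothetical.

```python
def relative_lift(agent_score: float, baseline_score: float) -> float:
    """Percentage by which agent_score exceeds baseline_score."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return (agent_score - baseline_score) / baseline_score * 100


# Hypothetical composite scores, not figures from the benchmark.
print(f"{relative_lift(5.5, 5.0):.1f}% higher")
```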

In addition to performing better on these metrics, Sales Qualification Agent can run autonomously, which can help significantly reduce the time spent generating pipeline while helping sales teams build better-quality pipeline.

Sales Qualification Agent scores well on these three jobs because it’s optimized for sales-specific scenarios and uses the following techniques to get great results:

  1. It uses agentic Retrieval Augmented Generation (RAG) to relentlessly research each lead, ensuring greater completeness. More on this in the following section.
  2. With knowledge of what the company sells, it can contextualize every workflow to increase relevancy for both the seller and the lead.
  3. It can retrieve organizational knowledge from attached documents and internal repositories like SharePoint with greater precision, boosting accuracy of its responses when engaging with the lead.
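The general shape of an agentic retrieval loop, as mentioned in the first technique, can be sketched in Python as follows. This is a generic illustration of the pattern and not the agent's actual implementation: the `search` and `next_query` callables, the stub functions, and the "Contoso" company name are all assumptions made for the example. In a real system, `search` would call a web or document index and `next_query` would be driven by a language model.

```python
from typing import Callable, List, Optional


def agentic_research(
    lead_company: str,
    search: Callable[[str], List[str]],
    next_query: Callable[[str, List[str]], Optional[str]],
    max_steps: int = 5,
) -> List[str]:
    """Search, then let a planner decide whether another search is needed."""
    findings: List[str] = []
    query: Optional[str] = f"{lead_company} company overview"
    for _ in range(max_steps):
        if query is None:  # The planner judged the research complete.
            break
        findings.extend(search(query))
        query = next_query(lead_company, findings)
    return findings


# Stubs standing in for a real search backend and an LLM planner.
TOPICS = ["strategic priorities", "financial health", "latest news"]


def fake_search(query: str) -> List[str]:
    return [f"stub result for: {query}"]


def fake_planner(company: str, findings: List[str]) -> Optional[str]:
    covered = len(findings) - 1  # Follow-up topics searched so far.
    return f"{company} {TOPICS[covered]}" if covered < len(TOPICS) else None


for line in agentic_research("Contoso", fake_search, fake_planner):
    print(line)
```

The `max_steps` cap bounds cost per lead even if the planner keeps proposing new queries.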

The technical blog details which metrics SQA excels at relative to ChatGPT, where it falls short, and why.

Translating evals to real-world impact

Running evals led to major Sales Qualification Agent improvements during its six-month preview. Early results prompted us to try agentic AI design patterns, especially agentic RAG, which improved our company research by allowing iterative web searches and real-time reasoning. They also led us to enhance data coverage by auto-linking existing CRM records to each lead and inferring company names from lead emails. These updates provided sellers with deeper insights, revealing strategic opportunities and risks beyond basic facts.
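To make the idea of inferring a company name from a lead's email concrete, here is a deliberately simple Python sketch. The post does not describe how the agent performs this inference, so the heuristic below (take the first label of the email domain and skip common free-mail providers) is purely an assumption for illustration. The function name, the provider list, and the sample addresses are hypothetical.

```python
from typing import Optional

# A small, illustrative list; a real system would need a far larger one.
FREE_MAIL_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com", "hotmail.com"}


def guess_company_from_email(email: str) -> Optional[str]:
    """Return a rough company-name guess from an email domain, or None."""
    _, sep, domain = email.strip().lower().rpartition("@")
    if not sep or not domain or domain in FREE_MAIL_DOMAINS:
        return None  # No usable domain, or a personal mailbox.
    label = domain.split(".")[0]  # Naive: ignores subdomains such as mail.contoso.com.
    return label.replace("-", " ").title() or None


print(guess_company_from_email("lead@contoso.com"))
print(guess_company_from_email("someone@gmail.com"))
```

A heuristic like this only yields a candidate name; linking it to an existing CRM record would still require a lookup and validation step.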

For instance, when researching leads for a security company, Sales Qualification Agent can link news on recent cyberattacks to increased demand for its software. As highlighted in the technical blog, research synthesized by the agent makes such inferences more consistently than ChatGPT. Enhancing the agent’s research also improved the relevance and personalization of outreach emails, helping the agent better engage leads and clarify their ability and intent to purchase before handing them off to sellers.

Sandvik Coromant, a leader in precision cutting tools, partnered with us to pilot Sales Qualification Agent for their Digital Commerce program. After the updates, Pia Cedendahl, Global Sales Manager for Strategic Channels/Partners and Online Sales, noted, “Sales Qualification Agent’s answers became far more on-point to our business—it’s like having a research assistant that already understands what we care about.” Sandvik Coromant saw improved lead conversion and higher engagement from their Digital Account Managers, validating the impact of our evaluation-driven approach. Pia joined Microsoft leaders at the Microsoft Ignite 2025 session, “Accelerate revenue and seller productivity with agentic CRM,” where she shared how the team saved more than 120 hours and $19,000 in just the first three weeks since launching a pilot, and forecasted a 5% increase in revenue with full rollout.

Better insights, more personalization, proven value

Equipped with agentic AI design and backed by data-driven evaluation, customers can confidently use Sales Qualification Agent so that:

  • Sellers receive comprehensive company overviews, timely news highlights, and actionable recommendations that are consistently delivered with high quality—drawing a clear line from insight to action.
  • Sales leaders can expand their qualified pipeline cost-efficiently, with the agent ensuring high lead quality.
  • Prospects benefit from more personalized outreach, enhancing their experience and supporting increased conversion rates.

What’s next

We’ll continue to refine Sales Qualification Agent using agentic design patterns, aiming to make every improvement measurable and meaningful. Stay tuned for the full evaluation results and methodology for the Sales Qualification Bench, which will be published for transparency and reproducibility. We also intend to add more evaluation frameworks and benchmarks to the Microsoft Sales Bench collection, including benchmarks that cover future sales agent capabilities.


1 ChatGPT pricing page, accessed November 24, 2025
