5/29/2025

Capacity pioneers next-gen enterprise search using Phi models from Microsoft

When Capacity sought to enhance its Answer Engine to better retrieve accurate, context-aware, and multilingual answers from organizational knowledge, it knew a shift to a generative AI–native architecture was crucial.

The company chose Phi small language models from Microsoft, via Azure AI Foundry, as the clear winner based on quantitative and qualitative output, pricing, deployment capabilities, and third-party benchmarks.

The solution is more scalable, secure, and cost-effective and has already delivered 4.2x cost savings, 97% tagging accuracy, and faster document summarization, yielding significant gains in performance and customer satisfaction.

In enterprise environments, information is everywhere—and nowhere. Knowledge lives across documents, emails, chat threads, PDFs, videos, and apps. But when employees need answers, traditional search falls short. Teams waste time digging, re-asking, or giving up altogether because context is missing, formats are fragmented, and silos block discovery.

Capacity, a support automation platform serving enterprise organizations, set out to fix that. Its Answer Engine® delivers precise, context-aware responses to natural language questions by retrieving information from across an enterprise’s unstructured knowledge. The goal: eliminate friction, empower users with instant answers, and unlock the full value of a company’s internal knowledge—no matter where it lives.

Transitioning to Microsoft Azure AI Foundry

To take its Answer Engine to the next level, Capacity re-architected its pipeline with generative AI at the core. The team needed a solution that could understand user intent, work across formats, and deliver fast, reliable answers at scale—without compromising on cost, security, or flexibility.

Capacity selected Microsoft Azure Phi small language models via Azure AI Foundry to power both real-time query handling and offline content enrichment. Phi delivered the low-latency performance and cost efficiency required to support dynamic user interactions. Its deployment flexibility, across Azure environments or air-gapped Kubernetes clusters, met enterprise-grade security demands.

Capacity also incorporated Azure tools for multilingual search, document enrichment, and semantic retrieval, allowing the system to handle everything from keyword tagging to video indexing. The result is a modern, scalable architecture that bridges the gap between unstructured content and meaningful answers.

“To deliver a superior experience, we needed to move from a natural language classifier to a generative AI solution,” says Steve Frederickson, Head of Product, Answer Engine at Capacity. “Knowledge management complexity requires a robust, scalable, flexible, and cost-effective approach.”

Improving retrieval relevance and accuracy

Capacity implemented Phi-4 and Phi-4-mini from Azure AI Foundry Models for their speed, cost-effectiveness, and deployment flexibility. Using both the 4K and 128K context-length variants, combined with prompt engineering, adherence workflows, and structured indexing, the team refined search accuracy and accelerated engine development.

The metadata tagging ability of Phi-4-mini also helped Capacity optimize search results, improving development speed and query processing efficiency. Because Phi models can run locally, they give Capacity a strong long-term strategy for its private cloud deployment, vectorization, and query routing activities. All of these proved to be advantages in helping customers search for hyper-specific responses and receive more precise, context-aware results.

“We work with a variety of companies in different industries, including consumer packaged goods. For example, a global condiment manufacturer used Answer Engine to find a research study on optimizing the viscosity of their product without adding sugar,” says Zach Meierhofer, Director of Customer Success at Capacity. “Another use case involved employee onboarding, where new team members ask the engine questions and get immediate answers, helping them learn about the organization faster to become productive more quickly.”

Defining optimal components for an enhanced Answer Engine

To deliver precise, multilingual, and secure answers at scale, Capacity redesigned its Answer Engine using a modular architecture built on Azure AI. Each component was selected for a specific role in the pipeline and optimized for speed, performance, and user experience.

At the heart of the solution is Azure AI Foundry, where Capacity deploys the Phi small language models. These lightweight models run without fine-tuning and support both offline enrichment—like title generation and keyword tagging—and real-time query refinement.

To help ensure scalable, containerized deployment across secure environments, Capacity uses Azure Kubernetes Service (AKS) to manage its Kubernetes clusters and support a range of compute needs—including backend services, file processing, and reporting—across both its own cloud environment and customer-hosted, air-gapped infrastructures.

Capacity adopted Azure AI Foundry early, enabling it to validate model performance, security, and latency benchmarks ahead of broader implementation. The result is a low-latency, high-performance system that delivers the benefits of generative AI at a fraction of the cost of larger models. With flexible deployment options—including air-gapped Kubernetes clusters and customer-specific Azure environments—Capacity ensures security, scalability, and enterprise-grade reliability across industries.
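
As an illustration of this pattern, here is a minimal sketch of a real-time query-refinement call against a Phi deployment, using the azure-ai-inference Python SDK. The endpoint, deployment name, and prompts are hypothetical placeholders, not Capacity's actual configuration:

```python
# Minimal sketch: calling a Phi-4-mini deployment through the
# Azure AI Inference SDK (pip install azure-ai-inference).
# Endpoint URL, deployment name, and prompts are illustrative
# placeholders, not Capacity's actual setup.
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],  # e.g. https://<resource>.services.ai.azure.com/models
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_KEY"]),
)

response = client.complete(
    model="Phi-4-mini-instruct",  # hypothetical deployment name
    messages=[
        SystemMessage(content="You rewrite enterprise search queries for retrieval."),
        UserMessage(content="Rewrite for retrieval: how do I reset my VPN token?"),
    ],
    temperature=0.2,
    max_tokens=128,
)
print(response.choices[0].message.content)
```

The same client pattern can serve the offline enrichment jobs as well, with different prompts and batch scheduling.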

To support multilingual search, the team layered in Azure AI Translator and Azure AI Language. Users can now ask questions and receive results in their preferred language, with support for up to 46 languages. Translation happens on the fly, and built-in language detection helps deliver a seamless experience across global teams.
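
As a sketch of what on-the-fly translation looks like at the API level, the following uses the Azure AI Translator REST API (v3.0) with automatic source-language detection; the key, region, and sample text are placeholders:

```python
# Minimal sketch of on-the-fly translation with the Azure AI Translator
# REST API (v3.0). Key, region, and sample text are placeholders.
import os

import requests

TRANSLATOR_ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"

def translate(text: str, to_lang: str) -> str:
    """Translate text; the service auto-detects the source language."""
    resp = requests.post(
        TRANSLATOR_ENDPOINT,
        params={"api-version": "3.0", "to": to_lang},
        headers={
            "Ocp-Apim-Subscription-Key": os.environ["TRANSLATOR_KEY"],
            "Ocp-Apim-Subscription-Region": os.environ["TRANSLATOR_REGION"],
            "Content-Type": "application/json",
        },
        json=[{"text": text}],
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()[0]["translations"][0]["text"]

print(translate("¿Cómo configuro mi VPN?", "en"))
```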

For fast, relevant results, Capacity uses Azure AI Search to index the enriched content produced during preprocessing. Metadata and semantic structure are fully employed to return accurate, context-aware answers. Behind the scenes, Azure Database for PostgreSQL supports structured metadata management and content configuration, helping ensure the retrieval layer remains fast, consistent, and optimized for scale.
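
A minimal sketch of the retrieval side, querying an enriched index with the azure-search-documents Python SDK; the index name and field names are assumptions about how the enriched metadata might be stored:

```python
# Minimal sketch: querying an enriched index with azure-search-documents
# (pip install azure-search-documents). Index name and field names
# ("answers", "title", "keywords", "content") are assumptions.
import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],  # https://<service>.search.windows.net
    index_name="answers",                    # hypothetical index name
    credential=AzureKeyCredential(os.environ["SEARCH_KEY"]),
)

results = search_client.search(
    search_text="reset VPN token",
    select=["title", "keywords", "content"],  # hypothetical fields
    top=5,
)
for doc in results:
    print(doc["title"], "-", doc.get("@search.score"))
```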

Finally, Azure AI Video Indexer expands the reach of Answer Engine to multimedia. By transcribing and tagging video and audio content—like webinars, podcasts, and training sessions—Capacity makes knowledge stored in rich media as searchable and discoverable as any document or deck.
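
For the multimedia piece, a hedged sketch of the Azure AI Video Indexer REST flow: submit a video for indexing, then pull back the generated insights so transcripts and keywords can be indexed alongside documents. The account details, access token, and video URL are placeholders, and the paths follow the public API shape rather than Capacity's actual pipeline:

```python
# Hedged sketch of the Azure AI Video Indexer REST flow. Account details,
# access token, and video URL are placeholders; auth differs between
# trial and ARM-based accounts.
import os

import requests

LOCATION = "trial"  # or an Azure region for ARM-based accounts
ACCOUNT_ID = os.environ["VI_ACCOUNT_ID"]
ACCESS_TOKEN = os.environ["VI_ACCESS_TOKEN"]  # obtained via the VI auth API
BASE = f"https://api.videoindexer.ai/{LOCATION}/Accounts/{ACCOUNT_ID}"

# 1) Submit a video for indexing (transcription, keywords, topics).
upload = requests.post(
    f"{BASE}/Videos",
    params={
        "name": "q3-training-webinar",
        "videoUrl": "https://example.com/webinar.mp4",  # placeholder URL
        "accessToken": ACCESS_TOKEN,
    },
    timeout=30,
)
video_id = upload.json()["id"]

# 2) Later, fetch the index: transcript lines, keywords, and topics
#    that can be pushed into the search index alongside documents.
index = requests.get(
    f"{BASE}/Videos/{video_id}/Index",
    params={"accessToken": ACCESS_TOKEN},
    timeout=30,
).json()
print(index["videos"][0]["insights"].keys())
```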

Together, this Azure-powered architecture forms a modern retrieval system that’s accurate, multilingual, secure, and media-aware—designed to meet the complexity of enterprise content at scale.

“Security and privacy concerns come up every time,” says Meierhofer. “Our customers trust us to protect their sensitive content, and Azure helps us ensure their deployment data is private, individualized, safe, and secure.”

Frederickson agrees, “For customers needing higher security, we deploy models in their Azure environment to make sure data doesn’t leave their cloud.” He adds that Capacity can offer even more secure options through the deployment capabilities of Phi.

Scaling with growing datasets while avoiding prohibitive costs

“Unifying our datasets with the Phi-4-mini model was effortless,” says Frederickson. “We have found new opportunities in its speed, and the enriched customer experience of GenAI enables us to resolve customer issues far more effectively.”

Additionally, the model as a service (MaaS) offering from Azure AI Foundry provides developers the ability to interoperate with cutting-edge models across the landscape, addressing a wide range of customer use cases with enhanced efficacy. This, in turn, elevates the overall user experience. “Depending on the use case and timing, we deploy different models,” says Frederickson. “For example, we use small language models for title enrichment asynchronously, while more sophisticated models are used for high-touch tasks like document chat and focused research. This approach allows us to balance cost and effectiveness for our customers.”
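
A toy sketch of the kind of task-based routing Frederickson describes, mapping asynchronous enrichment tasks to a small language model and high-touch tasks to a larger one; the deployment names and token budgets are illustrative assumptions:

```python
# Illustrative sketch of task-based model routing: cheap, fast small
# models for asynchronous enrichment; larger models for interactive,
# high-touch tasks. Deployment names and token budgets are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelChoice:
    deployment: str
    max_tokens: int

ROUTING_TABLE = {
    # offline/asynchronous enrichment -> small language model
    "title_enrichment": ModelChoice("Phi-4-mini-instruct", 64),
    "keyword_tagging": ModelChoice("Phi-4-mini-instruct", 128),
    # interactive, high-touch tasks -> larger model
    "document_chat": ModelChoice("Phi-4", 1024),
    "focused_research": ModelChoice("Phi-4", 2048),
}

def route(task: str) -> ModelChoice:
    """Pick a deployment for a task, defaulting to the small model."""
    return ROUTING_TABLE.get(task, ModelChoice("Phi-4-mini-instruct", 256))

print(route("document_chat"))
```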

Capacity appreciates that the models have rapidly increased in sophistication. Initially, its generative title capability was limited to a single source at a time, but additional infrastructure and model scalability can now be built into the pipeline for all approved use cases. The result is improved content and a better user experience with faster, more accurate results.

Maintaining fast response times without compromising accuracy: how Answer Engine works

For users to feel confident in their search results, Answer Engine must instantly present the full answer to a query along with the related answer content metadata. To accomplish this, Capacity split Phi's tasks into preprocessing and real-time flows.

In preprocessing, Capacity generates metadata such as title summaries for answers, keyword tags for search, and other information added to the index. This prework is done offline and ahead of time. Depending on the tagging task required for each answer and the customer's generative AI preferences, Capacity routes the query to the appropriate Phi model.
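
As a sketch of what one such offline enrichment step might look like, the following asks a Phi deployment for a title and keyword tags as JSON, ready to be written to the index; the prompt wording, deployment name, and field names are assumptions:

```python
# Hypothetical sketch of one offline enrichment step: generating a title
# and keyword tags for an answer as JSON. Prompt wording, deployment
# name, and field names are assumptions; a production pipeline would
# validate the JSON and retry on malformed output.
import json
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_KEY"]),
)

def enrich(answer_text: str) -> dict:
    """Generate index metadata (title, keywords) for one answer."""
    response = client.complete(
        model="Phi-4-mini-instruct",  # hypothetical deployment name
        messages=[
            SystemMessage(content=(
                "Return JSON with keys 'title' (under 10 words) and "
                "'keywords' (5-10 search tags) for the given answer."
            )),
            UserMessage(content=answer_text),
        ],
        temperature=0.0,
    )
    return json.loads(response.choices[0].message.content)

metadata = enrich("To reset your VPN token, open the self-service portal...")
print(metadata["title"], metadata["keywords"])
```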

At query time, Phi models preprocess the query to retrieve the most relevant content. Splitting Phi's tasks enabled repeatable performance, keeping the responsive query times that users expect while enhancing results with new functionality and increased retrieval relevance.

The Capacity AI Answer Engine brings together Azure Database for PostgreSQL, AKS, Phi models from Azure AI Foundry Models, Azure AI Translator, Azure AI Language, Azure AI Search, and Azure AI Video Indexer.

Celebrating solution outcomes

After incorporating Phi, the team observed significant improvements in both performance and customer satisfaction. “The speed at which we can deploy these models is impressive,” says Frederickson. “Features that were previously impossible can now be rolled out quickly, allowing us to add new functionalities to our app rapidly. This acceleration is exciting and continues to evolve.”

Capacity's tracked outcome metrics include 4.2x cost savings compared to a competitive tagging pipeline, a 97% first-shot tagging success rate before retrying or alternate prompting, and a 56% improvement in tagging accuracy compared to the previous-generation pipeline. The team also reduced document summarization time from 12-14 seconds to just 4-5 seconds.

Paving the way for future customer successes

Using AI-driven search and retrieval, Answer Engine can be set up in minutes and delivers instant, high-quality, accurate answers in real time. The solution also improves trust and transparency with metadata tagging, providing context around responses and enhancing confidence in search results within a highly secure environment.

“What truly impressed us about building this solution, in using the Azure AI Foundry stack, was its remarkable accuracy and the ease of deployment, even before customization,” says Frederickson. “Since then, we’ve been able to enhance both accuracy and reliability, all while maintaining the cost-effectiveness and scalability we valued from the start.”

Looking ahead, Capacity plans to explore additional state-of-the-art models, such as Phi-4-multimodal for more complex reasoning tasks like query feature management and image understanding scenarios. It also plans to level up its solutions with newer Phi models to enhance its knowledge graph and improve interoperability among different institutional knowledge bases.

Discover more about Capacity on Facebook, Instagram, LinkedIn, and YouTube.
