Like many large enterprise organizations, Microsoft’s support team develops thousands of troubleshooting guides, self-help resources, knowledge base articles, and process documents for our support engineers to use when helping customers with technical issues.
As Microsoft has grown over the years and continuously released new products and features, these document repositories have become incredibly large and sometimes unmanageable, and they occasionally contain outdated information.
Back in July 2022, our Modern Work Supportability team had an idea for semantic search: a single location where support engineers could enter a query that would scan the vast support document repositories and return results based on the subject of the inquiry.
Over the course of the year, team members explored different solutions and had many conversations with others that eventually led them to the world of generative AI. The team knew they were on the cutting edge of something exciting with lots of possibilities. Then, a few months later, ChatGPT was announced and flipped the world on its head.
“It was fascinating how fast things were changing,” says DJ Ball, a senior escalation engineer on the Modern Work Supportability team. “OpenAI was so new and features growing so fast from GPT 2.0 to 3.0 to ChatGPT to 4.0 almost overnight. Keeping up with the technology is a challenge and big opportunity.”
The team quickly shifted gears, secured subscriptions to Microsoft Azure OpenAI, and stumbled upon an internal GPT playground focused on handling enterprise content. This playground happened to be very similar to what the Supportability Team was designing, which made joining forces much easier.
“People thought we planned it, which we didn’t, but for once we felt we were ahead of the game,” says Sam Larson, a senior supportability PM on the Modern Work Supportability team.
Then the development of an AI-based solution really picked up speed. No one had done what they were attempting to do with this ChatGPT technology, so they had to learn by digging in, playing around, and seeing what would happen. The Modern Work Supportability team gave the engineering team continual feedback about what worked and what didn’t. That feedback helped shape the product recently announced as Microsoft Azure AI Studio, which makes it simple to integrate external data sources into Microsoft Azure OpenAI Service.
With this development, the team was allowed to create their own private chat workspace, which they called Modern Work GPT (MWGPT). The Modern Work Supportability team started by curating content from different sources for the Teams product and ingesting it for use with the large language model (LLM).
By using Azure Cognitive Search to ingest the documentation and chunk it into smaller components, they were able to test the results with the help of subject matter experts (SMEs) across the Teams support business. They’ve since expanded to include all Modern Work technology support documentation, estimated at more than 300,000 pieces of content for 34 products. Along the way, they’ve learned a lot about content curation, prompts, use case scenarios, and how LLMs work.
The team quickly realized that they needed more people to help them test and are now working with over 450 SMEs across the Modern Work Support business to continue refining the content and testing the solution for accuracy.
One of the early volunteers was Mayte Cubino, a Modern Work Support director for Office and Project Products. An engineer at heart, Cubino was excited and curious about rumblings she was hearing across the business about possibilities to leverage ChatGPT in supporting customers. After a conversation with Ross Smith, the leader of the Supportability team, she knew that she could add value to the project from a support delivery perspective.
From a delivery standpoint, two questions stood out:
- How could we ensure a successful deployment of this new technology across all our support engineers?
- How could this technology be the most helpful without creating extra work for anyone?
Cubino started to document the content process, helped the team see that some things were non-negotiable, and outlined the steps they needed to focus on to ensure accuracy and responsible engagement with the model.
“There is no question that there are a lot of variables that come into play that we are exposing our engineers to, and quality is a non-negotiable factor,” she says. “We owe it to our customers who turn to our support engineers to help them solve their most challenging technical problems.”
Unpacking our 6 Ds framework
This documentation process led to the 6 Ds framework, which is designed to provide a roadmap for deploying enterprise content on private LLMs.
Number 1: Discover
Discover is the initial phase during which the team identifies and defines the goals and objectives or problem that needs to be solved. This phase is where lots of learning and research, exploration, and analysis take place. Important steps to consider and perform during this phase include the following:
Content curation
Getting your content ecosystem ready is key:
- Assessing the user needs and data sources, understanding how the AI model will connect with the wider service, and exploring the location and condition of the data you will use.
- Assessing the existing data, its accuracy and readiness, in order to see if the data has a high enough quality for an AI system to make predictions from.
- Preparing your data to make sure it is secure and unbiased. Your data should also be diverse and reflective of the population you are trying to model. This will help reduce conscious and unconscious biases.
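As a rough illustration of what assessing existing data for accuracy and readiness could look like, the sketch below checks a small set of documents for completeness, uniqueness, and timeliness. The `Doc` record, its field names, and the thresholds are all assumptions made for this example and do not describe the team’s actual tooling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    # Hypothetical record shape; real repositories will carry different metadata.
    doc_id: str
    title: str
    body: str
    last_reviewed: date


def assess_quality(docs, today, max_age_days=365):
    """Return a dict mapping doc_id to the list of quality issues found."""
    issues = {}
    seen_bodies = {}
    for doc in docs:
        found = []
        # Completeness: a title and a non-trivial body are present.
        if not doc.title.strip() or len(doc.body.split()) < 5:
            found.append("incomplete")
        # Uniqueness: the whitespace- and case-normalized body repeats an earlier document.
        key = " ".join(doc.body.lower().split())
        if key in seen_bodies:
            found.append(f"duplicate of {seen_bodies[key]}")
        else:
            seen_bodies[key] = doc.doc_id
        # Timeliness: the document has not been reviewed within max_age_days.
        if (today - doc.last_reviewed).days > max_age_days:
            found.append("stale")
        if found:
            issues[doc.doc_id] = found
    return issues
```

A report like this only surfaces candidates for review. Deciding whether a flagged document is actually wrong or outdated still needs a person who knows the content.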
Number 2: Design
In the design phase, the list of requirements developed in the discovery phase is used to make design choices. Ideation, testing, and prototyping are the activities that govern this phase.
When preparing for AI implementation, you should identify how you can best integrate AI with your existing technology and services. It’s useful to consider how you’ll manage content creation.
Content creation
Considering how you collect and store your data is key:
- Data collection pipelines, such as batch upload or continuous upload, that support reliable model performance and provide clean input for modeling.
- Data storage, where the type of database you choose depends on the complexity of the project and the different data sources required.
- Data mining and analysis of the results.
Responsible AI review
Plan for security before you start:
- Make sure you design your system to keep data secure, designing for Responsible AI and compliance with General Data Protection Regulation (GDPR) and other policies and standards. For example, Microsoft’s responsible AI principles include fairness, reliability and safety, privacy and security, inclusiveness, and transparency and accountability.
Other design elements
Be sure to think about using technology efficiently and plan how to train your AI model:
- Consider any platforms your team uses to bring together the technology used across the AI project, which can help speed up AI deployment.
- The network and memory resources your team needs to train your model are important to think about in addition to ongoing cost. Writing and training algorithms can take a lot of time and computational power.
Number 3: Develop
During the development phase, you need to create and test grounding data sets.
The following critical steps in this phase make it a highly iterative process requiring substantial amounts of data:
- Content preparation: Preparing the content and assessing data quality using a combination of accuracy, bias, completeness, uniqueness, timeliness, validity, and consistency.
- Content ingestion: Ingestion of curated data content in the required formatting for the model. Larger documents should be chunked into smaller sections before ingestion.
- Prompt engineering: Refining the prompt (known as prompt engineering) to elicit a desired response from the model or to prevent it from generating certain types of output. The prompt can be augmented with grounding data from the curated content. Chunking the larger documents allows smaller subsections of a document to be used in the prompt. This is an iterative process and may require several rounds of testing to achieve the desired results.
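The chunking and grounding steps above can be sketched in a few lines. This is a minimal illustration, not the team’s implementation: the function names, the word-window chunk sizes, and the shared-word ranking (a simple stand-in for a real search index such as Azure Cognitive Search) are all assumptions chosen to keep the example self-contained.

```python
import re


def chunk_text(text, max_words=50, overlap=10):
    """Split text into windows of max_words words, each overlapping the previous by `overlap` words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_chunks(question, chunks, k=2):
    """Rank chunks by how many words they share with the question (ties keep input order)."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, grounding):
    """Assemble a prompt that asks the model to answer only from the numbered sources."""
    sources = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(grounding))
    return (
        "Answer using only the sources below. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
```

Overlapping windows reduce the chance that a troubleshooting step is split across two chunks. The right chunk size and instruction wording are exactly the kind of thing the iterative testing described above is meant to settle.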
Number 4: Diagnose
Rigorous, uncompromising testing is crucial before you proceed to deployment, but it can be a time-intensive process.
Before you deploy and use your model, you need to understand whether it’s actually delivering the kind of results that you were looking for. You must check that these results are accurate and that the data you’re loading into the model will keep these models consistent and relevant over time. Weak, old data can create model drift, leading to inaccurate outcomes. In this phase, consider these elements:
- Responsible development and diagnosis: This is an important stage in building responsible AI systems. It spans data collection and handling, fairness in performance and representation, transparency through validated citations, security and privacy, accountability through author contact information, and inclusiveness through a feedback process built into pre-deployment validation.
- Validate the chatbot deployment: After the chatbot has been trained, it should be tested to ensure that it provides accurate responses and not hallucinations. The testing should be conducted in a controlled environment, and the chatbot’s responses should be compared to the approved documents and data.
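One simple way to compare responses against approved documents is to flag answer sentences that share few words with any approved source. The sketch below is an assumption-laden illustration: the stopword list, the 0.6 threshold, and the function names are invented for this example, and word overlap is only a crude proxy that cannot replace SME review.

```python
import re

# A tiny illustrative stopword list; a real check would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it", "then", "your", "for", "on"}


def content_words(text):
    """Return the set of lowercase alphanumeric tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def unsupported_sentences(answer, approved_docs, threshold=0.6):
    """Return answer sentences whose content words are poorly covered by every approved document."""
    doc_words = [content_words(d) for d in approved_docs]
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = content_words(sentence)
        if not words:
            continue
        # Coverage is the share of the sentence's content words found in the best-matching document.
        best = max((len(words & d) / len(words) for d in doc_words), default=0.0)
        if best < threshold:
            flagged.append(sentence)
    return flagged
```

Flagged sentences are candidates for a possible hallucination, not proof of one. A correct paraphrase can score low, and a wrong answer that reuses the source’s vocabulary can score high.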
Testing the AI model throughout the process is critical to guard against issues such as overfitting and underfitting, which could undermine your model’s effectiveness once deployed.
- Overfitting refers to an AI model that models the training data too well. It happens when an AI model learns the detail and noise in the training data to the extent that it negatively impacts the model’s performance on new data. This means that the noise or random fluctuations in the training data are picked up and learned as concepts by the model.
- Underfitting refers to a model that can neither model the training data nor generalize to new data. An underfit machine learning model is not a suitable model, and the problem will be obvious because the model performs poorly even on the training data. The remedy is to move on and try alternative machine learning algorithms.
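The two failure modes can be told apart by comparing performance on training data with performance on held-out data. The helper below is a minimal sketch of that comparison. The function name and both thresholds are assumptions for illustration, and suitable values depend on the task and metric.

```python
def diagnose_fit(train_score, validation_score, min_score=0.8, max_gap=0.1):
    """Classify a model from accuracy-style scores in [0, 1]; thresholds are illustrative only."""
    # Poor performance even on the training data points to underfitting.
    if train_score < min_score:
        return "underfitting"
    # Good training performance that does not carry over to new data points to overfitting.
    if train_score - validation_score > max_gap:
        return "overfitting"
    return "ok"
```

For example, a model scoring 0.99 on training data but 0.70 on validation data would be labeled as overfitting, while one scoring 0.55 on training data would be labeled as underfitting regardless of its validation score.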
Finally, we use prompt tuning to ensure that the model’s predictions are consistent across different inputs.
Number 5: Deploy
Deployment is the process of integrating a machine learning model into an existing production environment so that it can support practical, data-driven business decisions. It’s one of the last steps in the machine learning lifecycle and should be preceded by an SME and validation team signoff process.
This phase involves integrating the AI model into the service’s decision-making process and using live data for the AI model to make predictions. After launch, it’s very important to continuously evaluate the AI model to ensure that it still meets the business objectives and performs at the required level.
Continuous evaluation confirms that the AI model’s performance is in line with the modeling phase and helps you identify when to retrain the AI model. It can also help you feel confident in using, interpreting, and challenging any outputs or insights generated by the AI model.
Number 6: Detect
After completing the typical workflow with steps like data ingestion, pre-processing, model building and evaluation, and finally deployment, it’s time to include a key component: AI model monitoring.
The primary motivation of any model monitoring framework is to create a post-deployment feedback loop back to the model building phase. This loop helps the team decide whether to update the ML model or keep it as it is, so that the model continues to improve over time.
ML models are driving some of the most important decisions for businesses, and so it’s important that these models remain relevant in the context of the most recent data when deployed into production. Elements to consider in this phase include:
Monitoring
When the model has been deployed, it should be monitored regularly to ensure that it provides accurate responses and not hallucinations. The model’s responses should be reviewed periodically to ensure that they’re still aligned with the organization’s principles and values.
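Periodic review can be made measurable by tracking how many sampled responses reviewers judge to be accurate. The class below is a small sketch of that idea under stated assumptions: the class name, window size, and alert threshold are hypothetical, and the accuracy judgments themselves are assumed to come from human reviewers.

```python
from collections import deque


class AccuracyMonitor:
    """Track the share of reviewed responses judged accurate over a sliding window."""

    def __init__(self, window=50, alert_below=0.9):
        # deque with maxlen drops the oldest review automatically once the window is full.
        self.results = deque(maxlen=window)
        self.alert_below = alert_below

    def record(self, accurate):
        """Store one reviewer judgment (True if the response was accurate)."""
        self.results.append(bool(accurate))

    def accuracy(self):
        """Return the accuracy over the current window, or None if nothing was reviewed yet."""
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def needs_attention(self):
        """Return True when windowed accuracy has fallen below the alert threshold."""
        acc = self.accuracy()
        return acc is not None and acc < self.alert_below
```

A sliding window keeps the signal focused on recent behavior, so a drop caused by newly outdated source content shows up quickly instead of being averaged away by older reviews.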
Implementing a feedback loop
Reinforcement learning from human feedback (RLHF) is a powerful technique that allows you to train language models to follow instructions and preferences, using human feedback as the ultimate metric. RLHF can improve the quality and diversity of model outputs, as well as enable new use cases that are not easily solved by standard fine-tuning or prompt crafting methods.
An enterprise can implement a feedback loop where customers can provide feedback on the model’s responses. This helps the enterprise identify any inaccuracies or errors in the model’s responses and make necessary corrections.
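A feedback loop of this kind can start very simply. The sketch below is not RLHF. It only aggregates helpful or unhelpful votes by the source document behind each response, so that content owners know which documents to correct first. The input shape, function name, and thresholds are assumptions for illustration.

```python
from collections import defaultdict


def docs_needing_review(feedback, min_votes=3, max_negative_rate=0.4):
    """Take (source_doc_id, helpful) pairs and return flagged document ids, worst first."""
    counts = defaultdict(lambda: [0, 0])  # doc_id -> [negative votes, total votes]
    for doc_id, helpful in feedback:
        counts[doc_id][1] += 1
        if not helpful:
            counts[doc_id][0] += 1
    # Only flag documents with enough votes to make the negative rate meaningful.
    flagged = [
        (neg / total, doc_id)
        for doc_id, (neg, total) in counts.items()
        if total >= min_votes and neg / total > max_negative_rate
    ]
    return [doc_id for _, doc_id in sorted(flagged, reverse=True)]
```

Requiring a minimum number of votes keeps a single unhappy user from flagging a document, at the cost of reacting more slowly to rarely used content.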
Other considerations
As the team developed their Modern Work GPT solution and implemented the 6 Ds framework, they kept several key considerations in mind.
“We’re one of the first in the world to do this on support content and we recognize the opportunity and responsibility we have to get this right,” Smith says. “We hope others can build on our lessons, learn, and improve this for everyone.”
Thinking about content support in this way led Smith to have many conversations across Microsoft to explore what other teams were doing, how to integrate best practices into the MWGPT solution, and how to build a responsible LLM from the start. The team set out to create a model that was responsible AI (RAI) ready. This required understanding its potential impacts on people and society, both beneficial and harmful, taking appropriate measures to mitigate anticipated harms, and preparing responses to unanticipated ones.
Another consideration was how to measure success, not just of the model itself but also of its impact on the business across different use case scenarios. Support engineers would be using the model for administrative tasks, such as email and auto-summarization, and for technical tasks, such as troubleshooting, learning, and debugging assistance. Knowing this helped the team develop a set of metrics they could track to see how the model assisted the engineers, affected their productivity, and contributed to our customers’ experience.
The application of the 6 Ds framework can easily reach well beyond the technical support use case scenario to include information from a host of other company disciplines, such as human resources, finance, sales, and legal.
“Humans and machines are more powerful working together than either one alone,” Smith says. “Those that really embrace and explore this new technology will be ready for the new roles that will be needed to make it successful.”
Qualified and experienced prompt engineers, strong content curators, and responsible AI experts will soon be in high demand as more companies employ AI technologies in their own enterprise.
The Modern Work Supportability team continues to learn and adapt the model as new innovations surface. Here is what they’ve found critical to a successful deployment so far:
Know where your documents are. Large organizations often store different types of documents in multiple locations, including learning repositories, SharePoint, internal troubleshooting repos, wikis, and much more.
Everything you put into the model matters. The curation and creation of the documents ingested into the model are key. Capturing any changes or updates through the Detect phase is equally important. Currently, images and PDFs are not successfully ingested (although, with the current rate of innovation, this may soon be available). Text or markdown formats tend to work best. If the data that you put into the model is wrong, outdated, or conflicting across different sources, the model does not return quality answers.
“Inputting all of your content into a chat model is like taking a flashlight and shining it in every dark corner of your content. You quickly realize what’s outdated,” says Jason Weum, director of supportability on the Modern Work Supportability team.
Review and retrain. Stay up to speed on the latest information and training to keep your model from “drifting,” and to ensure the accuracy of the source documents ingested.
Gather feedback from subject matter experts. One of the best ways to improve the accuracy of the model is to ask users and subject matter experts for feedback on the results that are returned. That way you can work to update source content for higher quality results.
Prompt. People may not intuitively know how to prompt, or interact with the model, and preparing your model in the Develop phase is key. Providing tips and tricks and guidance for end users on how to ask questions or prompt can also be helpful.
Change management matters. Don’t take it for granted that everyone will see these LLMs as a huge opportunity. Change management activities that aid adoption and build knowledge help drive excitement and use.
Embrace the future. This technology is moving fast, and everyone has an opportunity to learn, grow, and apply it to their business.
What’s next
The team is looking forward to many new opportunities:
- Using Copilot in Dynamics 365 Customer Service and applying their learnings directly to the product. Increased use of Copilot by all Microsoft support engineers will help us continue to improve the product.
- Carrying on the work with MWGPT to help improve the experiences our customers can leverage in the new Microsoft Azure AI Studio.
- Continuing to learn and refine their models, embracing improvements in the underlying AI technology to deliver better results to support engineers, which will also facilitate excellent customer support experiences.
The world is entering a new era of human and machine collaboration. It’s an exciting time in technology, and AI is going to help power monumental changes in how companies serve their customers.
“As we dive into the world of Modern Work GPT support indexes, refining our models and embracing the AI revolution, we are like a team of tech-savvy superheroes, ready to take customer support experiences that are out of this world to the next level,” says Shakil Ahmed, general manager of the Modern Work Support team. “The future is here, and we’re excited to be on this wild ride of innovation.”