{"id":969405,"date":"2023-09-20T14:13:33","date_gmt":"2023-09-20T21:13:33","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=969405"},"modified":"2025-01-13T13:07:45","modified_gmt":"2025-01-13T21:07:45","slug":"using-large-language-models-to-generate-validate-and-apply-user-intent-taxonomies","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/using-large-language-models-to-generate-validate-and-apply-user-intent-taxonomies\/","title":{"rendered":"Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies"},"content":{"rendered":"<p>Log data can reveal valuable information about how users interact with web search services, what they want, and how satisfied they are. However, analyzing user intents in log data is not easy, especially for new forms of web search such as AI-driven chat. To understand user intents from log data, we need a way to label them with meaningful categories that capture their diversity and dynamics. Existing methods rely on manual or ML-based labeling, which are either expensive or inflexible for large and changing datasets. We propose a novel solution using large language models (LLMs), which can generate rich and relevant concepts, descriptions, and examples for user intents. However, using LLMs to generate a user intent taxonomy and apply it to do log analysis can be problematic for two main reasons: such a taxonomy is not externally validated, and there may be an undesirable feedback loop. To overcome these issues, we propose a new methodology with human experts and assessors to verify the quality of the LLM-generated taxonomy. We also present an end-to-end pipeline that uses an LLM with human-in-the-loop to produce, refine, and use labels for user intent analysis in log data. Our method offers a scalable and adaptable way to analyze user intents in web-scale log data with minimal human effort. We demonstrate its effectiveness by uncovering new insights into user intents from search and chat logs from Bing.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Log data can reveal valuable information about how users interact with web search services, what they want, and how satisfied they are. However, analyzing user intents in log data is not easy, especially for new forms of web search such as AI-driven chat. To understand user intents from log data, we need a way to [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"MSR-TR-2023-32","msr_organization":"Microsoft","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2023-9-20","msr_highlight_text":"","msr_notes":"This article is distributed under Creative Commons Attribution- NonCommercial- NoDerivatives License 4.0 (CC BY-NC-ND). https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/legalcode.","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":null,"footnotes":""},"msr-research-highlight":[],"research-area":[13556,13555],"msr-publication-type":[193718],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[246694,248503,264837,248353],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-969405","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-search-information-retrieval","msr-locale-en_us","msr-field-of-study-artificial-intelligence","msr-field-of-study-information-retrieval","msr-field-of-study-intent-detection","msr-field-of-study-language-model"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2023-9-20","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"MSR-TR-2023-32","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"Microsoft","msr_how_published":"","msr_notes":"This article is distributed under Creative Commons Attribution- NonCommercial- NoDerivatives License 4.0 (CC BY-NC-ND). https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/legalcode.","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/SIGIR_2024___LLM_for_Taxonomy_v2.pdf","id":"1002840","title":"sigir_2024___llm_for_taxonomy_v2","label_id":"243109","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/arxiv.org\/abs\/2309.13063","label_id":"243109","label":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":1002840,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/01\/SIGIR_2024___LLM_for_Taxonomy_v2.pdf"},{"id":969432,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/LLMs_for_Intent_Taxonomies-650b6ae9c10b5.pdf"},{"id":969429,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/LLMs_for_Intent_Taxonomies-650b6831d8dd1.pdf"},{"id":969420,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/LLMs_for_Intent_Taxonomies-650b650c6b1a4.pdf"},{"id":969417,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/LLMs_for_Intent_Taxonomies-650b63fdd47ca.pdf"},{"id":969414,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/LLMs_for_Intent_Taxonomies.pdf"},{"id":969411,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/WSDM_2024___LLM_for_Taxonomy-5.pdf"}],"msr-author-ordering":[{"type":"text","value":"Chirag Shah","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Ryen W. White","user_id":33481,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Ryen W. White"},{"type":"text","value":"Reid Andersen","user_id":0,"rest_url":false},{"type":"text","value":"Georg Buscher","user_id":0,"rest_url":false},{"type":"edited_text","value":"Scott Counts","user_id":31471,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Scott Counts"},{"type":"text","value":"Sarathi Das","user_id":0,"rest_url":false},{"type":"text","value":"Ali Montazer","user_id":0,"rest_url":false},{"type":"text","value":"Sathish Manivannan","user_id":0,"rest_url":false},{"type":"edited_text","value":"Jennifer Neville","user_id":40946,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Jennifer Neville"},{"type":"text","value":"Xiaochuan Ni","user_id":0,"rest_url":false},{"type":"text","value":"Nagu Rangan","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Tara Safavi","user_id":42021,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Tara Safavi"},{"type":"user_nicename","value":"Siddharth Suri","user_id":33766,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Siddharth Suri"},{"type":"user_nicename","value":"Mengting Wan","user_id":39510,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Mengting Wan"},{"type":"edited_text","value":"Longqi Yang","user_id":38790,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Longqi Yang"}],"msr_impact_theme":[],"msr_research_lab":[199565],"msr_event":[],"msr_group":[144672],"msr_project":[1119417,978909],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"techreport","related_content":{"projects":[{"ID":1119417,"post_title":"Semantic Telemetry","post_name":"semantic-telemetry","post_type":"msr-project","post_date":"2025-02-28 16:43:18","post_modified":"2025-03-03 14:38:14","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/semantic-telemetry\/","post_excerpt":"AI has transformed how we interact with technology, moving from traditional graphical interfaces to language-based, collaborative systems. To measure these new human-AI interactions, we developed Semantic Telemetry, which analyzes natural language to classify and quantify user behaviors. This method captures the context, cognition, and course of action behind user tasks, offering insights into their collaboration with AI. Our project aims to build a scalable service for data processing, with a focus on understanding both the&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/1119417"}]}},{"ID":978909,"post_title":"AI Chat Log Research","post_name":"ai-chat-log-research","post_type":"msr-project","post_date":"2023-10-24 09:02:02","post_modified":"2024-06-07 11:56:34","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/ai-chat-log-research\/","post_excerpt":"AI Chat Log Research is a v-team collaboration among E+D Office of Applied Research, Microsoft Research, Bing Metrics and Analytics, and Turing to address practical challenges arising from the analysis of AI Chat logs. Large Language Model-based AI is transforming how users interact with assistive systems. Logs of user interactions from new chat and copilot AI systems provide more extensive signals of user satisfaction, success, and enjoyment than conventional search and recommendation logs due to&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/978909"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/969405","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/969405\/revisions"}],"predecessor-version":[{"id":1002843,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/969405\/revisions\/1002843"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=969405"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=969405"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=969405"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=969405"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=969405"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=969405"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=969405"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=969405"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=969405"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=969405"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=969405"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=969405"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=969405"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}