{"id":1134179,"date":"2025-03-19T09:00:00","date_gmt":"2025-03-19T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=1134179"},"modified":"2025-08-05T07:19:36","modified_gmt":"2025-08-05T14:19:36","slug":"claimify-extracting-high-quality-claims-from-language-model-outputs","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/claimify-extracting-high-quality-claims-from-language-model-outputs\/","title":{"rendered":"Claimify: Extracting high-quality claims from language model outputs"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"788\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1.jpg\" alt=\"Gradient background transitioning from blue to pink with two white icons. The left icon depicts a network or molecule structure with interconnected nodes, and the right icon shows a laptop and the outline of a person.\" class=\"wp-image-1134342\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1.jpg 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1-240x135.jpg 240w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/figure>\n\n\n\n<div class=\"wp-block-buttons is-content-justification-center is-content-justification-center is-layout-flex wp-container-core-buttons-is-layout-16018d1d wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a data-bi-type=\"button\" class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/video\/claimify-extracting-high-quality-claims-from-language-model-outputs\/\">Watch Dasha&#8217;s Claimify Explainer<\/a><\/div>\n<\/div>\n\n\n\n<p class=\"has-text-align-center\"><em>This research was accepted by the 63rd annual meeting of the Association for Computational Linguistics (ACL 2025), the premier conference on natural language processing.<\/em><\/p>\n\n\n\n<p>While large language models (LLMs) are capable of synthesizing vast amounts of information, they sometimes produce inaccurate or unsubstantiated content. 
To mitigate this risk, tools like Azure AI\u2019s <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/ai-services\/content-safety\/concepts\/groundedness\" target=\"_blank\" rel=\"noopener noreferrer\">Groundedness Detection<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> can be used to verify LLM outputs.\u202f<\/p>\n\n\n\n<p>A common strategy for fact-checking LLM-generated texts \u2013 especially complex, highly detailed outputs \u2013 is <strong>claim extraction<\/strong>: instead of evaluating the entire text at once, it\u2019s broken down into simple factual statements that can be verified independently. However, the effectiveness of this approach depends on the quality of the extracted claims: if they are inaccurate or incomplete, the fact-checking results may be compromised.&nbsp;<\/p>\n\n\n\n<p>Our paper \u201c<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/towards-effective-extraction-and-evaluation-of-factual-claims\/\">Towards Effective Extraction and Evaluation of Factual Claims<\/a>\u201d proposes a framework for evaluating claim extraction and introduces <strong>Claimify<\/strong>, a novel LLM-based method that outperforms prior solutions. In this blog post, we discuss common claim extraction issues and explore Claimify\u2019s approach.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"key-issues-in-claim-extraction\">Key issues in claim extraction<\/h2>\n\n\n\n<p>As a case study, we prompted OpenAI\u2019s GPT-4o model to \u201c<em>provide an overview of challenges in emerging markets<\/em>\u201d based on a collection of news articles. Here is its output:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong># Emerging market challenges <\/strong><br>Emerging markets face a myriad of challenges that span economic, social, and environmental dimensions. 
These challenges are often interconnected, exacerbating the difficulties faced by these regions. <br><br><strong>## Economic instability and inflation <\/strong><br>Several emerging markets are grappling with severe economic instability. For instance, Argentina&#8217;s rampant inflation, with monthly rates reaching as high as 25.5%, has made many goods unobtainable and plunged the value of the currency, causing severe economic hardship. Some experts estimate that the annual inflation rate could potentially double to 300%, while others predict even higher rates. <br><br><strong>## Food insecurity and agricultural challenges <\/strong><br>Food insecurity is another critical issue in emerging markets. Nigeria, for example, is striving to become self-sufficient in wheat production but is hindered by climate change and violence, exacerbated by high grain prices due to the suspension of the Black Sea Grain Initiative. Climate change has played a pivotal role in creating food insecurity and economic instability in farming-dependent economies, such as Zambia and Mozambique. <br><br><strong>## Natural disasters and public health emergencies <\/strong><br>Public health crises, as demonstrated by the experience of Derna, Libya, remain a significant concern in emerging markets. Catastrophic flooding caused by the collapse of two dams has led to significant destruction, complicating access to healthcare. The U.N. found that the resulting contaminated water caused many residents to fall ill, highlighting the need for improved water management. Countries like Afghanistan and Sudan have experienced similar challenges to those of Libya. <br><br><strong>## Conclusion <\/strong><br>Addressing emerging market challenges will require comprehensive strategies that consider economic stability, food security, and public health. 
International cooperation and targeted interventions are crucial in fostering sustainable development in emerging markets.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>We then applied a simple prompt<a id=\"_ftnref1\" href=\"#_ftn1\">[1]<\/a> that extracted the following claims from the answer:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>1. Emerging markets face economic challenges. <br>2. Emerging markets face social challenges. <br>3. Emerging markets face environmental challenges. <br>4. Argentina experiences severe economic instability. <br>5. Argentina has a monthly inflation rate as high as 25.5%. <br>6. Argentina&#8217;s inflation has made many goods unobtainable. <br>7. Argentina&#8217;s currency value has plunged. <br>8. Some experts estimate Argentina&#8217;s annual inflation rate could reach 300%. <br>9. Food insecurity is a critical issue in emerging markets. <br>10. Nigeria is striving to become self-sufficient in wheat production. <br>11. Nigeria&#8217;s wheat production is hindered by climate change. <br>12. Nigeria&#8217;s wheat production is hindered by violence. <br>13. High grain prices affect Nigeria due to the suspension of the Black Sea Grain Initiative. <br>14. Climate change creates food insecurity in Zambia. <br>15. Climate change creates food insecurity in Mozambique. <br>16. Climate change causes economic instability in farming-dependent economies. <br>17. Public health crises are a concern in emerging markets. <br>18. Derna, Libya, experienced catastrophic flooding. <br>19. The collapse of two dams caused flooding in Derna, Libya. <br>20. Flooding in Derna, Libya, complicated access to healthcare. <br>21. The U.N. found contaminated water in Derna, Libya. <br>22. Contaminated water in Derna, Libya, caused many residents to fall ill. <br>23. Afghanistan has experienced challenges similar to those in Libya. <br>24. Sudan has experienced challenges similar to those in Libya. <br>25. 
Addressing emerging market challenges requires comprehensive strategies. <br>26. Comprehensive strategies should consider economic stability. <br>27. Comprehensive strategies should consider food security. <br>28. Comprehensive strategies should consider public health. <br>29. International cooperation is crucial for sustainable development in emerging markets. <br>30. Targeted interventions are crucial for sustainable development in emerging markets.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>These claims appear to reflect the answer, but a closer look reveals four critical issues:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>#<\/th><th class=\"has-text-align-left\" data-align=\"left\">Issue<\/th><th>Example<\/th><\/tr><\/thead><tbody><tr><td>1<\/td><td class=\"has-text-align-left\" data-align=\"left\">Some claims are not factual statements that can be verified as true or false.<\/td><td><strong>Excerpt:<\/strong> \u201c<em>Addressing emerging market challenges will require comprehensive strategies that consider economic stability, food security, and public health.<\/em>\u201d <br><br><strong>Claims:<\/strong> <ul><li>Addressing emerging market challenges requires comprehensive strategies.<\/li> <li>Comprehensive strategies should consider economic stability.<\/li> <li>Comprehensive strategies should consider food security.<\/li> <li>Comprehensive strategies should consider public health.<\/li><\/ul><strong>Explanation: <\/strong>These claims are not verifiable because they are opinions.<\/td><\/tr><tr><td>2<\/td><td class=\"has-text-align-left\" data-align=\"left\">Some claims are missing or incomplete.<\/td><td><strong>Excerpt:<\/strong> \u201c<em>Argentina&#8217;s rampant inflation, with monthly rates reaching as high as 25.5%, has made many goods unobtainable and plunged the value of the currency, <u>causing severe economic hardship<\/u>. 
Some experts estimate that the annual inflation rate could potentially double to 300%, while <u>others predict even higher rates<\/u>.<\/em>\u201d <br><br><strong>Claims:<\/strong> <ul><li>Argentina has a monthly inflation rate as high as 25.5%.<\/li> <li>Argentina&#8217;s inflation has made many goods unobtainable.<\/li> <li>Argentina&#8217;s currency value has plunged.<\/li> <li>Some experts estimate Argentina\u2019s annual inflation rate could reach 300%.<\/li><\/ul> <strong>Explanation: <\/strong>The phrases \u201c<em>causing severe economic hardship<\/em>\u201d and \u201c<em>others predict even higher rates<\/em>\u201d are not reflected in any of the claims. The third claim also omits the fact that inflation caused the currency depreciation.<\/td><\/tr><tr><td>3<\/td><td class=\"has-text-align-left\" data-align=\"left\">Some claims are inaccurate.<\/td><td><strong>Excerpt: <\/strong>\u201c<em>The U.N. found that the resulting contaminated water caused many residents to fall ill, highlighting the need for improved water management<\/em>.\u201d<br><br><strong>Claims:<\/strong> <ul><li>The U.N. found contaminated water in Derna, Libya.<\/li> <li>Contaminated water in Derna, Libya, caused many residents to fall ill.<\/li><\/ul> <strong>Explanation: <\/strong>The first claim is inaccurate because the U.N. found the link between contaminated water and illness, not the contaminated water itself. The second claim also misrepresents the sentence since it shifts the meaning from a viewpoint of a specific entity (the U.N.) 
to a general assertion about the effects of contaminated water in Derna, Libya.<\/td><\/tr><tr><td>4<\/td><td class=\"has-text-align-left\" data-align=\"left\">Some claims cannot be understood without additional context.<\/td><td><strong>Excerpt: <\/strong>\u201c<em>Countries like Afghanistan and Sudan have experienced similar challenges to those of Libya.<\/em>\u201d<br><br><strong>Claims:<\/strong> <ul><li>Afghanistan has experienced challenges similar to those in Libya.<\/li> <li>Sudan has experienced challenges similar to those in Libya.<\/li><\/ul> <strong>Explanation: <\/strong>These claims cannot be understood on their own because \u201c<em>those<\/em>\u201d is not defined.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"introducing-claimify\">Introducing Claimify<\/h2>\n\n\n\n<p>The case study highlights that claim extraction is surprisingly error-prone. <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/towards-effective-extraction-and-evaluation-of-factual-claims\/\">Our paper<\/a> demonstrates that the issues identified above are common across LLM-based claim extraction methods. To minimize these errors, we created a system called Claimify<a id=\"_ftnref2\" href=\"#_ftn2\">[2]<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"core-principles\">Core principles<\/h3>\n\n\n\n<p>Claimify is an LLM-based claim extraction system built on the following principles:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>#<\/th><th>Principle<\/th><th>Example<\/th><\/tr><\/thead><tbody><tr><td>1<\/td><td>The claims should capture all verifiable content in the source text and exclude unverifiable content.<\/td><td>In the sentence \u201c<em>The partnership between John and Jane illustrates the importance of collaboration,<\/em>\u201d the only verifiable content is the existence of a partnership between John and Jane. 
The rest is subjective interpretation.<\/td><\/tr><tr><td>2<\/td><td>Each claim should be entailed (i.e., fully supported) by the source text.<\/td><td>Consider the sentence \u201c<em>Governments are curtailing emissions from cars and trucks, which are the largest source of greenhouse gases from transportation<\/em>.\u201d The following claims are incorrect: <br><br><ul><li>Cars are the largest source of greenhouse gases from transportation.<\/li> <li>Trucks are the largest source of greenhouse gases from transportation.<\/li><\/ul>The sentence attributes the highest emissions to cars and trucks collectively, not individually.<\/td><\/tr><tr><td>3<\/td><td>Each claim should be understandable on its own, without additional context.<\/td><td>The claim \u201c<em>They will update the policy next year<\/em>\u201d is not understandable on its own because it\u2019s unclear what \u201c<em>They<\/em>,\u201d \u201c<em>the policy<\/em>,\u201d and \u201c<em>next year<\/em>\u201d refer to.<\/td><\/tr><tr><td>4<\/td><td>Each claim should minimize the risk of excluding critical context.<\/td><td>Suppose the claim \u201c<em>The World Trade Organization has supported trade barriers<\/em>\u201d was extracted from the sentence \u201c<em>An exception to the World Trade Organization\u2019s open-market philosophy is its history of supporting trade barriers when member countries have failed to comply with their obligations.<\/em>\u201d A fact-checking system would likely classify the claim as false, since there is extensive evidence that the WTO aims to reduce trade barriers. However, if the claim had specified that the WTO has supported trade barriers \u201c<em>when member countries have failed to comply with their obligations,<\/em>\u201d it would likely have been classified as true. 
This example demonstrates that missing context can distort the fact-checking verdict.<\/td><\/tr><tr><td>5<\/td><td>The system should flag cases where ambiguity cannot be resolved.<\/td><td>The sentence \u201c<em>AI has advanced renewable energy and sustainable agriculture at Company A and Company B<\/em>\u201d has two mutually exclusive interpretations: <br><br><ul><li>AI has advanced renewable energy and sustainable agriculture at both Company A and Company B.<\/li> <li>AI has advanced renewable energy at Company A and sustainable agriculture at Company B.<\/li><\/ul>If the context does not clearly indicate that one of these interpretations is correct, the system should flag the ambiguity instead of picking one interpretation arbitrarily.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"implementation\">Implementation<\/h3>\n\n\n\n<p>Claimify accepts a question-answer pair as input and performs claim extraction in four stages, illustrated in Figure 1:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>#<\/th><th>Stage<\/th><th>Description<\/th><\/tr><\/thead><tbody><tr><td>1<\/td><td>Sentence splitting and context creation<\/td><td>The answer is split into sentences, with \u201ccontext\u201d \u2013 a configurable combination of surrounding sentences and metadata (e.g., the header hierarchy in a Markdown-style answer)&nbsp;\u2013 created for each sentence.<\/td><\/tr><tr><td>2<\/td><td>Selection<\/td><td>An LLM identifies sentences that do not contain verifiable content. These sentences are labeled \u201cNo verifiable claims\u201d and excluded from subsequent stages. When sentences contain verifiable and unverifiable components, the LLM rewrites the sentence, retaining only the verifiable components.<\/td><\/tr><tr><td>3<\/td><td>Disambiguation<\/td><td>For sentences that passed the Selection stage, an LLM detects ambiguity and determines if it can be resolved using the context. 
If all ambiguity is resolvable, the LLM returns a disambiguated version of the sentence. Otherwise, the sentence is labeled \u201cCannot be disambiguated\u201d and excluded from the Decomposition stage.<\/td><\/tr><tr><td>4<\/td><td>Decomposition<\/td><td>For sentences that are unambiguous or were disambiguated, an LLM creates standalone claims that preserve critical context. If no claims are extracted, the sentence is labeled \u201cNo verifiable claims.\u201d<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1915\" height=\"453\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/claimify_figure_blog.png\" alt=\"A flowchart outlining Claimify\u2019s stages for extracting claims from a question-answer pair. The process begins by splitting the answer into sentences and creating context. Next, the Selection stage asks if a sentence contains any verifiable content. If no, the sentence is labeled \"No verifiable claims\" and excluded from subsequent stages; if yes, it proceeds to the Disambiguation stage. The Disambiguation stage asks if the sentence contains any ambiguity that cannot be resolved. If yes, the sentence is labeled \"Cannot be disambiguated\" and excluded from the final stage; if no, it proceeds to the Decomposition stage. The Decomposition stage attempts to decompose the sentence into claims. If it is decomposed into at least one claim, the sentence is labeled \"Extracted claims\"; otherwise, the sentence is labeled \"No verifiable claims.\" The Selection, Disambiguation, and Decomposition stages apply to each sentence individually. 
\" class=\"wp-image-1134259\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/claimify_figure_blog.png 1915w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/claimify_figure_blog-300x71.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/claimify_figure_blog-1024x242.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/claimify_figure_blog-768x182.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/claimify_figure_blog-1536x363.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/claimify_figure_blog-240x57.png 240w\" sizes=\"auto, (max-width: 1915px) 100vw, 1915px\" \/><figcaption class=\"wp-element-caption\">Figure 1: Overview of Claimify\u2019s stages<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"results\">Results<\/h2>\n\n\n\n<p>In <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/towards-effective-extraction-and-evaluation-of-factual-claims\/\" target=\"_blank\" rel=\"noreferrer noopener\">our paper<\/a>, we demonstrate that Claimify outperforms existing LLM-based methods<a href=\"#_ftn3\">[3]<\/a>. Specifically, we show that: (1) 99% of claims extracted by Claimify are entailed by their source sentence, (2) Claimify strikes the best balance between including verifiable content and excluding unverifiable content, and (3) Claimify is least likely to omit context critical to the fact-checking verdict.<\/p>\n\n\n\n<p>For the above case study on challenges in emerging markets, here are Claimify\u2019s outputs, with source sentences preceded by a letter and claims numbered<a href=\"#_ftn4\" id=\"_ftnref4\">[4]<\/a>:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>A. Several emerging markets are grappling with severe economic instability. <br><\/strong>1. 
Several emerging markets are grappling with severe economic instability. <br><br><strong>B. For instance, Argentina&#8217;s rampant inflation, with monthly rates reaching as high as 25.5%, has made many goods unobtainable and plunged the value of the currency, causing severe economic hardship. <br><\/strong>1. Argentina has rampant inflation. <br>2. The monthly inflation rates in Argentina have reached as high as 25.5%. <br>3. Inflation has made many goods unobtainable in Argentina. <br>4. Inflation has plunged the value of the currency in Argentina. <br>5. Inflation has caused severe economic hardship in Argentina. <br><br><strong>C. Some experts estimate that the annual inflation rate could potentially double to 300%, while others predict even higher rates. <br><\/strong>1. Some experts estimate that Argentina&#8217;s annual inflation rate could double to 300% in the future. <br>2. Some experts predict that Argentina&#8217;s annual inflation rate could be higher than 300% in the future. <br><br><strong>D. Nigeria, for example, is striving to become self-sufficient in wheat production but is hindered by climate change and violence, exacerbated by high grain prices due to the suspension of the Black Sea Grain Initiative. <br><\/strong>1. Nigeria is striving to become self-sufficient in wheat production. <br>2. Nigeria is hindered by climate change in becoming self-sufficient in wheat production. <br>3. Nigeria is hindered by violence in becoming self-sufficient in wheat production. <br>4. High grain prices exacerbate the hindrance to Nigeria&#8217;s efforts to become self-sufficient in wheat production. <br>5. The suspension of the Black Sea Grain Initiative is a reason for high grain prices. <br><br><strong>E. Climate change has played a pivotal role in creating food insecurity and economic instability in farming-dependent economies, such as Zambia and Mozambique. <br><\/strong>1. 
Climate change has played a role in creating food insecurity in farming-dependent economies. <br>2. Zambia is a farming-dependent economy where climate change has played a role in creating food insecurity. <br>3. Mozambique is a farming-dependent economy where climate change has played a role in creating food insecurity. <br>4. Climate change has played a role in creating economic instability in farming-dependent economies. <br>5. Zambia is a farming-dependent economy where climate change has played a role in creating economic instability.<br>6. Mozambique is a farming-dependent economy where climate change has played a role in creating economic instability.<br><br><strong>F. Public health crises, as demonstrated by the experience of Derna, Libya, remain a significant concern in emerging markets.<br><\/strong>1. Public health crises are a concern in emerging markets.<br>2. Derna, Libya, is an example of a public health crisis in emerging markets.<br><br><strong>G. Catastrophic flooding caused by the collapse of two dams has led to significant destruction, complicating access to healthcare.<br><\/strong>1. There was catastrophic flooding in Derna, Libya.<br>2. The flooding in Derna, Libya, was caused by the collapse of two dams.<br>3. The flooding in Derna, Libya, has led to significant destruction.<br>4. The flooding in Derna, Libya, has complicated access to healthcare.<br><br><strong>H. Countries like Afghanistan and Sudan have experienced similar challenges to those of Libya.<br><\/strong>1. Afghanistan has experienced challenges related to public health crises.<br>2. Afghanistan has experienced challenges related to catastrophic flooding.<br>3. Afghanistan has experienced challenges related to contaminated water.<br>4. Sudan has experienced challenges related to public health crises.<br>5. Sudan has experienced challenges related to catastrophic flooding.<br>6. 
Sudan has experienced challenges related to contaminated water.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Note that the baseline prompt extracted several claims from the sentence \u201c<em>The U.N. found that the resulting contaminated water caused many residents to fall ill, highlighting the need for improved water management<\/em>,\u201d but it ignored the phrase \u201c<em>highlighting the need for improved water management<\/em>.\u201d&nbsp;It also failed to capture that the contaminated water resulted from flooding, as implied by \u201c<em>resulting<\/em>\u201d in the original sentence.<\/p>\n\n\n\n<p>Claimify took a different approach. First, it found two instances of ambiguity \u2013 \u201c<em>resulting contaminated water<\/em>\u201d and \u201c<em>many residents<\/em>\u201d&nbsp;\u2013 that it determined could be resolved using the context. Here\u2019s an excerpt from its reasoning: \u201c&#8230;<em>the context specifies that the contaminated water is a result of the catastrophic flooding in Derna, Libya, and the residents are those of Derna, Libya.<\/em>\u201d<\/p>\n\n\n\n<p>However, it also found an instance of ambiguity \u2013 \u201c<em>highlighting the need for improved water management<\/em>\u201d \u2013 where it concluded that the context does not definitively support a single interpretation: \u201c<em>The sentence could be interpreted as: (1) The U.N. found that the contaminated water caused illness and also highlighted the need for improved water management, (2) The U.N. only found that the contaminated water caused illness, while the need for improved water management is an implication or conclusion drawn by the writer. 
Readers \u2026 would likely fail to reach consensus about the correct interpretation of this ambiguity.<\/em>\u201d As a result, Claimify labeled the sentence \u201cCannot be disambiguated\u201d at the Disambiguation stage and did not proceed to the Decomposition stage.&nbsp;<\/p>\n\n\n\n<p>To the best of our knowledge, Claimify is the first claim extraction system that identifies when the source text has multiple possible interpretations and extracts claims only when there is high confidence in the correct interpretation.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"next-steps\">Next steps<\/h2>\n\n\n\n<p>We\u2019re currently working on new methods for evaluating LLM-generated texts. We anticipate that the high-quality claims extracted by Claimify will help not only in verifying the veracity of LLM outputs, but also in assessing their overall quality \u2013 especially when gold-standard references are difficult to create (e.g., long-form texts where people may disagree on what defines \u201cgood\u201d content). 
For example, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/from-local-to-global-a-graph-rag-approach-to-query-focused-summarization\/\">we recently used Claimify<\/a> to evaluate the comprehensiveness and diversity of answers generated by <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/graphrag\/\">GraphRAG<\/a>, showing that GraphRAG outperforms traditional Retrieval-Augmented Generation (RAG) in these areas.<\/p>\n\n\n\n<p>For an in-depth discussion of Claimify and our evaluation framework, please see our paper \u201c<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/towards-effective-extraction-and-evaluation-of-factual-claims\/\">Towards Effective Extraction and Evaluation of Factual Claims<\/a>.\u201d<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><a id=\"_ftn1\" href=\"#_ftnref1\">[1]<\/a> We used the \u201cproposition chunking\u201d prompt from <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/NirDiamant\/RAG_Techniques\/blob\/main\/all_rag_techniques\/proposition_chunking.ipynb\" target=\"_blank\" rel=\"noopener noreferrer\">NirDiamant&#8217;s RAG Techniques repository<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. 
We generated multiple responses using GPT-4o, then picked the response that was most representative of the samples.<\/p>\n\n\n\n<p><a id=\"_ftn2\" href=\"#_ftnref2\">[2]<\/a> Claimify is currently used for research purposes only and is not available commercially.<\/p>\n\n\n\n<p><a id=\"_ftn3\" href=\"#_ftnref3\">[3]<\/a> We benchmarked Claimify against <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/aclanthology.org\/2024.findings-emnlp.552\/\" target=\"_blank\" rel=\"noopener noreferrer\">VeriScore<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/abs\/2412.13175\" target=\"_blank\" rel=\"noopener noreferrer\">DnD<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/abs\/2403.18802\" target=\"_blank\" rel=\"noopener noreferrer\">SAFE<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/aclanthology.org\/2024.acl-long.104\/\" target=\"_blank\" rel=\"noopener noreferrer\">AFaCTA<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/aclanthology.org\/2024.findings-emnlp.830\/\" target=\"_blank\" rel=\"noopener noreferrer\">Factcheck-GPT<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<p><a href=\"#_ftnref4\" id=\"_ftn4\">[4]<\/a> The outputs were generated using GPT-4o. 
Sentences not shown were either labeled \u201cNo verifiable claims\u201d or \u201cCannot be disambiguated.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Claimify, created by Microsoft Research, is a novel LLM-based claim-extraction method that outperforms prior solutions to produce more accurate, comprehensive, and substantiated claims from LLM outputs.<\/p>\n","protected":false},"author":43518,"featured_media":1134342,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Dasha Metropolitansky","user_id":"43815"}],"msr_hide_image_in_river":null,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13545],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[269148,243984,269142],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-1134179","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-research-area-human-language-technologies","msr-locale-en_us","msr-post-option-approved-for-river","msr-post-option-blog-homepage-featured","msr-post-option-include-in-river"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199565,1161007],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[901101],"related-projects":[1027041],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Dasha Metropolitansky","user_id":43815,"display_name":"Dasha Metropolitansky","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/dasham\/?lang=ja\" aria-label=\"Dasha 
Metropolitansky profile page\">Dasha Metropolitansky<\/a>","is_active":false,"last_first":"Metropolitansky, Dasha","people_section":0,"alias":"dasham"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1-960x540.jpg\" class=\"img-object-cover\" alt=\"Gradient background transitioning from blue to pink with two white icons. The left icon depicts a network or molecule structure with interconnected nodes, and the right icon shows a laptop.\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1-240x135.jpg 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/03\/Claimify-BlogHeroFeature-1400x788-1.jpg 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/dasham\/\" title=\"Go to researcher profile for Dasha Metropolitansky\" aria-label=\"Go to researcher profile for Dasha Metropolitansky\" data-bi-type=\"byline author\" data-bi-cN=\"Dasha Metropolitansky\">Dasha Metropolitansky<\/a>","formattedDate":"March 19, 2025","formattedExcerpt":"Claimify, created by Microsoft Research, is a novel LLM-based claim-extraction method that outperforms prior solutions to produce more accurate, comprehensive, and substantiated claims from LLM outputs.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1134179","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/43518"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1134179"}],"version-history":[{"count":43,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1134179\/revisions"}],"predecessor-version":[{"id":1141541,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1134179\/revisions\/1141541"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1134342"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1134179"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en
-us\/research\/wp-json\/wp\/v2\/categories?post=1134179"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1134179"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1134179"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1134179"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1134179"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1134179"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1134179"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1134179"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1134179"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1134179"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}