{"id":978333,"date":"2023-10-23T01:48:37","date_gmt":"2023-10-23T08:48:37","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&#038;p=978333"},"modified":"2024-08-21T23:08:14","modified_gmt":"2024-08-22T06:08:14","slug":"llmlingua","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/llmlingua\/","title":{"rendered":"LLMLingua Series"},"content":{"rendered":"<section class=\"mb-3 moray-highlight\">\n\t<div class=\"card-img-overlay mx-lg-0\">\n\t\t<div class=\"card-background  has-background-catalina-blue card-background--inset-right\">\n\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1792\" height=\"1024\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/LLMLingua_background.png\" class=\"attachment-full size-full\" alt=\"background of LLMLingua\" style=\"object-position: 33% 23%\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/LLMLingua_background.png 1792w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/LLMLingua_background-300x171.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/LLMLingua_background-1024x585.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/LLMLingua_background-768x439.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/LLMLingua_background-1536x878.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/LLMLingua_background-240x137.png 240w\" sizes=\"auto, (max-width: 1792px) 100vw, 1792px\" \/>\t\t<\/div>\n\t\t<!-- Foreground -->\n\t\t<div class=\"card-foreground d-flex mt-md-n5 my-lg-5 px-g px-lg-0\">\n\t\t\t<!-- Container -->\n\t\t\t<div class=\"container d-flex mt-md-n5 my-lg-5 \">\n\t\t\t\t<!-- Card wrapper -->\n\t\t\t\t<div class=\"w-100 w-lg-col-5\">\n\t\t\t\t\t<!-- Card -->\n\t\t\t\t\t<div class=\"card 
material-md-card py-5 px-md-5\">\n\t\t\t\t\t\t<div class=\"card-body \">\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n<h1 class=\"wp-block-heading\" id=\"llmlingua\">LLMLingua<\/h1>\n\n\n\n<p>Effectively Deliver Information to LLMs via&nbsp;<strong>Prompt Compression<\/strong><\/p>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n\n\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>LLMLingua<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"identify-and-remove-non-essential-tokens-in-prompts-using-perplexity-from-a-slm\">Identify and remove non-essential tokens in prompts using perplexity from a SLM<\/h3>\n\n\n\n<p><strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/llmlingua.com\/llmlingua.html\">Read More<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong><\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>LongLLMLingua<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"enhance-long-context-information-via-query-aware-compression-and-reorganization\">Enhance long-context information via query-aware compression and reorganization<\/h3>\n\n\n\n<p><strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/llmlingua.com\/longllmlingua.html\">Read More<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong><\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>LLMLingua-2<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" 
id=\"utilize-data-distillation-to-learn-compression-targets-for-efficient-and-faithful-task-agnostic-compression\">Utilize data distillation to learn compression targets for efficient and faithful task-agnostic compression<\/h3>\n\n\n\n<p><strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/llmlingua.com\/llmlingua2.html\">Read More<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong><\/p>\n<\/div>\n<\/div>\n\n\n\n<p>Large language models (LLMs) have demonstrated remarkable capabilities and have been applied across various fields. Advancements in technologies such as Chain-of-Thought (CoT), In-Context Learning (ICL), and Retrieval-Augmented Generation (RAG) have led to increasingly lengthy prompts for LLMs, sometimes exceeding tens of thousands of tokens. Longer prompts, however, can result in 1) increased API response latency, 2) exceeded context window limits, 3) loss of contextual information, 4) expensive API bills, and 5) performance issues such as &#8220;lost in the middle.&#8221;<\/p>\n\n\n\n<p>Inspired by the concept of &#8220;LLMs as Compressors,&#8221; we designed a series of works that build a compressed language for LLMs via prompt compression. This approach accelerates model inference, reduces costs, and improves downstream performance while revealing LLM context utilization and intelligence patterns. Our work achieved a <em>20x compression ratio<\/em> with minimal performance loss (<strong>LLMLingua<\/strong>) and <em>a 17.1% performance improvement with 4x compression<\/em> (<strong>LongLLMLingua<\/strong>).
<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2403.12968\"><strong>LLMLingua-2<\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, a small-size yet powerful prompt compression method trained via data distillation from GPT-4 for token classification with a BERT-level encoder, excels in task-agnostic compression. It surpasses LLMLingua in handling out-of-domain data, offering 3x-6x faster performance.<\/p>\n\n\n\n<p>This page is for&nbsp;<strong>research demonstration purposes<\/strong>&nbsp;only. <\/p>\n\n\n\n<p>If you are interested in our ideas, please feel free to <strong>use LLMLingua<\/strong> and communicate with us.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"user-content-news\"><strong>News<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>\ud83e\udd9a<\/strong>&nbsp;&nbsp;We&#8217;re excited to announce the release of&nbsp;<strong>LLMLingua-2<\/strong>, boasting a 3x-6x speed improvement over LLMLingua!
For more information, check out our&nbsp;<strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2403.12968\">paper<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong>, visit the&nbsp;<strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/llmlingua.com\/llmlingua2.html\">project page<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong>, and explore our&nbsp;<strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/huggingface.co\/spaces\/microsoft\/LLMLingua-2\">demo<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong>.<\/li>\n\n\n\n<li><strong>\ud83d\udc7e<\/strong>&nbsp;&nbsp;LLMLingua has been integrated into&nbsp;<strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/langchain-ai\/langchain\/blob\/master\/docs\/docs\/integrations\/retrievers\/llmlingua.ipynb\">LangChain<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong>&nbsp;and&nbsp;<strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/run-llama\/llama_index\/blob\/main\/docs\/examples\/node_postprocessor\/LongLLMLingua.ipynb\">LlamaIndex<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong>, two widely-used RAG frameworks.<\/li>\n\n\n\n<li><strong>\ud83e\udd33<\/strong>&nbsp;&nbsp;Talk slides are available in&nbsp;<strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" 
href=\"https:\/\/drive.google.com\/file\/d\/1fzK3wOvy2boF7XzaYuq2bQ3jFeP1WMk3\/view?usp=sharing\">AI Time Jan, 24<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong>.<\/li>\n\n\n\n<li><strong>\ud83d\udda5<\/strong>&nbsp;&nbsp;EMNLP&#8217;23 slides are available in&nbsp;<strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/drive.google.com\/file\/d\/1GxQLAEN8bBB2yiEdQdW4UKoJzZc0es9t\/view\">Session 5<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong>&nbsp;and&nbsp;<strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/drive.google.com\/file\/d\/1LJBUfJrKxbpdkwo13SgPOqugk-UjLVIF\/view\">BoF-6<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong>.<\/li>\n\n\n\n<li><strong>\ud83d\udcda<\/strong>&nbsp;&nbsp;Check out our new&nbsp;<strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/medium.com\/@iofu728\/longllmlingua-bye-bye-to-middle-loss-and-save-on-your-rag-costs-via-prompt-compression-54b559b9ddf7\">blog post<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong>&nbsp;discussing RAG benefits and cost savings through prompt compression. 
See the script example&nbsp;<strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/LLMLingua\/blob\/main\/examples\/Retrieval.ipynb\">here<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong>.<\/li>\n\n\n\n<li><strong>\ud83d\udc68\u200d\ud83e\uddaf<\/strong>&nbsp;&nbsp;Explore our&nbsp;<strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/LLMLingua\/blob\/main\/examples\">&#8216;.\/examples&#8217;<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong>&nbsp;directory for practical applications, including&nbsp;<strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/LLMLingua\/blob\/main\/examples\/LLMLingua2.ipynb\">LLMLingua-2<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong>,&nbsp;<strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/LLMLingua\/blob\/main\/examples\/RAG.ipynb\">RAG<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong>,&nbsp;<strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/LLMLingua\/blob\/main\/examples\/OnlineMeeting.ipynb\">Online Meeting<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong>,&nbsp;<strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" 
href=\"https:\/\/github.com\/microsoft\/LLMLingua\/blob\/main\/examples\/CoT.ipynb\">CoT<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong>,&nbsp;<strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/LLMLingua\/blob\/main\/examples\/Code.ipynb\">Code<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong>, and&nbsp;<strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/LLMLingua\/blob\/main\/examples\/RAGLlamaIndex.ipynb\">RAG using LlamaIndex<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong>.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-video aligncenter\"><video autoplay controls preload=\"auto\" src=\"https:\/\/github.com\/microsoft\/LLMLingua\/assets\/30883354\/eb0ea70d-6d4c-4aa7-8977-61f94bb87438\" style=\"width:600pt;text-align: center\"><\/video><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"insights\"><strong>Insights<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Natural language is redundant; the amount of information varies.<\/li>\n\n\n\n<li>LLMs can understand compressed prompts.<\/li>\n\n\n\n<li>There is a trade-off between language completeness and compression ratio.&nbsp;<strong>(LLMLingua)<\/strong><\/li>\n\n\n\n<li>GPT-4 can recover all the key information from a compressed prompt, an emergent ability.&nbsp;<strong>(LLMLingua)<\/strong><\/li>\n\n\n\n<li>The density and position of key information in a prompt affect the performance of downstream tasks.&nbsp;<strong>(LongLLMLingua)<\/strong><\/li>\n\n\n\n<li>GPT-4 can perform high-quality, extractive prompt compression using carefully designed instructions and
chunking.&nbsp;<strong>(LLMLingua-2)<\/strong><\/li>\n\n\n\n<li>Prompt compression can be formulated as a token classification problem and accomplished by a BERT-size model.&nbsp;<strong>(LLMLingua-2)<\/strong><\/li>\n<\/ul>\n\n\n\n<p>For more details, please refer to the project pages,&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/llmlingua.com\/llmlingua.html\"><strong>LLMLingua<\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a>,&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/llmlingua.com\/longllmlingua.html\"><strong>LongLLMLingua<\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, and&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/llmlingua.com\/llmlingua2.html\"><strong>LLMLingua-2<\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"786\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/motivation-1024x786.png\" alt=\"the motivation of LLMLingua\" class=\"wp-image-1016511\" style=\"width:750px\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/motivation-1024x786.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/motivation-300x230.png 300w,
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/motivation-768x589.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/motivation-1536x1179.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/motivation-2048x1572.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/motivation-80x60.png 80w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/motivation-235x180.png 235w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"573\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-1024x573.png\" alt=\"LLMLingua onepage\" class=\"wp-image-1016517\" style=\"width:800px\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-1024x573.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-300x168.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-768x430.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-1536x859.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-2048x1146.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-240x134.png 240w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"574\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua-1024x574.png\" 
alt=\"LongLLMLingua onepage\" class=\"wp-image-1016526\" style=\"width:800px\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua-1024x574.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua-300x168.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua-768x430.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua-1536x860.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua-2048x1147.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua-240x134.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua-640x360.png 640w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"578\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-2-1024x578.png\" alt=\"LLMLingua-2 onepage\" class=\"wp-image-1016529\" style=\"width:800px\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-2-1024x578.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-2-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-2-768x434.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-2-1536x867.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-2-2048x1157.png 2048w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-2-240x136.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-2-640x360.png 640w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n\n\n<p>Paper: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2310.05736\">https:\/\/arxiv.org\/abs\/2310.05736<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<p>Demo: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/huggingface.co\/spaces\/microsoft\/LLMLingua\">https:\/\/huggingface.co\/spaces\/microsoft\/LLMLingua<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<p>Project Page: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/llmlingua.com\/llmlingua.html\">https:\/\/llmlingua.com\/llmlingua.html<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<p>To accelerate model inference and reduce cost, we introduce LLMLingua, which employs a well-trained small language model after alignment, such as GPT2-small or LLaMA-7B, to detect unimportant tokens in the prompt and enable inference with the compressed prompt in black-box LLMs, achieving up to 20x compression with minimal performance loss.
It&#8217;s worth noting that token-level compressed prompts are a format that is difficult for humans to understand but can be well interpreted by LLMs.<\/p>\n\n\n\n<p>To evaluate the effectiveness of compressed prompts, especially the unique capabilities of LLMs, we conducted experiments in four different scenarios, i.e., GSM8K, BBH, ShareGPT, and Arxiv-March23, which cover ICL, Reasoning, Summarization, and Conversation. The results show that our approach can effectively retain the original prompt&#8217;s capabilities, particularly in ICL and reasoning.&nbsp;<\/p>\n\n\n\n<p>Furthermore, we demonstrated the efficiency and practical acceleration of LLMLingua through latency tests and computational workload estimation.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"573\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-1024x573.png\" alt=\"LLMLingua onepage\" class=\"wp-image-1016517\" style=\"width:800px\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-1024x573.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-300x168.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-768x430.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-1536x859.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-2048x1146.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-240x134.png 240w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h3 
class=\"wp-block-heading\" id=\"insights-1\"><strong>Insights<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Natural language is redundant; the amount of information varies.<\/li>\n\n\n\n<li>LLMs can understand compressed prompts.<\/li>\n\n\n\n<li>There is a trade-off between language completeness and compression ratio.&nbsp;<strong>(LLMLingua)<\/strong><\/li>\n\n\n\n<li>GPT-4 can recover all the key information from a compressed prompt, an emergent ability.&nbsp;<strong>(LLMLingua)<\/strong><\/li>\n<\/ul>\n\n\n\n<p>For more details, please refer to the paper&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2310.05736\"><strong>LLMLingua<\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"why-llmlingua\"><strong>Why&nbsp;<em>LLMLingua<\/em>?<\/strong><\/h3>\n\n\n\n<p>Building on the intuition mentioned earlier, LLMLingua leverages small models&#8217; perplexity to measure the redundancy within a prompt. It comprises three modules, as illustrated above, that assign varying compression rates to different segments within the prompt. This approach takes the conditional probabilities between compressed tokens and other tokens into account to better estimate each token&#8217;s importance.
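The per-token surprisal idea behind these modules can be sketched in a few lines. The snippet below is an illustrative toy only: a hypothetical add-one-smoothed unigram model stands in for LLMLingua's aligned small LM, and a single keep-ratio stands in for its budget controller and iterative token-level compression. Tokens the model finds most predictable (lowest self-information) are dropped first.

```python
import math
from collections import Counter

def self_information(tokens, lm_counts, total):
    """Surprisal -log2 p(t) per token under a toy add-one-smoothed unigram LM."""
    vocab = len(lm_counts)
    return [-math.log2((lm_counts[t] + 1) / (total + vocab + 1)) for t in tokens]

def compress(tokens, lm_counts, total, keep_ratio=0.5):
    """Keep the most informative tokens, preserving their original order."""
    scores = self_information(tokens, lm_counts, total)
    k = max(1, int(len(tokens) * keep_ratio))
    keep = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(keep)]

# Frequent, predictable words ("the") carry little information and go first.
corpus = "the cat sat on the mat the dog sat on the log".split()
counts = Counter(corpus)  # Counter returns 0 for unseen tokens
print(compress("the cat chased the dog".split(), counts, len(corpus), keep_ratio=0.6))
# → ['cat', 'chased', 'dog']
```

A real compressor would use an autoregressive small LM's conditional perplexity rather than unigram counts, which is exactly where LLMLingua's alignment and iterative compression come in.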
Moreover, to make small models more attuned to various black-box models, LLMLingua introduces an alignment mechanism that aligns small models more closely with the semantic distributions of LLMs.<\/p>\n\n\n\n<p>LLMLingua offers the following advantages:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>It can be directly used for black-box LLMs and helps save computation and financial costs, up to 20x.<\/strong><\/li>\n\n\n\n<li><strong>It is a highly robust method that requires no training of the LLMs and is applicable to different LLMs, such as GPT-4, GPT-3.5-Turbo, Claude, Mistral, etc.<\/strong><\/li>\n\n\n\n<li><strong>After compression, it allows the model to support longer context inputs.<\/strong><\/li>\n\n\n\n<li><strong>LLMLingua effectively retains the capabilities of LLMs, including reasoning, in-context learning, etc.<\/strong><\/li>\n\n\n\n<li><strong>Prompts compressed by LLMLingua can be effectively decompressed by GPT-4, retaining vital information.<\/strong><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"bibtex-1\"><strong>BibTeX<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>@inproceedings{jiang-etal-2023-llmlingua,\n    title = \"{LLML}ingua: Compressing Prompts for Accelerated Inference of Large Language Models\",\n    author = \"Huiqiang Jiang and Qianhui Wu and Chin-Yew Lin and Yuqing Yang and Lili Qiu\",\n    editor = \"Bouamor, Houda  and\n      Pino, Juan  and\n      Bali, Kalika\",\n    booktitle = \"Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing\",\n    month = dec,\n    year = \"2023\",\n    address = \"Singapore\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https:\/\/aclanthology.org\/2023.emnlp-main.825\",\n    doi = \"10.18653\/v1\/2023.emnlp-main.825\",\n    pages
= \"13358--13376\",\n}<\/code><\/pre>\n\n\n\n\n\n<p>Paper: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2310.06839\">https:\/\/arxiv.org\/abs\/2310.06839<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<p>Project Page: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/llmlingua.com\/longllmlingua.html\">https:\/\/llmlingua.com\/longllmlingua.html<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<p>In long context scenarios, large language models face three main challenges: higher computational cost, performance reduction, and position bias. Research indicates that LLM performance hinges on the density and position of key information in the input prompt. Inspired by these findings, we propose LongLLMLingua for prompt compression, improving LLMs\u2019 perception of key information to address all three challenges simultaneously. Our extensive evaluation across various long context scenarios demonstrates that LongLLMLingua not only enhances performance but also significantly reduces costs and latency. For instance, in the NaturalQuestions benchmark,&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2310.06839\"><strong>LongLLMLingua<\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a>&nbsp;boosts performance by up to&nbsp;<strong>21.4%<\/strong>&nbsp;with around&nbsp;<strong>4x fewer tokens<\/strong>&nbsp;in GPT-3.5-Turbo, leading to substantial cost savings. It achieves a&nbsp;<strong>94.0% cost reduction<\/strong>&nbsp;in the LooGLE benchmark.
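The cost side of these numbers is simple arithmetic, sketched below with a hypothetical per-token price (a placeholder, not an actual API rate): 4x fewer prompt tokens means roughly 75% less prompt-side spend.

```python
def prompt_cost(n_tokens: int, price_per_1k: float) -> float:
    """Prompt-side API cost for n_tokens at a given price per 1K tokens."""
    return n_tokens / 1000 * price_per_1k

PRICE = 0.01  # hypothetical $/1K prompt tokens
original = prompt_cost(10_000, PRICE)   # a ~10k-token long-context prompt
compressed = prompt_cost(2_500, PRICE)  # the same prompt at 4x compression
saving = 1 - compressed / original
print(f"${original:.3f} -> ${compressed:.3f} ({saving:.0%} prompt-cost reduction)")
# → $0.100 -> $0.025 (75% prompt-cost reduction)
```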
Moreover, when compressing prompts of about 10k tokens at ratios of 2x-6x, LongLLMLingua can accelerate end-to-end latency by 1.4x-2.6x.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"574\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua-1024x574.png\" alt=\"LongLLMLingua onepage\" class=\"wp-image-1016526\" style=\"width:800px\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua-1024x574.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua-300x168.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua-768x430.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua-1536x860.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua-2048x1147.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua-240x134.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua-640x360.png 640w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"insights-2\"><strong>Insights<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Natural language is redundant; the amount of information varies.<\/li>\n\n\n\n<li>LLMs can understand compressed prompts.<\/li>\n\n\n\n<li>There is a trade-off between language completeness and compression ratio.&nbsp;<strong>(LLMLingua)<\/strong><\/li>\n\n\n\n<li>GPT-4 can recover all the key information from a compressed prompt, an emergent ability.&nbsp;<strong>(LLMLingua)<\/strong><\/li>\n\n\n\n<li>The density and position of key information in a prompt affect the performance of downstream tasks.&nbsp;<strong>(LongLLMLingua)<\/strong><\/li>\n<\/ul>\n\n\n\n<p>For more details, please refer to the paper&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2310.06839\"><strong>LongLLMLingua<\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"why-longllmlingua\"><strong>Why&nbsp;<em>LongLLMLingua<\/em>?<\/strong><\/h3>\n\n\n\n<p>In long context scenarios, the distribution of key information is generally very sparse. Previous work has found that the density and placement of relevant information significantly impact the performance of Large Language Models (LLMs), even for highly powerful models like GPT-4-Turbo. LongLLMLingua capitalizes on these distribution characteristics by employing prompt compression and reorganization. This strategy schedules and utilizes the limited but powerful context windows for LLMs more efficiently, effectively mitigating the &#8220;Lost in the middle&#8221; issue.
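The reorganization step can be sketched as follows. LongLLMLingua ranks documents with a question-aware importance score (contrastive perplexity in the paper); in this illustrative toy, simple question-term overlap stands in for that score, and the highest-scoring documents are moved to the front so key information sits where LLMs attend most reliably.

```python
def rank_documents(question: str, docs: list[str]) -> list[str]:
    """Toy query-aware reordering: most question-relevant documents first.
    (Term overlap here is a stand-in for contrastive-perplexity scoring.)"""
    q_terms = set(question.lower().split())

    def score(doc: str) -> float:
        # Fraction of question terms that appear in the document
        return len(q_terms & set(doc.lower().split())) / len(q_terms)

    return sorted(docs, key=score, reverse=True)

docs = [
    "the eiffel tower is in paris",
    "hamlet was written by william shakespeare",
    "bananas are rich in potassium",
]
print(rank_documents("who wrote hamlet", docs)[0])
# → hamlet was written by william shakespeare
```

A rank-aware token budget can then allocate more of the compressed prompt to the front documents, which is the role of the dynamic compression ratios in LongLLMLingua.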
As illustrated in the figure above, LongLLMLingua can achieve up to a 21.4% improvement on the NQ Multi-document QA task while using only 1\/4 of the tokens.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"544\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_Motivation-1024x544.png\" alt=\"the motivation of LongLLMLingua\" class=\"wp-image-1016580\" style=\"width:800px\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_Motivation-1024x544.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_Motivation-300x159.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_Motivation-768x408.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_Motivation-1536x816.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_Motivation-240x128.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_Motivation.png 1914w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><strong>Our main contributions are&nbsp;five-fold<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>We propose a&nbsp;<strong>question-aware coarse-to-fine compression method<\/strong>&nbsp;to improve the key information density in the prompt.<\/li>\n\n\n\n<li>We introduce a&nbsp;<strong>document reordering strategy<\/strong>&nbsp;to minimize position bias in LLMs.<\/li>\n\n\n\n<li>We establish&nbsp;<strong>dynamic compression ratios<\/strong>&nbsp;for precise control between coarse and fine compression levels<\/li>\n\n\n\n<li>We propose a&nbsp;<strong>post-compression subsequence recovery strategy<\/strong>&nbsp;to improve the integrity of the key 
information.<\/li>\n\n\n\n<li>We evaluate LongLLMLingua across&nbsp;<strong>five benchmarks<\/strong>, i.e., NaturalQuestions, LongBench, ZeroSCROLLS, MuSiQue, and LooGLE, covering a variety of long context scenarios. Experimental results reveal that LongLLMLingua\u2019s compressed prompts outperform original prompts in terms of performance, cost efficiency, and system latency.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"560\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_framework-1024x560.png\" alt=\"diagram\" class=\"wp-image-1016583\" style=\"width:800px\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_framework-1024x560.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_framework-300x164.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_framework-768x420.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_framework-1536x840.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_framework-2048x1120.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_framework-240x131.png 240w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"empirical-studies-of-question-aware-compression\"><strong>Empirical Studies of Question-aware Compression<\/strong><\/h3>\n\n\n\n<p>To test the effectiveness of our proposed question-aware coarse-grained and fine-grained compression method, we conducted an empirical study across two dimensions.<br>Firstly, we analyzed the effectiveness of the question-aware coarse-grained approach by comparing it with several state-of-the-art 
(SoTA) retrieval methods in real Retrieval-Augmented Generation (RAG) scenarios. We discovered that our method not only surpasses traditional retrieval methods such as BM25 and Gzip but also outperforms embedding methods like OpenAI embedding, Jina, and BGE, as well as various reranker methods, including Cohere reranker and BGE-Reranker.<br>Secondly, we assessed the effectiveness of the question-aware fine-grained approach by comparing perplexity and contrastive perplexity across various document context scenarios. It was observed that contrastive perplexity effectively captures key information in documents, while perplexity struggles to identify relevant information.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"500\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_empirical-1024x500.png\" alt=\"chart\" class=\"wp-image-1016586\" style=\"width:800px\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_empirical-1024x500.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_empirical-300x147.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_empirical-768x375.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_empirical-1536x750.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_empirical-2048x1000.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LongLLMLingua_empirical-240x117.png 240w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"bibtex\"><strong>BibTeX<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>@inproceedings{jiang-etal-2024-longllmlingua,\n    title = 
\"{L}ong{LLML}ingua: Accelerating and Enhancing {LLM}s in Long Context Scenarios via Prompt Compression\",\n    author = \"Huiqiang Jiang and Qianhui Wu and Xufang Luo and Dongsheng Li and Chin-Yew Lin and Yuqing Yang and Lili Qiu\",\n    editor = \"Ku, Lun-Wei  and\n      Martins, Andre  and\n      Srikumar, Vivek\",\n    booktitle = \"Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)\",\n    month = aug,\n    year = \"2024\",\n    address = \"Bangkok, Thailand\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https:\/\/aclanthology.org\/2024.acl-long.91\",\n    pages = \"1658--1677\",\n}<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n\n\n<p>Paper: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2403.12968\">https:\/\/arxiv.org\/abs\/2403.12968<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<p>Project Page: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/llmlingua.com\/llmlingua2.html\">https:\/\/llmlingua.com\/llmlingua2.html<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<p>Demo: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/huggingface.co\/spaces\/microsoft\/llmlingua-2\">https:\/\/huggingface.co\/spaces\/microsoft\/llmlingua-2<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"why-llmlingua-2\"><strong>Why&nbsp;<em>LLMLingua-2?<\/em><\/strong><\/h3>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"578\" 
src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-2-1024x578.png\" alt=\"LLMLingua-2 Onepage\" class=\"wp-image-1016529\" style=\"width:800px\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-2-1024x578.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-2-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-2-768x434.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-2-1536x867.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-2-2048x1157.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-2-240x136.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/LLMLingua-2-640x360.png 640w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Challenges Encountered in Information Entropy Based Methods\u200b:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>\ud83e\udd14<\/strong>&nbsp;&nbsp;Perplexity or information entropy may be suboptimal for prompt trimming:&nbsp;<strong>Not aligned with the prompt compression objective.<\/strong> <\/li>\n\n\n\n<li>\ud83e\udd16 How can we identify or build a suitable dataset to <strong>align the SLM<\/strong> <strong>towards effective prompt compression<\/strong>?<\/li>\n\n\n\n<li><strong>\u27a1\ufe0f<\/strong>&nbsp;&nbsp;Importance of tokens is context-dependent. 
Causal LMs&nbsp;<strong>only leverage unidirectional context<\/strong>, which may fail to capture all essential information within the context.<\/li>\n\n\n\n<li>\ud83d\udd04 How can we design a compression algorithm that effectively leverages the full bidirectional context?<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"why-data-distillation\"><strong>Why Data Distillation?<\/strong><\/h3>\n\n\n\n<p>Shortcomings of Existing Text Compression Datasets:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>\ud83d\ude22<\/strong>&nbsp;&nbsp;Most text compression datasets are&nbsp;<strong>abstractive<\/strong>, which leads to a&nbsp;<strong>slow autoregressive process<\/strong>&nbsp;and may&nbsp;<strong>produce hallucinated content<\/strong>.<\/li>\n\n\n\n<li><strong>\ud83e\udd37\u200d\u2642\ufe0f<\/strong>&nbsp;&nbsp;Extractive compression datasets such as&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/aclanthology.org\/D13-1155\/\"><strong>SentComp<\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a>&nbsp;and&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/aclanthology.org\/2020.argmining-1.1\/\"><strong>DebateSum<\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a>&nbsp;are mainly created for the summarization task and often lack detailed information. 
In the case of prompt compression, we should&nbsp;<strong>retain essential information<\/strong>&nbsp;as much as possible.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"bibtex-2\"><strong>BibTeX<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>@inproceedings{pan-etal-2024-llmlingua,\n    title = \"{LLML}ingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression\",\n    author = \"Zhuoshi Pan and Qianhui Wu and Huiqiang Jiang and Menglin Xia and Xufang Luo and Jue Zhang and Qingwei Lin and Victor Ruhle and Yuqing Yang and Chin-Yew Lin and H. Vicky Zhao and Lili Qiu and Dongmei Zhang\",\n    editor = \"Ku, Lun-Wei  and\n      Martins, Andre  and\n      Srikumar, Vivek\",\n    booktitle = \"Findings of the Association for Computational Linguistics ACL 2024\",\n    month = aug,\n    year = \"2024\",\n    address = \"Bangkok, Thailand and virtual meeting\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https:\/\/aclanthology.org\/2024.findings-acl.57\",\n    pages = \"963--981\",\n}<\/code><\/pre>\n\n\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Effectively Deliver Information to LLMs via&nbsp;Prompt Compression LLMLingua Read More (opens in new tab) LongLLMLingua Read More (opens in new tab) LLMLingua-2 Read More (opens in new tab) Large language models (LLMs) have demonstrated remarkable capabilities and have been applied across various fields. 
Advancements in technologies such as Chain-of-Thought (CoT), In-Context Learning (ICL), and Retrieval-Augmented [&hellip;]<\/p>\n","protected":false},"featured_media":978339,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556,13545],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-978333","msr-project","type-msr-project","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-research-area-human-language-technologies","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"2023-02-01","related-publications":[974562,978312,1016619],"related-downloads":[],"related-videos":[],"related-groups":[881388],"related-events":[],"related-opportunities":[],"related-posts":[987321,1025451],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Qianhui Wu","user_id":40741,"people_section":"Related people","alias":"qianhuiwu"},{"type":"user_nicename","display_name":"Yuqing Yang","user_id":40654,"people_section":"Related people","alias":"yuqyang"},{"type":"user_nicename","display_name":"Chin-Yew Lin","user_id":31493,"people_section":"Related people","alias":"cyl"},{"type":"user_nicename","display_name":"Dongsheng Li","user_id":39402,"people_section":"Related people","alias":"dongsli"},{"type":"user_nicename","display_name":"Molly Xia","user_id":41943,"people_section":"Related people","alias":"mollyxia"},{"type":"user_nicename","display_name":"Jue Zhang","user_id":41212,"people_section":"Related people","alias":"juezhang"},{"type":"user_nicename","display_name":"Qingwei Lin \u6797\u5e86\u7ef4","user_id":33318,"people_section":"Related people","alias":"qlin"},{"type":"user_nicename","display_name":"Victor Ruehle","user_id":41027,"people_section":"Related 
people","alias":"virueh"},{"type":"user_nicename","display_name":"Dongmei Zhang","user_id":31665,"people_section":"Related people","alias":"dongmeiz"}],"msr_research_lab":[199560],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/978333","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":54,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/978333\/revisions"}],"predecessor-version":[{"id":1133151,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/978333\/revisions\/1133151"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/978339"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=978333"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=978333"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=978333"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=978333"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=978333"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}