{"id":1000743,"date":"2024-01-25T06:00:00","date_gmt":"2024-01-25T14:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=1000743"},"modified":"2024-06-10T10:07:51","modified_gmt":"2024-06-10T17:07:51","slug":"abstracts-january-25-2024","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/podcast\/abstracts-january-25-2024\/","title":{"rendered":"Abstracts: January 25, 2024"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"788\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788.png\" alt=\"MSR Podcast - Abstracts hero with a microphone icon\" class=\"wp-image-995001\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788.png 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-240x135.png 240w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-1280x720.png 1280w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/figure>\n\n\n<div class=\"wp-block-msr-podcast-container my-4\">\n\t<iframe loading=\"lazy\" src=\"https:\/\/player.blubrry.com\/?podcast_id=128606588&modern=1\" class=\"podcast-player\" frameborder=\"0\" height=\"164px\" width=\"100%\" scrolling=\"no\" title=\"Podcast Player\"><\/iframe>\n<\/div>\n\n\n\n<p>Members of the research community at Microsoft work continuously to advance their respective fields. <em>Abstracts<\/em> brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.<\/p>\n\n\n\n<p>In this episode, Senior Researchers <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/joash\/\">Jordan Ash<\/a> and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/dimisra\/\">Dipendra Misra<\/a> join host Gretchen Huizinga to discuss \u201c<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/the-truth-is-in-there-improving-reasoning-in-language-models-with-layer-selective-rank-reduction\/\">The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction<\/a>,\u201d which was accepted to the 2024 International Conference on Learning Representations (ICLR). Layer-Selective Rank reduction, or <em>LASER<\/em>, is an intervention for targeted parameter reduction in transformer-based models. 
The work shows that the removal of certain parameters not only maintains model performance like some existing parameter-reduction methods but can actually <em>improve<\/em> it\u2014no additional training necessary.<\/p>\n\n\n\n<p>To learn more about the paper and related topics, register for <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/aka.ms\/researchforum\/\" target=\"_blank\" rel=\"noopener noreferrer\">Microsoft Research Forum<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, a series of panel discussions and lightning talks around science and technology research in the era of general AI.<\/p>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a data-bi-type=\"button\" class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/the-truth-is-in-there-improving-reasoning-in-language-models-with-layer-selective-rank-reduction\/\">Read the paper<\/a><\/div>\n\n\n\n<div class=\"wp-block-button is-style-fill-github\"><a data-bi-type=\"button\" class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/github.com\/pratyushasharma\/laser\" target=\"_blank\" rel=\"noreferrer noopener\">Get the code<\/a><\/div>\n<\/div>\n\n\n\n<section class=\"wp-block-msr-subscribe-to-podcast subscribe-to-podcast\">\n\t<div class=\"subscribe-to-podcast__inner border-top border-bottom border-width-2\">\n\t\t<h2 class=\"h5 subscribe-to-podcast__heading\">\n\t\t\tSubscribe to the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/podcast\">Microsoft Research Podcast<\/a>:\t\t<\/h2>\n\t\t<ul class=\"subscribe-to-podcast__list list-unstyled\">\n\t\t\t\t\t\t\t<li class=\"subscribe-to-podcast__list-item\">\n\t\t\t\t\t<a class=\"subscribe-to-podcast__link\" href=\"https:\/\/itunes.apple.com\/us\/podcast\/microsoft-research-a-podcast\/id1318021537?mt=2\" target=\"_blank\" rel=\"noreferrer 
noopener\">\n\t\t\t\t\t\t<svg class=\"subscribe-to-podcast__svg\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" fill=\"black\" viewBox=\"0 0 32 32\">  <path d=\"M7.12 0c-3.937-0.011-7.131 3.183-7.12 7.12v17.76c-0.011 3.937 3.183 7.131 7.12 7.12h17.76c3.937 0.011 7.131-3.183 7.12-7.12v-17.76c0.011-3.937-3.183-7.131-7.12-7.12zM15.817 3.421c3.115 0 5.932 1.204 8.079 3.453 1.631 1.693 2.547 3.489 3.016 5.855 0.161 0.787 0.161 2.932 0.009 3.817-0.5 2.817-2.041 5.339-4.317 7.063-0.812 0.615-2.797 1.683-3.115 1.683-0.12 0-0.129-0.12-0.077-0.615 0.099-0.792 0.192-0.953 0.64-1.141 0.713-0.296 1.932-1.167 2.677-1.911 1.301-1.303 2.229-2.932 2.677-4.719 0.281-1.1 0.244-3.543-0.063-4.672-0.969-3.595-3.907-6.385-7.5-7.136-1.041-0.213-2.943-0.213-4 0-3.636 0.751-6.647 3.683-7.563 7.371-0.245 1.004-0.245 3.448 0 4.448 0.609 2.443 2.188 4.681 4.255 6.015 0.407 0.271 0.896 0.547 1.1 0.631 0.447 0.192 0.547 0.355 0.629 1.14 0.052 0.485 0.041 0.62-0.072 0.62-0.073 0-0.62-0.235-1.199-0.511l-0.052-0.041c-3.297-1.62-5.407-4.364-6.177-8.016-0.187-0.943-0.224-3.187-0.036-4.052 0.479-2.323 1.396-4.135 2.921-5.739 2.199-2.319 5.027-3.543 8.172-3.543zM16 7.172c0.541 0.005 1.068 0.052 1.473 0.14 3.715 0.828 6.344 4.543 5.833 8.229-0.203 1.489-0.713 2.709-1.619 3.844-0.448 0.573-1.537 1.532-1.729 1.532-0.032 0-0.063-0.365-0.063-0.803v-0.808l0.552-0.661c2.093-2.505 1.943-6.005-0.339-8.296-0.885-0.896-1.912-1.423-3.235-1.661-0.853-0.161-1.031-0.161-1.927-0.011-1.364 0.219-2.417 0.744-3.355 1.672-2.291 2.271-2.443 5.791-0.348 8.296l0.552 0.661v0.813c0 0.448-0.037 0.807-0.084 0.807-0.036 0-0.349-0.213-0.683-0.479l-0.047-0.016c-1.109-0.885-2.088-2.453-2.495-3.995-0.244-0.932-0.244-2.697 0.011-3.625 0.672-2.505 2.521-4.448 5.079-5.359 0.547-0.193 1.509-0.297 2.416-0.281zM15.823 11.156c0.417 0 0.828 0.084 1.131 0.24 0.645 0.339 1.183 0.989 1.385 1.677 0.62 2.104-1.609 3.948-3.631 3.005h-0.015c-0.953-0.443-1.464-1.276-1.475-2.36 0-0.979 0.541-1.828 1.484-2.328 0.297-0.156 0.709-0.235 1.125-0.235zM15.812 
17.464c1.319-0.005 2.271 0.463 2.625 1.291 0.265 0.62 0.167 2.573-0.292 5.735-0.307 2.208-0.479 2.765-0.905 3.141-0.589 0.52-1.417 0.667-2.209 0.385h-0.004c-0.953-0.344-1.157-0.808-1.553-3.527-0.452-3.161-0.552-5.115-0.285-5.735 0.348-0.823 1.296-1.285 2.624-1.291z\"\/><\/svg>\n\t\t\t\t\t\t<span class=\"subscribe-to-podcast__link-text\">Apple Podcasts<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/li>\n\t\t\t\n\t\t\t\t\t\t\t<li class=\"subscribe-to-podcast__list-item\">\n\t\t\t\t\t<a class=\"subscribe-to-podcast__link\" href=\"https:\/\/subscribebyemail.com\/www.blubrry.com\/feeds\/microsoftresearch.xml\" target=\"_blank\" rel=\"noreferrer noopener\">\n\t\t\t\t\t\t<svg class=\"subscribe-to-podcast__svg\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" fill=\"none\" viewBox=\"0 0 32 32\"><path fill=\"currentColor\" d=\"M6.4 6a2.392 2.392 0 00-2.372 2.119L16 15.6l11.972-7.481A2.392 2.392 0 0025.6 6H6.4zM4 10.502V22.8a2.4 2.4 0 002.4 2.4h19.2a2.4 2.4 0 002.4-2.4V10.502l-11.365 7.102a1.2 1.2 0 01-1.27 0L4 10.502z\"\/><\/svg>\n\t\t\t\t\t\t<span class=\"subscribe-to-podcast__link-text\">Email<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/li>\n\t\t\t\n\t\t\t\t\t\t\t<li class=\"subscribe-to-podcast__list-item\">\n\t\t\t\t\t<a class=\"subscribe-to-podcast__link\" href=\"https:\/\/subscribeonandroid.com\/www.blubrry.com\/feeds\/microsoftresearch.xml\" target=\"_blank\" rel=\"noreferrer noopener\">\n\t\t\t\t\t\t<svg class=\"subscribe-to-podcast__svg\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" fill=\"none\" viewBox=\"0 0 32 32\"><path fill=\"currentColor\" d=\"M12.414 4.02c-.062.012-.126.023-.18.06a.489.489 0 00-.12.675L13.149 6.3c-1.6.847-2.792 2.255-3.18 3.944h13.257c-.388-1.69-1.58-3.097-3.179-3.944l1.035-1.545a.489.489 0 00-.12-.675.492.492 0 00-.675.135l-1.14 1.68a7.423 7.423 0 00-2.55-.45c-.899 0-1.758.161-2.549.45l-1.14-1.68a.482.482 0 00-.494-.195zm1.545 3.824a.72.72 0 110 1.44.72.72 0 010-1.44zm5.278 0a.719.719 0 110 1.44.719.719 0 110-1.44zM8.44 11.204A1.44 1.44 0 007 12.644v6.718c0 .795.645 
1.44 1.44 1.44.168 0 .33-.036.48-.09v-9.418a1.406 1.406 0 00-.48-.09zm1.44 0V21.76c0 .793.646 1.44 1.44 1.44h10.557c.793 0 1.44-.647 1.44-1.44V11.204H9.878zm14.876 0c-.169 0-.33.035-.48.09v9.418c.15.052.311.09.48.09a1.44 1.44 0 001.44-1.44v-6.719a1.44 1.44 0 00-1.44-1.44zM11.8 24.16v1.92a1.92 1.92 0 003.84 0v-1.92h-3.84zm5.759 0v1.92a1.92 1.92 0 003.84 0v-1.92h-3.84z\"\/><\/svg>\n\t\t\t\t\t\t<span class=\"subscribe-to-podcast__link-text\">Android<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/li>\n\t\t\t\n\t\t\t\t\t\t\t<li class=\"subscribe-to-podcast__list-item\">\n\t\t\t\t\t<a class=\"subscribe-to-podcast__link\" href=\"https:\/\/open.spotify.com\/show\/4ndjUXyL0hH1FXHgwIiTWU\" target=\"_blank\" rel=\"noreferrer noopener\">\n\t\t\t\t\t\t<svg class=\"subscribe-to-podcast__svg\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" fill=\"none\" viewBox=\"0 0 32 32\"><path fill=\"currentColor\" d=\"M16 4C9.383 4 4 9.383 4 16s5.383 12 12 12 12-5.383 12-12S22.617 4 16 4zm5.08 17.394a.781.781 0 01-1.086.217c-1.29-.86-3.477-1.434-5.303-1.434-1.937.002-3.389.477-3.403.482a.782.782 0 11-.494-1.484c.068-.023 1.71-.56 3.897-.562 1.826 0 4.365.492 6.171 1.696.36.24.457.725.217 1.085zm1.56-3.202a.895.895 0 01-1.234.286c-2.338-1.457-4.742-1.766-6.812-1.747-2.338.02-4.207.466-4.239.476a.895.895 0 11-.488-1.723c.145-.041 2.01-.5 4.564-.521 2.329-.02 5.23.318 7.923 1.995.419.26.547.814.286 1.234zm1.556-3.745a1.043 1.043 0 01-1.428.371c-2.725-1.6-6.039-1.94-8.339-1.942h-.033c-2.781 0-4.923.489-4.944.494a1.044 1.044 0 01-.474-2.031c.096-.023 2.385-.55 5.418-.55h.036c2.558.004 6.264.393 9.393 2.23.497.292.663.931.371 1.428z\"\/><\/svg>\n\t\t\t\t\t\t<span class=\"subscribe-to-podcast__link-text\">Spotify<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/li>\n\t\t\t\n\t\t\t\t\t\t\t<li class=\"subscribe-to-podcast__list-item\">\n\t\t\t\t\t<a class=\"subscribe-to-podcast__link\" href=\"https:\/\/www.blubrry.com\/feeds\/microsoftresearch.xml\" target=\"_blank\" rel=\"noreferrer noopener\">\n\t\t\t\t\t\t<svg 
class=\"subscribe-to-podcast__svg\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" fill=\"none\" viewBox=\"0 0 32 32\"><path fill=\"currentColor\" d=\"M6.667 4a2.676 2.676 0 00-2.612 2.13v.003c-.036.172-.055.35-.055.534v18.666c0 .183.019.362.055.534v.003a2.676 2.676 0 002.076 2.075h.002c.172.036.35.055.534.055h18.666A2.676 2.676 0 0028 25.333V6.667a2.676 2.676 0 00-2.13-2.612h-.003A2.623 2.623 0 0025.333 4H6.667zM8 8h1.333C17.42 8 24 14.58 24 22.667V24h-2.667v-1.333c0-6.618-5.382-12-12-12H8V8zm0 5.333h1.333c5.146 0 9.334 4.188 9.334 9.334V24H16v-1.333A6.674 6.674 0 009.333 16H8v-2.667zM10 20a2 2 0 11-.001 4.001A2 2 0 0110 20z\"\/><\/svg>\n\t\t\t\t\t\t<span class=\"subscribe-to-podcast__link-text\">RSS Feed<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/li>\n\t\t\t\t\t<\/ul>\n\t<\/div>\n<\/section>\n\n\n<div class=\"wp-block-msr-show-more\">\n\t<div class=\"bg-neutral-100 p-5\">\n\t\t<div class=\"show-more-show-less\">\n\t\t\t<div>\n\t\t\t\t<span>\n\t\t\t\t\t\n\n<h2 class=\"wp-block-heading\" id=\"transcript\">Transcript<\/h2>\n\n\n\n<p>[MUSIC PLAYS]<\/p>\n\n\n\n<p><strong>GRETCHEN HUIZINGA: <\/strong>Welcome to <em>Abstracts<\/em>, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I\u2019m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot\u2014or a <em>podcast abstract<\/em>\u2014of their new and noteworthy papers.<\/p>\n\n\n\n<p>[MUSIC FADES]<\/p>\n\n\n\n<p>Today, I&#8217;m talking to Dr. Dipendra Misra and Dr. Jordan Ash, both senior researchers at Microsoft Research. Drs. Misra and Ash are coauthors of a paper called \u201cThe Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction,\u201d also known as <em>LASER<\/em>. This paper has been accepted at the International Conference on Learning Representations, or ICLR, in Vienna this year, and you can read a preprint of it now on arXiv. 
Dipendra, Jordan, thanks for joining us on <em>Abstracts<\/em>!<\/p>\n\n\n\n\t\t\t\t<\/span>\n\t\t\t\t<span id=\"show-more-show-less-toggle-1\" class=\"show-more-show-less-toggleable-content\">\n\t\t\t\t\t\n\n\n\n<p><strong>JORDAN ASH: <\/strong>Thanks for having us.<\/p>\n\n\n\n<p><strong>DIPENDRA MISRA:<\/strong> Yeah, thanks for having us, Gretchen.<\/p>\n\n\n\n<p><strong>HUIZINGA:<\/strong> Dipendra, let&#8217;s start with a general overview of this paper. In a few sentences, describe the issue or problem your work addresses and, perhaps more importantly, why we should care about it.<\/p>\n\n\n\n<p><strong>MISRA:<\/strong> Thanks, Gretchen. So as we know, large language models, also known as LLMs, have revolutionized both business and research in artificial intelligence. They are everywhere, being used to solve a wide range of problems. So in our paper, we introduce an intervention which can be applied to any existing pretrained large language models, and our main purpose for introducing this is to see how it affects the performance of the LLMs and whether we can gain insight into how an LLM stores information in its parameters and how it uses that information to generate a response. And what our intervention does is that it performs a low-rank approximation of the parameters of the LLM. And the surprising discovery that our paper makes is that if you do this intervention correctly, then we can get significant improvement on various tasks for different LLMs.<\/p>\n\n\n\n<p><strong>HUIZINGA: <\/strong>So that\u2019s the first part of the question. Tell me why I should care about it!<\/p>\n\n\n\n<p><strong>MISRA: <\/strong>So if you are a person who uses LLMs for solving any tasks, then you do care about performance on a given task. So, for example, you could be using LLMs to generate an email, right, from a given description. Or you could be using an LLM to do question answering. 
And by applying our intervention, we can gain accuracy on the task that we care about.<\/p>\n\n\n\n<p><strong>HUIZINGA:<\/strong> Well, let\u2019s stick with you, Dipendra, for a minute and talk about the field writ large. Almost all research owes a debt to some other research that went before. So tell us a bit about the related work in this field and how your work builds on or adds to it.<\/p>\n\n\n\n<p><strong>MISRA:<\/strong> So the work that is most closely related to our LASER paper is this growing body of work on understanding how knowledge is stored and edited inside a large language model. So these works don&#8217;t apply the intervention that we do, but they were certainly inspirational for us for arriving at the intervention that we introduced. Another line of work which is very related is, like, adding a small number of parameters to improve the performance of the LLM on a given task. The most relevant work in this space is the LoRA paper, also known as the \u201cLow-Rank Adaptation of Large Language Models,\u201d which came from Microsoft. And what LoRA does, it adds a small number of additional parameters to an LLM and then fine-tunes it on a given task. And what our intervention, called LASER, does is that it removes parameters instead of adding them. And another line of work which is also related is the work on model compression. So there are people who focus on bringing down the size of the models as much as possible while still retaining the performance, more or less, compared to the base model. And so these people are also focused on removing parameters, but they are coming at it from a different angle of, like, trying to reduce the memory footprint, while what we were doing is that we are less focused on the memory footprint\u2014that\u2019s more like a side effect of it\u2014and more like if I were to fiddle with this parameter of the LLM, then how does it affect the performance? And what can we learn by looking at the comparison? 
Like, OK, so if I remove this parameter, I see the performance drop; then it means that these parameters are storing something about this type of task on which the performance is dropping.<\/p>\n\n\n\n<p><strong>HUIZINGA:<\/strong> So I\u2019ll ask you one more question, Dipendra, before I pull Jordan into the conversation, and that would be about your methodology. How would you describe your approach to this project, and how did you conduct the research?<\/p>\n\n\n\n<p><strong>MISRA:<\/strong> So we started by analyzing the intervention LASER on a particular LLM called GPT-J and evaluating its performance on this question-answering dataset CounterFact. So our idea was, like, before trying this thing on [a] bunch of things, let\u2019s just understand this in one setting deeply and, kind of, build insights that we can then evaluate in other settings. And the reason we chose this setup was that the GPT-J large language model has its training data publicly available. It&#8217;s called the Pile dataset. And that allows us to do analysis with the training data. For example, is the performance dropping on data points which are rarer or more frequent in the training data? And this is important because training data analysis is frequently omitted in existing LLM literature, and that\u2019s something we wanted to do. And the second reason is that the CounterFact question-answering dataset is both related to the prior work in this space, so there was a reason for choosing it, but also it has paraphrases of the same question. For example, it might ask, like, \u201cWho is the president of the United States of America?\u201d But it will also have paraphrases like \u201cThe president of the United States of America is \u2026\u201d or \u201cThe head of the government of the United States of America is \u2026\u201d And so it will have different variations of the same question. 
And then you can see if the LLM is able to get all of them right, or is it not robust to variations of the same question? And so we did analysis on this GPT-J and CounterFact dataset. And Jordan will talk more about what the results were. And so based on this rigorous analysis, we developed some insights as to what the intervention is doing. And then we evaluated these insights on other settings. So then we tried, like, two other different large language models and evaluated it on, like, multiple different datasets. And then we saw that the insights actually hold more broadly. And finally, we also evaluated this in a non-text related task, right. Because the intervention could, in principle, be applied to any neural network. So we went after this reinforcement learning model, which solves a puzzle called Sokoban. And we also saw that if you apply this intervention correctly, then you can get some performance improvement. So it\u2019s not related to just large language models, although that was our main motivation.<\/p>\n\n\n\n<p><strong>HUIZINGA:<\/strong> Well, Jordan, let\u2019s get your take on the last few questions here. As I\u2019ve said before, the most interesting section of a research paper for me is the part where it says, \u201cand what we found was \u2026\u201d So as a result of this research, what <em>did<\/em> you find? Were there outcomes that you expected, or were there any surprises?<\/p>\n\n\n\n<p><strong>ASH:<\/strong> I would say this paper is full of surprises. So as Dipendra was mentioning earlier, the LASER intervention <em>removes<\/em> information from a model. It doesn\u2019t add information to a model. And up until now, there\u2019s been a lot of work on pruning model parameters for a variety of reasons. But generally, these papers show that as parameters are removed from the model, performance just does not degrade. You can, overall, keep performance roughly the same even with a fairly drastic reduction of model parameters. 
And those reductions are typically done across layers of the model. What we\u2019re showing here is surprising because we\u2019re showing that if we do a very targeted intervention, maybe at only one layer of the model, we could actually get a big boost in performance rather than just, you know, keep it the same or something like this.<\/p>\n\n\n\n<p><strong>HUIZINGA:<\/strong> Hmm. So with those results in mind, Jordan, I\u2019m curious about practical applications. How would you say this research makes an impact in real-world situations? I know that Dipendra alluded to that earlier, but where is this most useful and who benefits most?<\/p>\n\n\n\n<p><strong>ASH:<\/strong> I think the short sales pitch for this technique is that you could potentially improve the performance of a language model with no additional training at all just by applying this intervention, which again just removes information from the model, so you don\u2019t need to have any extra data on hand to refine the model or to add new information into it. The real-world situations where we\u2019re seeing a boost right now from LASER are, like, question answering or reasoning-type tasks where there\u2019s, like, a concrete answer that corresponds to what you\u2019re asking the LLM rather than just a, sort of, like, broad-purpose generative task.<\/p>\n\n\n\n<p><strong>HUIZINGA:<\/strong> So typically speaking, when you\u2019re dealing with LLMs, part of the issue is prompt engineering. And it\u2019s like <em>my<\/em> responsibility to be able to put the right words in it so I\u2019ll get the best answer from the model, right? 
Are you saying that this helps me not have to be that good on the prompt-engineer end versus what the model can interpret and do?<\/p>\n\n\n\n<p><strong>ASH:<\/strong> I think prompt engineering still has a place in, sort of, eking out a good answer from a language model, but given a fixed prompt, this intervention seems to offer an improved accuracy over not intervening at all and applying the same prompt.<\/p>\n\n\n\n<p><strong>HUIZINGA:<\/strong> So, Jordan, I often think of an abstract as a sort of appetizer for a research paper. But let\u2019s distill it even further. If there was one thing\u2014sort of an <em>amuse-bouche<\/em>, if you will\u2014that you want our listeners to take away from this work, what would it be?<\/p>\n\n\n\n<p><strong>ASH:<\/strong> For me, I like this idea of how, you know, typically if you want to get a model to perform better, you would take that model off the shelf and you would refine it on data related to the task at hand. And that might take the form of refining all of the parameters or doing some low-rank LoRA-type thing that Dipendra alluded to earlier. Here, we counterintuitively show that sometimes just carefully removing information from the model can have a positive effect, as well. And this is great news because refining a model requires a lot of new target domain data to be available, but removing information from the model doesn\u2019t necessarily have that same constraint.<\/p>\n\n\n\n<p><strong>HUIZINGA:<\/strong> Well, finally, let\u2019s talk a little bit about the future, Jordan, and I\u2019ll have you close the show for us. What unanswered questions or ongoing research challenges do you see here, and what\u2019s next maybe on your research agenda?<\/p>\n\n\n\n<p><strong>ASH:<\/strong> Yeah, I think there\u2019s a lot of exciting future work for this project. I think for one, as a practical matter, there\u2019s this question of just what\u2019s the best way to find the best LASER intervention? 
LASER targets a specific layer of the model, and then it finds the extent to which it should be rank-reduced. That search procedure is, kind of, expensive. Right now, we\u2019re doing it in a, sort of, exhaustive way. But also, it seems to be beneficial to apply LASER at multiple layers of the model. And that makes the search procedure, sort of, combinatorially explode. So finding out the best way to compose these interventions, I think, is an important area of future research. And then just, sort of, less on the practical side, I think there are all these questions related to just, why does this work at all? Like, why is it helpful to <em>remove<\/em> information from the model? And, you know, I think there are some rough ideas we have about this. For example, when you\u2019re training a model on lots and lots of data, you know, it\u2019s not all created equally. Some of it might be noisy or low quality, and some of it might be high quality. And maybe it\u2019s better to remove those samples at training time to get a better model. So I guess there\u2019s this question of, is pruning the model using a LASER-type intervention roughly equivalent to pruning the training data in a way that makes it more favorable for eliciting a high-quality model? And again, like Dipendra alluded to earlier, this LoRA procedure, which does something that very much complements LASER and is often used to add information to a model, is it possible that LoRA is actually not just adding information but also removing information from the model? And perhaps that\u2019s one reason why LASER seems to be so effective.<\/p>\n\n\n\n<p><strong>HUIZINGA:<\/strong> So lots of questions.<\/p>\n\n\n\n<p><strong>ASH:<\/strong> I would say so, yeah!<\/p>\n\n\n\n<p><strong>HUIZINGA:<\/strong> Well, Dipendra Misra, Jordan Ash, thanks for joining us today. 
And to our listeners, thanks for tuning in.<\/p>\n\n\n\n<p>[MUSIC PLAYS]<\/p>\n\n\n\n<p>Again, you can find a link to this paper at <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/aka.ms\/abstracts\">aka.ms\/abstracts<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> or on <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/abs\/2312.13558\" target=\"_blank\" rel=\"noopener noreferrer\">arXiv<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. And I\u2019ll also add that Dipendra will be speaking about this work at the upcoming Microsoft Research Forum, and you can register for this series of events at <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/researchforum.microsoft.com\" target=\"_blank\" rel=\"noopener noreferrer\">researchforum.microsoft.com<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. See you next time on <em>Abstracts<\/em>!<\/p>\n\n\n\n<p>[MUSIC FADES]<\/p>\n\n\t\t\t\t<\/span>\n\t\t\t<\/div>\n\t\t\t<button\n\t\t\t\tclass=\"action-trigger glyph-prepend mt-2 mb-0 show-more-show-less-toggle\"\n\t\t\t\taria-expanded=\"false\"\n\t\t\t\tdata-show-less-text=\"Show less\"\n\t\t\t\ttype=\"button\"\n\t\t\t\taria-controls=\"show-more-show-less-toggle-1\"\n\t\t\t\taria-label=\"Show more content\"\n\t\t\t\tdata-alternate-aria-label=\"Show less content\">\n\t\t\t\tShow more\t\t\t<\/button>\n\t\t<\/div>\n\t<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>On \u201cAbstracts,\u201d Jordan Ash & Dipendra Misra discuss the parameter reduction method LASER. 
Tune in to learn how selective removal of stored data alone can boost LLM performance, then sign up for Microsoft Research Forum for more on LASER & related topics.<\/p>\n","protected":false},"author":42183,"featured_media":995001,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"https:\/\/player.blubrry.com\/id\/128606588","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_hide_image_in_river":0,"footnotes":""},"categories":[240054],"tags":[],"research-area":[13556],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[243990],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[268128],"class_list":["post-1000743","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-msr-podcast","msr-research-area-artificial-intelligence","msr-locale-en_us","msr-post-option-podcast-featured","msr-podcast-series-abstracts"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"https:\/\/player.blubrry.com\/id\/128606588","podcast_episode":"","msr_research_lab":[199571,992148],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[144902,395930],"related-projects":[],"related-events":[],"related-researchers":[{"type":"guest","value":"gretchen-huizinga-2","user_id":"444834","display_name":"Gretchen Huizinga","author_link":"<a href=\"https:\/\/www.linkedin.com\/in\/gretchen-huizinga-phd-3a4b2921?trk=people-guest_people_search-card\" aria-label=\"Visit the profile page for Gretchen Huizinga\">Gretchen Huizinga<\/a>","is_active":true,"last_first":"Huizinga, Gretchen","people_section":0,"alias":"gretchen-huizinga-2"},{"type":"user_nicename","value":"Jordan Ash","user_id":39826,"display_name":"Jordan Ash","author_link":"<a 
href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/joash\/\" aria-label=\"Visit the profile page for Jordan Ash\">Jordan Ash<\/a>","is_active":false,"last_first":"Ash, Jordan","people_section":0,"alias":"joash"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-960x540.png\" class=\"img-object-cover\" alt=\"Microsoft Research Podcast - Abstracts hero with a microphone icon\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-240x135.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-640x360.png 640w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788-1280x720.png 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/Podcast_Abstracts_Hero_Feature_No_Text_1400x788.png 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"<a href=\"https:\/\/www.linkedin.com\/in\/gretchen-huizinga-phd-3a4b2921?trk=people-guest_people_search-card\" title=\"Go to researcher profile for Gretchen Huizinga\" aria-label=\"Go to researcher profile for Gretchen Huizinga\" data-bi-type=\"byline author\" data-bi-cN=\"Gretchen Huizinga\">Gretchen Huizinga<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/joash\/\" title=\"Go to researcher profile for Jordan Ash\" aria-label=\"Go to researcher profile for Jordan Ash\" data-bi-type=\"byline author\" data-bi-cN=\"Jordan Ash\">Jordan Ash<\/a>, and Dipendra Misra","formattedDate":"January 25, 2024","formattedExcerpt":"On \u201cAbstracts,\u201d Jordan Ash &amp; Dipendra Misra discuss the parameter reduction method LASER. 
Tune in to learn how selective removal of stored data alone can boost LLM performance, then sign up for Microsoft Research Forum for more on LASER &amp; related topics.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1000743","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/42183"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1000743"}],"version-history":[{"count":24,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1000743\/revisions"}],"predecessor-version":[{"id":1001988,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1000743\/revisions\/1001988"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/995001"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1000743"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1000743"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1000743"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1000743"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1000743"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/res
earch\/wp-json\/wp\/v2\/msr-event-type?post=1000743"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1000743"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1000743"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1000743"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1000743"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1000743"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}