{"id":786433,"date":"2021-10-19T14:35:36","date_gmt":"2021-10-19T21:35:36","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=786433"},"modified":"2024-01-22T11:37:41","modified_gmt":"2024-01-22T19:37:41","slug":"what-do-compressed-large-language-models-forget-robustness-challenges-in-model-compression","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/what-do-compressed-large-language-models-forget-robustness-challenges-in-model-compression\/","title":{"rendered":"What do Compressed Large Language Models Forget? Robustness Challenges in Model Compression"},"content":{"rendered":"<p><span id=\"2110.08419v1-abstract-full\" class=\"abstract-full has-text-grey-dark mathjax\">Recent works have focused on <span class=\"search-hit mathjax\">compressing<\/span>\u00a0pre-trained\u00a0<span class=\"search-hit mathjax\">language<\/span>\u00a0<span class=\"search-hit mathjax\">models<\/span> (PLMs) like BERT where the major focus has been to improve the <span class=\"search-hit mathjax\">compressed<\/span>\u00a0<span class=\"search-hit mathjax\">model<\/span>\u00a0performance for downstream tasks. However, there has been no study in analyzing the impact of\u00a0<span class=\"search-hit mathjax\">compression<\/span> on the generalizability and robustness of these <span class=\"search-hit mathjax\">models<\/span>. Towards this end, we study two popular\u00a0<span class=\"search-hit mathjax\">model<\/span>\u00a0<span class=\"search-hit mathjax\">compression<\/span> techniques including knowledge distillation and pruning and show that <span class=\"search-hit mathjax\">compressed<\/span>\u00a0<span class=\"search-hit mathjax\">models<\/span> are significantly less robust than their PLM counterparts on adversarial test sets although they obtain similar performance on in-distribution development sets for a task. Further analysis indicates that the <span class=\"search-hit mathjax\">compressed<\/span>\u00a0<span class=\"search-hit mathjax\">models<\/span> overfit on the easy samples and generalize poorly on the hard ones. We further leverage this observation to develop a regularization strategy for <span class=\"search-hit mathjax\">model<\/span>\u00a0<span class=\"search-hit mathjax\">compression<\/span>\u00a0based on sample uncertainty. Experimental results on several natural\u00a0<span class=\"search-hit mathjax\">language<\/span> understanding tasks demonstrate our mitigation framework to improve both the adversarial generalization as well as in-distribution task performance of the <span class=\"search-hit mathjax\">compressed<\/span>\u00a0<span class=\"search-hit mathjax\">models<\/span>.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recent works have focused on compressing\u00a0pre-trained\u00a0language\u00a0models (PLMs) like BERT where the major focus has been to improve the compressed\u00a0model\u00a0performance for downstream tasks. However, there has been no study in analyzing the impact of\u00a0compression on the generalizability and robustness of these models. 
Conference: EACL
Published: April 2023
PDF: https://arxiv.org/pdf/2110.08419.pdf
Authors: Mengnan Du, Subhabrata (Subho) Mukherjee, Yu Cheng, Milad Shokouhi, Xia Hu, Ahmed Awadallah

Related projects:
- Reducing AI's Carbon Footprint (https://www.microsoft.com/en-us/research/project/reducing-ais-carbon-footprint/): This project develops techniques that enable AI to use computing infrastructure more efficiently. The goals are to maintain predictive accuracy while reducing carbon emissions, whether embodied in manufactured hardware or produced from electricity usage when green energy is not available.
- Trustworthy AI (https://www.microsoft.com/en-us/research/project/trustworthy-ai/): In recent times, the explosion of information from a variety of sources and cutting-edge techniques such as deepfakes have made it increasingly important to check the credibility and reliability of data. Large volumes of data generated from diverse information channels like social media, online news outlets, and crowd-sourcing contribute valuable knowledge; however, this comes with additional challenges in ascertaining the credibility of user-generated and machine-generated information. Given diverse information about an object (e.g., …
- Knowledge Distillation (https://www.microsoft.com/en-us/research/project/xtreme-knowledge-distillation/): Modern machine learning applications have enjoyed a great boost from deep and large neural network models, allowing them to achieve state-of-the-art results on a wide range of tasks such as question answering, conversational AI, search, and recommendation. A significant challenge facing practitioners is how to deploy these huge models in practice. Recent pre-trained language models like Turing-NLG and GPT-3 boast a massive 17 billion and 175 billion parameters, respectively. Although they obtain superior performance in …