{"id":1071651,"date":"2024-08-07T18:09:00","date_gmt":"2024-08-08T01:09:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&#038;p=1071651"},"modified":"2024-11-21T02:01:28","modified_gmt":"2024-11-21T10:01:28","slug":"new-arrival-in-research-14","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/new-arrival-in-research-14\/","title":{"rendered":"New at ACL | Six selected papers on the latest advances in LLMs"},"content":{"rendered":"\n<p>Editor's note: Welcome to the \u201cNew Arrival in Research\u201d column! \u201cNew Arrival in Research\u201d brings together the latest innovations and research news from Microsoft Research Asia. Here you can quickly browse the lab's highlights, stay alert to developments at the frontier, and find advanced, practical open-source tools.<\/p>\n\n\n\n<p>Next week, ACL 2024, a top international academic conference in natural language processing, will be held in Bangkok, Thailand. Microsoft Research Asia has 14 papers accepted at this year's conference, and this issue of \u201cNew Arrival in Research\u201d briefly introduces six of them.<\/p>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"01-\u57fa\u4e8e\u5fae\u8c03\u5927\u8bed\u8a00\u6a21\u578b\u7684\u751f\u6210\u5f0f\u63a8\u8350\u7cfb\u7edf\">01. 
Generative recommender systems built on fine-tuned large language models<\/h3>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"232\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-1-768x232-1.png\" alt=\"Aligning Large Language Models for Controllable Recommendations\" class=\"wp-image-1071660\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-1-768x232-1.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-1-768x232-1-300x91.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-1-768x232-1-240x73.png 240w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\" \/><\/figure>\n\n\n\n<p>Paper link: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/pdf\/2403.05063\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/arxiv.org\/pdf\/2403.05063<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<p>GitHub link: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/microsoft\/recai\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/github.com\/microsoft\/recai<span class=\"sr-only\"> (opens in new 
tab)<\/span><\/a><\/p>\n\n\n\n<p>In the digital age, traditional recommender systems are convenient but often passive, and they struggle to meet users' growing demand for personalization and interactivity. Their lack of explainability and limited controllability have also become bottlenecks for improving the user experience. With their strong language understanding, knowledge, reasoning, and problem-solving abilities, large language models (LLMs) are promising candidates to power the next generation of recommender systems.<\/p>\n\n\n\n<p>Motivated by this, research teams from Microsoft Research Asia and Shenzhen University collaborated to build a new generation of user-centric recommender systems. Driven by an LLM, the system understands shifts in user needs more naturally and provides more precise personalized service. The key is fine-tuning and aligning the LLM on domain knowledge and domain-specific instructions. To that end, the team designed a two-stage training framework: a supervised learning (SL) stage followed by a reinforcement learning (RL) stage.<\/p>\n\n\n\n<p>In the supervised learning stage, the team designed a series of targeted tasks, such as item information question answering, item recommendation, and category control, to help the LLM
absorb new knowledge and to improve its ability to follow complex recommendation-related instructions. In addition, a traditional recommender model (for example, SASRec) serves as a teacher model that helps generate the labels used for supervised learning, which effectively addresses the sparsity of training data. In the reinforcement learning stage, the team aims to further improve generalization: the LLM responds to open-ended user instructions, and a carefully designed reward mechanism steers the model toward better behavior over successive iterations, so that it follows the intent of user instructions more accurately and makes fewer formatting errors in its output.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1430\" height=\"575\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-2.png\" alt=\"Aligning Large Language Models for Controllable Recommendations - diagram\" class=\"wp-image-1071663\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-2.png 1430w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-2-300x121.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-2-1024x412.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-2-768x309.png 768w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-2-240x97.png 240w\" sizes=\"auto, (max-width: 1430px) 100vw, 1430px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 1: Method overview<\/em><\/figcaption><\/figure>\n\n\n\n<p>Experiments show that this generative recommender system responds well to a wide variety of user recommendation requests, laying a foundation for interactive, intelligent recommender systems. The team will continue to open-source its work on generative recommendation on GitHub.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"02-\u5927\u8bed\u8a00\u6a21\u578b\u9a71\u52a8\u7684\u6570\u636e\u79d1\u5b66\u4ee3\u7406\u7684\u57fa\u51c6\u6d4b\u8bd5\">02. Benchmarking LLM-driven data science agents<\/h3>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"179\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-3-768x179-1.png\" alt=\"Benchmarking Data Science Agents\" class=\"wp-image-1071666\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-3-768x179-1.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-3-768x179-1-300x70.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-3-768x179-1-240x56.png 240w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\" \/><\/figure>\n\n\n\n<p>Paper link: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" 
href=\"https:\/\/arxiv.org\/pdf\/2402.17168\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/arxiv.org\/pdf\/2402.17168<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<p>GitHub link: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/MetaCopilot\/dseval\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/github.com\/MetaCopilot\/dseval<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<p>By analyzing large volumes of data, data science helps individuals and organizations make informed decisions, predict trends, and improve processes. However, its complexity demands a broad range of analytical tools and expertise, which is challenging even for experts. Recently, LLMs and the agents they power have shown great potential for augmenting data science work, but limitations of the LLMs themselves, ambiguous context, or the lack of failure-recovery mechanisms still cause reliability and accuracy problems in practice (for example, ignoring columns, misinterpreting data types, failing to output results in the specified format, or modifying the original data). Existing evaluation methods leave considerable room for improvement in measuring the capabilities and limitations of data science agents.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1044\" height=\"766\" 
src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-4.png\" alt=\"Benchmarking Data Science Agents - diagram\" class=\"wp-image-1071669\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-4.png 1044w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-4-300x220.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-4-1024x751.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-4-768x563.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-4-80x60.png 80w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-4-240x176.png 240w\" sizes=\"auto, (max-width: 1044px) 100vw, 1044px\" \/><figcaption class=\"wp-element-caption\">Figure 2: A typical workflow of a data science agent<\/figcaption><\/figure>\n\n\n\n<p>To address this, researchers at Microsoft Research Asia proposed DSEval, a new benchmarking framework designed to comprehensively evaluate LLM-driven data science agents. By introducing a new annotation process and language (DSEAL, the DSEval Annotation 
Language), it significantly improves the scalability and coverage of the benchmark. The framework covers the entire lifecycle of a data science agent, from receiving the query and retrieving context to generating code, executing it, and returning the result. It also includes a validation module that continuously monitors the generated code, the execution results, and the runtime session, and compares them against a reference code snippet to ensure accuracy.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1049\" height=\"554\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-5.png\" alt=\"Benchmarking Data Science Agents - diagram\" class=\"wp-image-1071672\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-5.png 1049w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-5-300x158.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-5-1024x541.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-5-768x406.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-5-240x127.png 240w\" sizes=\"auto, (max-width: 1049px) 100vw, 1049px\" \/><figcaption class=\"wp-element-caption\">Figure 3: The lifecycle of a data science agent and the DSEval framework<\/figcaption><\/figure>\n\n\n\n<p>In addition, DSEAL 
is used to describe and configure problem sets, ensuring that they are compatible with the DSEval framework and easy to understand and debug. Problem sets are generated automatically by the system and then revised by experts, which ensures diversity and accuracy while reducing manual effort and improving benchmark quality.<\/p>\n\n\n\n<p>Experimental results show that the DSEval framework performs well at evaluating data science agents. Comparing different agent approaches, the experiments also found that the context extraction method has a significant effect on LLM performance. Moreover, with multiple rounds of self-repair, a less capable model (such as GPT-3.5) outperformed a more capable model (such as GPT-4) on complex tasks, demonstrating the considerable potential of self-repair.<\/p>\n\n\n\n<p>The DSEval framework and datasets are now open source, and the researchers will continue to explore this area.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"03-bitdistiller-\u901a\u8fc7\u81ea\u84b8\u998f\u91ca\u653e\u4f4e\u4e8e4\u6bd4\u7279\u5927\u6a21\u578b\u7684\u6f5c\u529b\">03. 
BitDistiller: Unleashing the potential of sub-4-bit LLMs via self-distillation<\/h3>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"224\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-6-768x224-1.png\" alt=\"BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation\" class=\"wp-image-1071675\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-6-768x224-1.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-6-768x224-1-300x88.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-6-768x224-1-240x70.png 240w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\" \/><\/figure>\n\n\n\n<p>Paper link: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/pdf\/2402.10631\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/arxiv.org\/pdf\/2402.10631<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<p>GitHub link: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/DD-DuDa\/BitDistiller\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/github.com\/DD-DuDa\/BitDistiller<span class=\"sr-only\"> (opens in new 
tab)<\/span><\/a><\/p>\n\n\n\n<p>Large language models perform impressively on natural language processing tasks as they scale up, but their surging memory and compute requirements make deployment increasingly challenging. Weight quantization is a common model-compression approach to this problem and improves the efficiency of inference deployment. However, quantization below 4 bits significantly reduces the precision of model weights and in turn degrades model performance, especially for smaller models or tasks that require complex reasoning.<\/p>\n\n\n\n<p>Existing quantization methods such as post-training quantization (PTQ) involve no retraining and therefore struggle to preserve model accuracy. Quantization-aware training (QAT), by contrast, optimizes the low-bit weights and can preserve accuracy, but it still faces two challenges: how to retain as much weight precision as possible during low-bit quantization, and how to learn low-bit representations efficiently during training.<\/p>\n\n\n\n<p>To address these problems, researchers at Microsoft Research Asia proposed BitDistiller, a QAT framework based on self-distillation (see Figure 4, left). BitDistiller 
uses tailored asymmetric quantization and clipping to improve quantization quality. Asymmetric quantization scales positive and negative floating-point values differently and adds a zero point to the integer data to ensure asymmetry, while clipping automatically truncates positive and negative outliers to improve model performance. The researchers also proposed a confidence-aware Kullback-Leibler divergence (CAKLD) objective, which fits the teacher model's distribution more closely through self-distillation, leading to faster convergence and better model performance.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1432\" height=\"637\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-7.png\" alt=\"BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation - chart\" class=\"wp-image-1071678\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-7.png 1432w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-7-300x133.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-7-1024x456.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-7-768x342.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-7-240x107.png 240w\" sizes=\"auto, (max-width: 1432px) 100vw, 
1432px\" \/><figcaption class=\"wp-element-caption\">Figure 4: The BitDistiller framework (left) and the quantization scaling law for code generation models (right)<\/figcaption><\/figure>\n\n\n\n<p>Experimental results show that under 3-bit and 2-bit quantization settings, BitDistiller significantly outperforms existing PTQ and QAT methods on general language understanding and complex reasoning benchmarks. Its advantage is particularly pronounced on complex code generation tasks (see Figure 4, right). The method enables efficient deployment on resource-constrained devices and requires relatively little training data and compute, which makes it cost-effective.<\/p>\n\n\n\n<p>Low-bit quantization has become a standard approach for deploying LLMs efficiently. To better support deploying low-bit LLMs on GPUs and CPUs, the team also developed the BitBLAS (microsoft\/BitBLAS) and T-MAC (microsoft\/T-MAC) systems, which provide end-to-end inference support for 2-bit models distilled with BitDistiller and demonstrate significant advantages and great potential for reducing cost and improving performance.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"04-pin-\u4f7f\u7528\u5f3a\u5316\u5b66\u4e60\u4f18\u5316\u5f97\u5230\u53ef\u89e3\u91ca\u63d0\u793a\u8bcd\">04. 
PIN: Obtaining interpretable prompts through reinforcement learning<\/h3>\n\n\n\n<p>Paper link: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.arxiv.org\/pdf\/2407.14733\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/www.arxiv.org\/pdf\/2407.14733<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<p>Pre-trained LLMs usually need to be fine-tuned before being applied to a specific downstream task such as text classification. Hard prompt tuning is an effective way to do this: by searching for suitable prompt tokens, it improves the model's performance on a given task, and it is inexpensive, broadly applicable, and requires no changes to the model's internal parameters. Because this is a discrete optimization problem, the mainstream approach today is reinforcement learning, which selects one prompt token at each step to optimize the performance of the resulting prompt sequence on the task. However, the prompts produced by existing methods usually consist of low-frequency words with unclear meaning and are often hard to interpret.<\/p>\n\n\n\n<p>To improve on this, researchers at Microsoft Research Asia proposed PIN, an algorithm that uses Tsallis entropy to constrain the reinforcement learning process 
so that the sampling and value-function estimation stages focus on the candidate prompt tokens with the highest probability. This not only speeds up the evaluation of prompt values but also avoids generating low-frequency, semantically vague words. PIN is an important advance in LLM tuning and is expected to improve how models perform in a range of industrial scenarios.<\/p>\n\n\n\n<p>Specifically, the researchers made two improvements on top of RLPrompt: during sampling, avoid selecting prompt tokens with low probability (see the red box), and when computing the target value function, avoid optimizing the value function of low-probability candidate tokens. These improvements are realized in the PIN algorithm (see the blue box).<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"699\" height=\"684\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-9.png\" alt=\"Hard Prompts Made Interpretable: Sparse Entropy Regularization for Prompt Tuning with RL - Algorithm 1 prompts\" class=\"wp-image-1071684\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-9.png 699w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-9-300x294.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-9-184x180.png 184w\" 
sizes=\"auto, (max-width: 699px) 100vw, 699px\" \/><figcaption class=\"wp-element-caption\">Figure 5: The PIN algorithm<\/figcaption><\/figure>\n\n\n\n<p>In detailed experiments on prompt generation tasks including text classification, text style transfer, and image annotation, PIN not only achieved better performance but also optimized more efficiently, finding better prompt combinations with fewer language model calls.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2048\" height=\"1412\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-10-2048x1412-1.png\" alt=\"Hard Prompts Made Interpretable: Sparse Entropy Regularization for Prompt Tuning with RL - table and three charts\" class=\"wp-image-1071687\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-10-2048x1412-1.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-10-2048x1412-1-300x207.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-10-2048x1412-1-1024x706.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-10-2048x1412-1-768x530.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-10-2048x1412-1-1536x1059.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-10-2048x1412-1-240x165.png 240w\" sizes=\"auto, (max-width: 2048px) 100vw, 2048px\" 
\/><figcaption class=\"wp-element-caption\">Figure 6: (Top) On text classification, prompts generated by PIN outperform the other baseline methods; (bottom) on image annotation, PIN's reinforcement learning process is more efficient and performs better.<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"05-\u63d0\u9ad8\u5927\u8bed\u8a00\u6a21\u578b\u5728\u4e8b\u4ef6\u5173\u7cfb\u903b\u8f91\u9884\u6d4b\u4e2d\u7684\u8868\u73b0\">05. Improving large language models in event relation logical prediction<\/h3>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"229\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-11-768x229-1.png\" alt=\"Improving Large Language Models in Event Relation Logical Prediction\" class=\"wp-image-1071690\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-11-768x229-1.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-11-768x229-1-300x89.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-11-768x229-1-240x72.png 240w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\" \/><\/figure>\n\n\n\n<p>Paper link: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/pdf\/2310.09158\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/arxiv.org\/pdf\/2310.09158<span class=\"sr-only\"> (opens in new 
tab)<\/span><\/a><\/p>\n\n\n\n<p>Although LLMs have made breakthroughs in many areas, they still struggle with the logic of complex event relations, often showing inconsistency or limited reasoning ability. Current research indicates that existing LLMs perform poorly on tasks that require rigorous reasoning and have weak logical consistency.<\/p>\n\n\n\n<p>In response, researchers from Microsoft Research Asia proposed several strategies for improving the logical reasoning of LLMs (Figure 7): a generative approach, which guides the LLM's reasoning by introducing coherent logical constraints; a retrieval-based approach, which examines the model's initial answer to retrieve relevant logical constraints and adds them to the model's instructions; and a fine-tuning approach, which uses a logical reasoning engine to construct a high-order event relation logical prediction dataset (LLM-ERL) and fine-tunes the model on it.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"458\" height=\"291\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-12.png\" alt=\"Improving Large Language Models in Event Relation Logical Prediction - prompt diagram\" class=\"wp-image-1071693\" 
srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-12.png 458w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-12-300x191.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-12-240x152.png 240w\" sizes=\"auto, (max-width: 458px) 100vw, 458px\" \/><figcaption class=\"wp-element-caption\">\u56fe7\uff1a\u901a\u8fc7\u4f7f\u7528\u751f\u6210\u3001\u68c0\u7d22\u548c\u5fae\u8c03\u65b9\u6cd5\uff0c\u5c06\u903b\u8f91\u7ea6\u675f\u7eb3\u5165 LLMs \u4e2d\u3002\u865a\u7ebf\u6846\u8868\u793a LLMs \u8f93\u51fa\u7684\u7b54\u6848\uff0c\u4e0b\u5212\u7ebf\u6587\u672c\u8868\u793a\u903b\u8f91\u7ea6\u675f\u3002<\/figcaption><\/figure>\n\n\n\n<p>\u7136\u540e\uff0c\u7814\u7a76\u5458\u4eec\u5728\u591a\u4e2a\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9a\u91cf\u548c\u5b9a\u6027\u5206\u6790\u5e76\u53d1\u73b0\uff1a\u9996\u5148\uff0c\u5728\u9700\u8981\u6709\u4e25\u8c28\u903b\u8f91\u63a8\u7406\u7684\u4efb\u52a1\u4e0a\u76f4\u63a5\u4f7f\u7528 CoT \u4f1a\u53d7\u5230 LLMs \u56fa\u6709\u95ee\u9898\u7684\u9650\u5236\uff08\u5982\u5e7b\u89c9\u95ee\u9898\uff09\uff0c\u4f46\u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\u7eb3\u5165\u903b\u8f91\u7ea6\u675f\u662f\u6709\u76ca\u7684\u3002\u5176\u6b21\uff0c\u68c0\u7d22\u5f0f\u65b9\u6cd5\u80fd\u663e\u8457\u51cf\u5c11 LLMs \u56de\u7b54\u4e2d\u7684\u4e0d\u4e00\u81f4\uff0c\u5176\u4e2d\u8f83\u5f3a\u7684\u6a21\u578b\u5982 GPT-4 \u53ef\u4ee5\u6709\u6548\u5730\u81ea\u884c\u8fdb\u884c\u68c0\u7d22\uff0c\u800c\u8f83\u5f31\u7684\u6a21\u578b\u5219\u9700\u8981\u8f85\u52a9\u7b5b\u9009\u76f8\u5173\u4fe1\u606f\u3002\u6700\u540e\uff0c\u5f53\u68c0\u7d22\u8fed\u4ee3\u6b21\u6570\u589e\u52a0\u65f6\uff0c\u968f\u7740\u4e0a\u4e0b\u6587\u4fe1\u606f\u7684\u589e\u591a\uff0cLLMs 
\u53ef\u80fd\u4f1a\u51fa\u73b0\u300c\u8fc7\u5ea6\u601d\u8003\u300d\u73b0\u8c61\uff0c\u6700\u7ec8\u503e\u5411\u4e8e\u8f93\u51fa\u4fdd\u5b88\u7684\u3001\u6ca1\u6709\u903b\u8f91\u51b2\u7a81\u7684\u3001\u4f46\u4e5f\u6ca1\u6709\u4efb\u4f55\u8bed\u4e49\u7684\u7b54\u6848\uff08\u6bd4\u5982\uff0c\u5224\u65ad\u6240\u6709\u4e8b\u4ef6\u4e4b\u95f4\u90fd\u6ca1\u6709\u4efb\u4f55\u5173\u7cfb\uff09\u3002\u7814\u7a76\u5458\u4eec\u8fd8\u63d0\u51fa\uff0c\u5728\u8fdb\u884c few-shot in-context learning \u65f6\uff0c\u544a\u8bc9\u6a21\u578b\u201c\u662f\u4ec0\u4e48\u201d\uff08demonstrations\uff09\u548c\u201c\u4e3a\u4ec0\u4e48\u201d\uff08logical constraints\uff09\u90fd\u6781\u5176\u91cd\u8981\u3002<\/p>\n\n\n\n<p>\u8be5\u7814\u7a76\u6df1\u5165\u63a2\u8ba8\u4e86\u5927\u8bed\u8a00\u6a21\u578b\u5728\u4e8b\u4ef6\u9884\u6d4b\u3001\u903b\u8f91\u63a8\u7406\u7b49\u95ee\u9898\u4e0a\u7684\u4e0d\u8db3\uff0c\u4e3a\u672a\u6765\u8bbe\u8ba1\u6709\u6548\u7684\u65b9\u6cd5\u4ee5\u53ca\u5982\u4f55\u5c06\u5927\u6a21\u578b\u5e94\u7528\u5230\u5b9e\u9645\u4efb\u52a1\u4e2d\u63d0\u4f9b\u4e86\u65b0\u7684\u601d\u8def\u548c\u89e3\u51b3\u65b9\u6cd5\u3002<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"06-e5-mistral-\u5927\u8bed\u8a00\u6a21\u578b\u589e\u5f3a\u7684\u6587\u672c\u5d4c\u5165\">06. 
E5-Mistral\uff1a\u5927\u8bed\u8a00\u6a21\u578b\u589e\u5f3a\u7684\u6587\u672c\u5d4c\u5165<\/h3>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"214\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-13-768x214-1.png\" alt=\"Improving Text Embeddings with Large Language Models\" class=\"wp-image-1071696\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-13-768x214-1.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-13-768x214-1-300x84.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-13-768x214-1-240x67.png 240w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\" \/><\/figure>\n\n\n\n<p>\u8bba\u6587\u94fe\u63a5\uff1a<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/pdf\/2401.00368\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/arxiv.org\/pdf\/2401.00368<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<p>GitHub \u94fe\u63a5\uff1a<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/microsoft\/unilm\/tree\/master\/e5\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/github.com\/microsoft\/unilm\/tree\/master\/e5<span class=\"sr-only\"> (opens in new 
tab)<\/span><\/a><\/p>\n\n\n\n<p>\u6587\u672c\u5d4c\u5165\u6a21\u578b\u5c06\u4e00\u6bb5\u8fde\u7eed\u7684\u6587\u672c\u6620\u5c04\u6210\u4f4e\u7ef4\u7684\u7a20\u5bc6\u5411\u91cf\uff0c\u662f\u641c\u7d22\u5f15\u64ce\u3001\u63a8\u8350\u7cfb\u7edf\u4e2d\u53ec\u56de\u6a21\u5757\u7684\u91cd\u8981\u7ec4\u4ef6\uff0c\u5bf9\u4e8e\u6700\u7ec8\u7684\u6392\u5e8f\u7ed3\u679c\u6709\u7740\u76f4\u63a5\u7684\u5f71\u54cd\u3002\u5176\u8fd8\u53ef\u4ee5\u5bf9\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u68c0\u7d22\u589e\u5f3a\uff08RAG\uff09\uff0c\u5e2e\u52a9\u8bed\u8a00\u6a21\u578b\u5728\u63a8\u7406\u9636\u6bb5\u8bbf\u95ee\u6700\u65b0\u7684\u4fe1\u606f\u548c\u79c1\u6709\u77e5\u8bc6\u5e93\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u6587\u672c\u5d4c\u5165\u5de5\u4f5c\u6240\u91c7\u7528\u7684\u57fa\u7840\u6a21\u578b\u6cdb\u5316\u80fd\u529b\u5f31\uff0c\u4e14\u8bad\u7ec3\u6570\u636e\u591a\u6837\u6027\u4e0d\u8db3\uff0c\u9650\u5236\u4e86\u5d4c\u5165\u6a21\u578b\u7684\u8d28\u91cf\u3002<\/p>\n\n\n\n<p>\u4e3a\u89e3\u51b3\u4e0a\u8ff0\u95ee\u9898\uff0c\u672c\u7bc7\u8bba\u6587\u4ece\u4e24\u4e2a\u65b9\u9762\u6316\u6398\u4e86\u5927\u8bed\u8a00\u6a21\u578b\u5728\u6587\u672c\u5d4c\u5165\u65b9\u9762\u7684\u6f5c\u529b\u3002\u4e00\u65b9\u9762\uff0c\u9488\u5bf9\u73b0\u6709\u6807\u6ce8\u6570\u636e\u591a\u6837\u6027\u4e0d\u9ad8\u7684\u95ee\u9898\uff0c\u7814\u7a76\u5458\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u4e24\u9636\u6bb5\u7684\u63d0\u793a\u7b56\u7565\uff0c\u901a\u8fc7 GPT-4 \u7b49\u5f3a\u5927\u7684\u8bed\u8a00\u6a21\u578b\u5408\u6210\u6570\u5341\u4e07\u79cd\u5d4c\u5165\u4efb\u52a1\u7684\u6570\u636e\u5e76\u8986\u76d693\u79cd\u8bed\u8a00\uff0c\u6781\u5927\u7f13\u89e3\u4e86\u8bb8\u591a\u957f\u5c3e\u4efb\u52a1\u7f3a\u4e4f\u8bad\u7ec3\u6570\u636e\u7684\u95ee\u9898\uff1b\u53e6\u4e00\u65b9\u9762\uff0c\u9488\u5bf9 BERT \u7b49\u5c0f\u7f16\u7801\u5668\u6a21\u578b\u51fa\u73b0\u7684\u6cdb\u5316\u80fd\u529b\u5f31\u7684\u95ee\u9898\uff0c\u7814\u7a76\u5458\u4eec\u91c7\u7528\u4e86 Mistral 
\u7b49\u7ecf\u8fc7\u5e7f\u6cdb\u9884\u8bad\u7ec3\u7684\u89e3\u7801\u5668\u6a21\u578b\u4f5c\u4e3a\u57fa\u5ea7\uff0c\u5b9e\u9a8c\u8868\u660e\uff0c\u53ea\u9700\u8981\u4e0d\u8d85\u8fc71k\u6b65\u68af\u5ea6\u66f4\u65b0\uff0c\u5c31\u53ef\u4ee5\u8fbe\u5230\u5f88\u597d\u7684\u6cdb\u5316\u6548\u679c\u3002\u540c\u65f6\uff0c\u4e0e\u4e3b\u6d41\u7684\u591a\u9636\u6bb5\u5bf9\u6bd4\u5b66\u4e60\u9884\u8bad\u7ec3\u76f8\u6bd4\uff0c\u6574\u4e2a\u8bad\u7ec3\u6d41\u7a0b\u4e5f\u4f1a\u5927\u5927\u7b80\u5316\u3002<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"959\" height=\"503\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-14.png\" alt=\"Improving Text Embeddings with Large Language Models - prompts\" class=\"wp-image-1071699\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-14.png 959w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-14-300x157.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-14-768x403.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/new-arrival-in-research-14-14-240x126.png 240w\" sizes=\"auto, (max-width: 959px) 100vw, 959px\" \/><figcaption class=\"wp-element-caption\"><em>\u56fe8\uff1a\u7528\u4e8e\u5408\u6210\u5d4c\u5165\u4efb\u52a1\u6570\u636e\u7684\u4e24\u9636\u6bb5\u63d0\u793a\u7b56\u7565\u3002<\/em><\/figcaption><\/figure>\n\n\n\n<p>\u5728\u5d4c\u5165\u6a21\u578b\u5b9a\u5236\u5316\u65b9\u9762\uff0cE5-Mistral 
\u652f\u6301\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u6765\u63cf\u8ff0\u5f53\u524d\u7684\u5d4c\u5165\u4efb\u52a1\uff0c\u53ef\u4ee5\u5728\u4e0d\u66f4\u6539\u6a21\u578b\u53c2\u6570\u7684\u524d\u63d0\u4e0b\uff0c\u5b9a\u5236\u5316\u5d4c\u5165\u6a21\u578b\u7684\u884c\u4e3a\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5728\u5305\u542b56\u4e2a\u6570\u636e\u96c6\u7684 MTEB \u8bc4\u6d4b\u57fa\u51c6\u4e0a\uff0cE5-Mistral \u663e\u8457\u4f18\u4e8e\u73b0\u6709\u7684\u5f00\u6e90\u4ee5\u53ca\u5546\u4e1a\u95ed\u6e90\u7684\u6587\u672c\u5d4c\u5165\u6a21\u578b\uff0c\u5e76\u5c55\u73b0\u51fa\u4e00\u5b9a\u7684\u591a\u8bed\u8a00\u548c\u957f\u6587\u672c\u6cdb\u5316\u80fd\u529b\u3002<\/p>\n\n\n\n<p>E5-Mistral \u7684\u5f00\u6e90\u6a21\u578b\u5df2\u53d7\u5230\u5e7f\u6cdb\u5173\u6ce8\uff0c\u7d2f\u8ba1\u83b7\u5f97\u767e\u4e07\u4f59\u6b21\u4e0b\u8f7d\u3002\u8be5\u7814\u7a76\u56e2\u961f\u5c06\u7ee7\u7eed\u63a2\u7d22\u5d4c\u5165\u6a21\u578b\u7684\u6269\u5c55\u548c\u5e94\u7528\u3002<\/p>\n","protected":false},"excerpt":{"rendered":"<p>\u7f16\u8005\u6309\uff1a\u6b22\u8fce\u9605\u8bfb\u201c\u79d1\u7814\u4e0a\u65b0\u201d\u680f\u76ee\uff01\u201c\u79d1\u7814\u4e0a\u65b0\u201d\u6c47\u805a\u4e86\u5fae\u8f6f\u4e9a\u6d32\u7814\u7a76\u9662\u6700\u65b0\u7684\u521b\u65b0\u6210\u679c\u4e0e\u79d1\u7814\u52a8\u6001\u3002\u5728\u8fd9\u91cc\uff0c\u4f60\u53ef\u4ee5\u5feb\u901f\u6d4f\u89c8\u7814\u7a76\u9662\u7684\u4eae\u70b9\u8d44\u8baf\uff0c\u4fdd\u6301\u5bf9\u524d\u6cbf\u9886\u57df\u7684\u654f\u9510\u55c5\u89c9\uff0c\u540c\u65f6\u4e5f\u80fd\u627e\u5230\u5148\u8fdb\u5b9e\u7528\u7684\u5f00\u6e90\u5de5\u5177\u3002 \u4e0b\u5468\uff0c\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u7684\u56fd\u9645\u9876\u7ea7\u5b66\u672f\u4f1a\u8bae ACL 2024 
\u5c06\u5728\u6cf0\u56fd\u66fc\u8c37\u4e3e\u529e\u3002\u672c\u5c4a\u5927\u4f1a\u4e0a\uff0c\u5fae\u8f6f\u4e9a\u6d32\u7814\u7a76\u9662\u5171\u670914\u7bc7\u8bba\u6587\u5165\u9009\uff0c\u8fd9\u4e00\u671f\u7684\u201c\u79d1\u7814\u4e0a\u65b0\u201d\u680f\u76ee\u7cbe\u9009\u4e86\u5176\u4e2d\u7684\u516d\u7bc7\u4e3a\u5927\u5bb6\u8fdb\u884c\u7b80\u8981\u4ecb\u7ecd\u3002 \u8bba\u6587\u94fe\u63a5\uff1ahttps:\/\/arxiv.org\/pdf\/2403.05063 (opens in new tab) GitHub \u94fe\u63a5\uff1ahttps:\/\/github.com\/microsoft\/recai (opens in new tab) \u5728\u6570\u5b57\u5316\u65f6\u4ee3\uff0c\u4f20\u7edf\u63a8\u8350\u7cfb\u7edf\u867d\u4fbf\u6377\uff0c\u5374\u5e38\u663e\u88ab\u52a8\uff0c\u96be\u4ee5\u6ee1\u8db3\u7528\u6237\u65e5\u76ca\u589e\u957f\u7684\u4e2a\u6027\u5316\u3001\u4ea4\u4e92\u6027\u9700\u6c42\u3002\u5176\u53ef\u89e3\u91ca\u6027\u7684\u7f3a\u5931\u4e0e\u53ef\u63a7\u6027\u7684\u4e0d\u8db3\uff0c\u4e5f\u6210\u4e3a\u4e86\u7528\u6237\u4f53\u9a8c\u5347\u7ea7\u7684\u74f6\u9888\u3002\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u5176\u5353\u8d8a\u7684\u8bed\u8a00\u7406\u89e3\u3001\u77e5\u8bc6\u50a8\u5907\u3001\u63a8\u7406\u548c\u95ee\u9898\u89e3\u51b3\u80fd\u529b\uff0c\u6709\u671b\u6210\u4e3a\u4e0b\u4e00\u4ee3\u63a8\u8350\u7cfb\u7edf\u7684\u65b0\u5f15\u64ce\u3002 
\u53d7\u6b64\u542f\u53d1\uff0c\u5fae\u8f6f\u4e9a\u6d32\u7814\u7a76\u9662\u548c\u6df1\u5733\u5927\u5b66\u7684\u7814\u7a76\u56e2\u961f\u5408\u4f5c\uff0c\u6253\u9020\u4e86\u4ee5\u7528\u6237\u4e3a\u4e2d\u5fc3\u7684\u65b0\u4e00\u4ee3\u63a8\u8350\u7cfb\u7edf\u3002\u5b83\u7531\u5927\u8bed\u8a00\u6a21\u578b\u9a71\u52a8\uff0c\u80fd\u66f4\u81ea\u7136\u5730\u7406\u89e3\u7528\u6237\u9700\u6c42\u7684\u52a8\u6001\u53d8\u5316\u5e76\u63d0\u4f9b\u66f4\u52a0\u7cbe\u51c6\u7684\u4e2a\u6027\u5316\u670d\u52a1\u3002\u5176\u4e2d\uff0c\u9488\u5bf9\u9886\u57df\u77e5\u8bc6\u548c\u9886\u57df\u6307\u4ee4\u7684\u5927\u8bed\u8a00\u6a21\u578b\u5fae\u8c03\u4e0e\u5bf9\u9f50\u6210\u4e3a\u5173\u952e\u3002\u7814\u7a76\u56e2\u961f\u4e3a\u6b64\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u4e24\u9636\u6bb5\u8bad\u7ec3\u6846\u67b6\uff1a\u76d1\u7763\u5b66\u4e60\uff08SL\uff09\u9636\u6bb5\u548c\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u9636\u6bb5\u3002 \u5728\u76d1\u7763\u5b66\u4e60\u9636\u6bb5\uff0c\u7814\u7a76\u56e2\u961f\u8bbe\u8ba1\u4e86\u4e00\u7cfb\u5217\u9488\u5bf9\u6027\u4efb\u52a1\uff0c\u5982\u7269\u54c1\u4fe1\u606f\u95ee\u7b54\u3001\u7269\u54c1\u63a8\u8350\u3001\u7c7b\u522b\u63a7\u5236\u7b49\uff0c\u7528\u6765\u589e\u5f3a\u5927\u8bed\u8a00\u6a21\u578b\u5bf9\u65b0\u77e5\u8bc6\u7684\u5bfc\u5165\uff0c\u4ee5\u53ca\u63d0\u9ad8\u63a8\u8350\u76f8\u5173\u7684\u590d\u6742\u6307\u4ee4\u7684\u9075\u4ece\u80fd\u529b\u3002\u540c\u65f6\uff0c\u4f20\u7edf\u63a8\u8350\u6a21\u578b\uff08\u4f8b\u5982 
SASRec\uff09\u8fd8\u4f5c\u4e3a\u6559\u5e08\u6a21\u578b\uff0c\u5e2e\u52a9\u751f\u6210\u76d1\u7763\u5b66\u4e60\u6240\u7528\u7684\u6807\u7b7e\uff0c\u6709\u6548\u89e3\u51b3\u4e86\u8bad\u7ec3\u6570\u636e\u7a00\u758f\u7684\u95ee\u9898\u3002\u5728\u5f3a\u5316\u5b66\u4e60\u9636\u6bb5\uff0c\u7814\u7a76\u56e2\u961f\u5e0c\u671b\u8fdb\u4e00\u6b65\u63d0\u9ad8\u6a21\u578b\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u8ba9\u5927\u8bed\u8a00\u6a21\u578b\u53bb\u54cd\u5e94\u5f00\u653e\u6027\u7684\u7528\u6237\u6307\u4ee4\uff0c\u5e76\u901a\u8fc7\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u5956\u52b1\u673a\u5236\uff0c\u8ba9\u6a21\u578b\u4e0d\u65ad\u671d\u7740\u66f4\u4f18\u7684\u65b9\u5411\u8fed\u4ee3\uff0c\u4f7f\u5176\u80fd\u591f\u66f4\u7cbe\u51c6\u5730\u670d\u4ece\u7528\u6237\u6307\u4ee4\u610f\u56fe\uff0c\u5e76\u51cf\u5c11\u8f93\u51fa\u7684\u683c\u5f0f\u9519\u8bef\u3002 \u5b9e\u9a8c\u8868\u660e\uff0c\u8fd9\u79cd\u751f\u6210\u5f0f\u63a8\u8350\u7cfb\u7edf\u80fd\u591f\u5f88\u597d\u5730\u54cd\u5e94\u591a\u79cd\u7528\u6237\u7684\u63a8\u8350\u8bf7\u6c42\uff0c\u4e3a\u4ea4\u4e92\u5f0f\u667a\u80fd\u63a8\u8350\u7cfb\u7edf\u6253\u4e0b\u57fa\u7840\u3002\u7814\u7a76\u56e2\u961f\u5728\u751f\u6210\u5f0f\u63a8\u8350\u65b9\u5411\u4e0a\u7684\u7814\u7a76\u5de5\u4f5c\u4e5f\u5c06\u6301\u7eed\u5728 GitHub \u4e2d\u5f00\u6e90\u5206\u4eab\u3002 \u8bba\u6587\u94fe\u63a5\uff1ahttps:\/\/arxiv.org\/pdf\/2402.17168 (opens in new tab) GitHub \u94fe\u63a5\uff1ahttps:\/\/github.com\/MetaCopilot\/dseval (opens in new tab) 
\u6570\u636e\u79d1\u5b66\u901a\u8fc7\u5206\u6790\u5927\u91cf\u6570\u636e\u53ef\u4ee5\u5e2e\u52a9\u4e2a\u4eba\u548c\u7ec4\u7ec7\u505a\u51fa\u660e\u667a\u51b3\u7b56\u3001\u9884\u6d4b\u8d8b\u52bf\u548c\u6539\u8fdb\u6d41\u7a0b\u3002\u7136\u800c\uff0c\u6570\u636e\u79d1\u5b66\u7684\u590d\u6742\u6027\u9700\u8981\u5e7f\u6cdb\u7684\u5206\u6790\u5de5\u5177\u548c\u4e13\u4e1a\u77e5\u8bc6\uff0c\u5bf9\u4e13\u5bb6\u4e5f\u6784\u6210\u4e86\u6311\u6218\u3002\u8fd1\u671f\uff0c\u5927\u8bed\u8a00\u6a21\u578b\u53ca\u5176\u9a71\u52a8\u7684\u4ee3\u7406\u5728\u589e\u5f3a\u6570\u636e\u79d1\u5b66\u80fd\u529b\u65b9\u9762\u663e\u793a\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u4f46\u7531\u4e8e LLMs \u7684\u9650\u5236\u3001\u4e0d\u660e\u786e\u7684\u4e0a\u4e0b\u6587\u6216\u7f3a\u4e4f\u6545\u969c\u6062\u590d\u673a\u5236\uff0c\u5176\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u4ecd\u9762\u4e34\u53ef\u9760\u6027\u548c\u51c6\u786e\u6027\u7684\u95ee\u9898\uff08\u5982\u5ffd\u7565\u5217\u3001\u8bef\u89e3\u6570\u636e\u7c7b\u578b\u3001\u672a\u6309\u6307\u5b9a\u683c\u5f0f\u8f93\u51fa\u7ed3\u679c\u6216\u4fee\u6539\u539f\u59cb\u6570\u636e\uff09\u3002\u73b0\u6709\u7684\u8bc4\u4f30\u65b9\u6cd5\u5728\u8861\u91cf\u6570\u636e\u79d1\u5b66\u4ee3\u7406\u7684\u80fd\u529b\u548c\u5c40\u9650\u6027\u65b9\u9762\u8fd8\u6709\u5f88\u5927\u7684\u8fdb\u6b65\u7a7a\u95f4\u3002 \u4e3a\u6b64\uff0c\u5fae\u8f6f\u4e9a\u6d32\u7814\u7a76\u9662\u7684\u7814\u7a76\u5458\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u65b0\u578b\u57fa\u51c6\u6846\u67b6 DSEval\uff0c\u65e8\u5728\u5168\u9762\u8bc4\u4f30 LLMs \u9a71\u52a8\u7684\u6570\u636e\u79d1\u5b66\u4ee3\u7406\u3002\u5176\u901a\u8fc7\u5f15\u5165\u65b0\u7684\u6ce8\u91ca\u8fc7\u7a0b\u548c\u8bed\u8a00\uff08DSEAL\uff0cDSEval Annotation 
Language\uff09\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u57fa\u51c6\u7684\u53ef\u6269\u5c55\u6027\u548c\u8986\u76d6\u8303\u56f4\u3002\u8be5\u6846\u67b6\u4e0d\u4ec5\u8986\u76d6\u4e86\u6570\u636e\u79d1\u5b66\u4ee3\u7406\u7684\u6574\u4e2a\u751f\u547d\u5468\u671f\uff0c\u4ece\u63a5\u6536\u67e5\u8be2\u3001\u68c0\u7d22\u4e0a\u4e0b\u6587\u3001\u751f\u6210\u4ee3\u7801\u5230\u6267\u884c\u4ee3\u7801\u5e76\u8fd4\u56de\u7ed3\u679c\uff0c\u8fd8\u5305\u62ec\u4e00\u4e2a\u9a8c\u8bc1\u6a21\u5757\uff0c\u53ef\u4ee5\u6301\u7eed\u76d1\u63a7\u751f\u6210\u7684\u4ee3\u7801\u3001\u6267\u884c\u7684\u7ed3\u679c\u548c\u8fd0\u884c\u65f6\u4f1a\u8bdd\uff0c\u5e76\u4e0e\u53c2\u8003\u4ee3\u7801\u7247\u6bb5\u8fdb\u884c\u6bd4\u8f83\uff0c\u786e\u4fdd\u51c6\u786e\u6027\u3002 \u6b64\u5916\uff0cDSEAL \u8fd8\u88ab\u7528\u4e8e\u63cf\u8ff0\u548c\u914d\u7f6e\u95ee\u9898\u96c6\uff0c\u786e\u4fdd\u4e0e DSEval \u6846\u67b6\u517c\u5bb9\uff0c\u5e76\u6613\u4e8e\u7406\u89e3\u548c\u8c03\u8bd5\u3002\u95ee\u9898\u96c6\u7531\u7cfb\u7edf\u81ea\u52a8\u751f\u6210\u5e76\u7ecf\u4e13\u5bb6\u4fee\u8ba2\uff0c\u786e\u4fdd\u4e86\u5176\u591a\u6837\u6027\u548c\u51c6\u786e\u6027\uff0c\u4ece\u800c\u51cf\u5c11\u4e86\u4eba\u5de5\u7684\u5de5\u4f5c\u91cf\uff0c\u63d0\u9ad8\u4e86\u57fa\u51c6\u6d4b\u8bd5\u7684\u8d28\u91cf\u3002 \u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cDSEval \u6846\u67b6\u5728\u8bc4\u4f30\u6570\u636e\u79d1\u5b66\u4ee3\u7406\u65b9\u9762\u8868\u73b0\u4f18\u5f02\u3002\u901a\u8fc7\u5bf9\u4e0d\u540c\u4ee3\u7406\u65b9\u6cd5\u7684\u6bd4\u8f83\uff0c\u5b9e\u9a8c\u8fd8\u53d1\u73b0\u4e0a\u4e0b\u6587\u63d0\u53d6\u65b9\u6cd5\u5bf9 LLMs \u6027\u80fd\u6709\u663e\u8457\u5f71\u54cd\u3002\u5e76\u4e14\uff0c\u901a\u8fc7\u591a\u8f6e\u81ea\u6211\u4fee\u590d\u5c1d\u8bd5\uff0c\u4f4e\u80fd\u529b\u6a21\u578b\uff08\u5982GPT-3.5\uff09\u5728\u5904\u7406\u590d\u6742\u4efb\u52a1\u65f6\u7684\u8868\u73b0\u4f18\u4e8e\u9ad8\u80fd\u529b\u6a21\u578b\uff08\u5982GPT-4\uff09\uff0c\u5c55\u793a\u4e86\u81ea\u6211\u4fee\u590d\u65b9\u6cd5\u7684\u5de8\u5927\u6f5c\u529b\u3002 
DSEval \u76ee\u524d\u5df2\u5f00\u6e90\u8be5\u6846\u67b6\u548c\u6570\u636e\u96c6\uff0c\u672a\u6765\u7814\u7a76\u5458\u4eec\u4e5f\u5c06\u7ee7\u7eed\u6df1\u5165\u63a2\u7d22\u8fd9\u4e00\u9886\u57df\u3002 \u8bba\u6587\u94fe\u63a5\uff1ahttps:\/\/arxiv.org\/pdf\/2402.10631 (opens [&hellip;]<\/p>\n","protected":false},"author":42735,"featured_media":1071783,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-content-parent":1012650,"msr_hide_image_in_river":null,"footnotes":""},"research-area":[13556],"msr-locale":[268881],"msr-post-option":[],"class_list":["post-1071651","msr-blog-post","type-msr-blog-post","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-locale-zh_cn"],"msr_assoc_parent":{"id":1012650,"type":"lab"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/1071651","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-blog-post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/42735"}],"version-history":[{"count":11,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/1071651\/revisions"}],"predecessor-version":[{"id":1106151,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/1071651\/revisions\/1106151"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1071783"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1071651"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\
/research\/wp-json\/wp\/v2\/research-area?post=1071651"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1071651"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1071651"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}