{"id":1014651,"date":"2024-03-14T01:13:44","date_gmt":"2024-03-14T08:13:44","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=1014651"},"modified":"2024-05-26T22:11:27","modified_gmt":"2024-05-27T05:11:27","slug":"benchmarking-large-language-models-across-languages-modalities-models-and-tasks","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/benchmarking-large-language-models-across-languages-modalities-models-and-tasks\/","title":{"rendered":"Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks"},"content":{"rendered":"<p>Recently, there has been a surge in LLM evaluation research to comprehend LLM capabilities and limitations. However, much of this research has been confined to English, leaving LLM building and evaluation for non-English languages relatively unexplored. Several new LLMs have been introduced recently, necessitating their evaluation on non-English languages. This study aims to perform a thorough evaluation of the non-English capabilities of state-of-the-art LLMs (GPT-3.5-Turbo, GPT-4, PaLM2, Mistral, and Llama2) by comparing them on the same set of multilingual datasets. Our benchmark comprises datasets covering languages, including low-resource African languages. We also include two multimodal datasets in the benchmark and compare the performance of LLaVA-v1.5 and GPT-4-Vision. Our experiments show that GPT-4 and PaLM2 outperform the Llama and Mistral models on various tasks, notably on low-resource languages, with GPT-4 outperforming PaLM2 on more datasets. However, issues such as data contamination must be addressed to obtain an accurate assessment of LLM performance on non-English languages.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recently, there has been a surge in LLM evaluation research to comprehend LLM capabilities and limitations. However, much of this research has been confined to English, leaving LLM building and evaluation for non-English languages relatively unexplored. Several new LLMs have been introduced recently, necessitating their evaluation on non-English languages. This study aims to perform a [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"NAACL 2024","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"North American Chapter of the Association for Computational Linguistics","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2024-3-13","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"https:\/\/2024.naacl.org\/","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13556,13545],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[267222,246808],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-1014651","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-human-language-technologies","msr-locale-en_us","msr-field-of-study-large-language-model","msr-field-of-study-natural-language-processing"],"msr_publishername":"NAACL 2024","msr_edition":"","msr_affiliation":"","msr_published_date":"2024-3-13","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/arxiv.org\/abs\/2311.07463","label_id":"243109","label":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[],"msr-author-ordering":[{"type":"text","value":"Sanchit Ahuja","user_id":0,"rest_url":false},{"type":"text","value":"Divyanshu Aggarwal","user_id":0,"rest_url":false},{"type":"text","value":"Varun Gumma","user_id":0,"rest_url":false},{"type":"text","value":"Ishaan Watts","user_id":0,"rest_url":false},{"type":"text","value":"Ashutosh Sathe","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Millicent Ochieng","user_id":40678,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Millicent Ochieng"},{"type":"text","value":"Rishav Hada","user_id":0,"rest_url":false},{"type":"text","value":"Prachi Jain","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Maxamed Axmed","user_id":40888,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Maxamed Axmed"},{"type":"user_nicename","value":"Kalika Bali","user_id":32477,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Kalika Bali"},{"type":"user_nicename","value":"Sunayana Sitaram","user_id":36060,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Sunayana Sitaram"}],"msr_impact_theme":[],"msr_research_lab":[199561,1021599],"msr_event":[],"msr_group":[],"msr_project":[1068003],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":1068003,"post_title":"African Languages Research","post_name":"african-languages-research","post_type":"msr-project","post_date":"2024-08-14 06:39:23","post_modified":"2024-08-14 09:29:33","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/african-languages-research\/","post_excerpt":"Bridging technological and community divides by enriching Generative AI research with the rich diversity of African languages. Join us in shaping a more inclusive digital future Microsoft Research Africa is pioneering advancements in Generative AI through our African Languages Research initiative, focusing on advancing Large Language Models (LLMs) for African languages. Our goal is to enhance language accessibility and ensure AI tools can effectively engage with Africa's linguistic diversity, supporting both language preservation and the&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/1068003"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1014651","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1014651\/revisions"}],"predecessor-version":[{"id":1014654,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1014651\/revisions\/1014654"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1014651"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=1014651"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1014651"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=1014651"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=1014651"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=1014651"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1014651"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1014651"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=1014651"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=1014651"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=1014651"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1014651"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=1014651"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}