{"id":969852,"date":"2023-09-22T11:28:19","date_gmt":"2023-09-22T18:28:19","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-video&#038;p=969852"},"modified":"2023-11-14T11:52:28","modified_gmt":"2023-11-14T19:52:28","slug":"final-intern-talk-improving-frechet-audio-distance-for-generative-music-evaluation","status":"publish","type":"msr-video","link":"https:\/\/www.microsoft.com\/en-us\/research\/video\/final-intern-talk-improving-frechet-audio-distance-for-generative-music-evaluation\/","title":{"rendered":"Final intern talk: Improving Frechet Audio Distance for Generative Music Evaluation"},"content":{"rendered":"<p>As generative music models become more powerful and popular, there is a growing need for robust objective metrics of music quality that correlate with human perception. The Frechet Audio Distance (FAD) is a commonly used metric for this purpose. However, its performance may be hampered by issues including sample size bias, limitations of the underlying audio embeddings, and the use of low-quality reference sets. We propose reducing sample size bias by extrapolating unbiased scores as the sample size approaches infinity. A comparison of various audio embeddings reveals that some are better suited for deriving FAD scores that capture aspects of musical or acoustic quality. Finally, our experiments underscore the importance of choosing a diverse and high-quality reference dataset for FAD calculation. 
Listening test results indicate that unbiased FAD scores calculated using suitable embeddings and reference music improve correlation with human ratings of musical and acoustic quality.<\/p>\n<p>Paper: <a class=\"fui-Link ___10kug0w f3rmtva f1ewtqcl fyind8e f1k6fduh f1w7gpdv fk6fouc fjoy568 figsok6 f1hu3pq6 f11qmguv f19f4twv f1tyq0we f1g0x7ka fhxju0i f1qch9an f1cnd47f fqv5qza f1vmzxwi f1o700av f13mvf36 f1cmlufx f9n3di6 f1ids18y f1tx3yz7 f1deo86v f1eh06m1 f1iescvh fhgqx19 f1olyrje f1p93eir f1nev41a f1h8hb77 f1lqvz6u f10aw75t fsle3fq f17ae5zn\" title=\"https:\/\/nam06.safelinks.protection.outlook.com\/?url=https%3a%2f%2farxiv.org%2fabs%2f2311.01616&data=05%7c01%7ccratus%40microsoft.com%7cd74f82810a7f42b4bba908dbdfe18420%7c72f988bf86f141af91ab2d7cd011db47%7c1%7c0%7c638349931905255423%7cunknown%7ctwfpbgzsb3d8eyjwijoimc4wljawmdailcjqijoiv2lumziilcjbtii6ik1hawwilcjxvci6mn0%3d%7c3000%7c%7c%7c&sdata=qbyij%2btkuglhroqo4riurhbp7vgsephqcrb5qqyqz8a%3d&reserved=0\" href=\"https:\/\/arxiv.org\/abs\/2311.01616\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Link https:\/\/arxiv.org\/abs\/2311.01616\">https:\/\/arxiv.org\/abs\/2311.01616<\/a><br \/>\nCode: <a class=\"fui-Link ___10kug0w f3rmtva f1ewtqcl fyind8e f1k6fduh f1w7gpdv fk6fouc fjoy568 figsok6 f1hu3pq6 f11qmguv f19f4twv f1tyq0we f1g0x7ka fhxju0i f1qch9an f1cnd47f fqv5qza f1vmzxwi f1o700av f13mvf36 f1cmlufx f9n3di6 f1ids18y f1tx3yz7 f1deo86v f1eh06m1 f1iescvh fhgqx19 f1olyrje f1p93eir f1nev41a f1h8hb77 f1lqvz6u f10aw75t fsle3fq f17ae5zn\" title=\"https:\/\/nam06.safelinks.protection.outlook.com\/?url=https%3a%2f%2fgithub.com%2fmicrosoft%2ffadtk&data=05%7c01%7ccratus%40microsoft.com%7cd74f82810a7f42b4bba908dbdfe18420%7c72f988bf86f141af91ab2d7cd011db47%7c1%7c0%7c638349931905259919%7cunknown%7ctwfpbgzsb3d8eyjwijoimc4wljawmdailcjqijoiv2lumziilcjbtii6ik1hawwilcjxvci6mn0%3d%7c3000%7c%7c%7c&sdata=voh11ijkczxhlzs3iilkegy807yvqilugykzlra7x4g%3d&reserved=0\" href=\"https:\/\/github.com\/microsoft\/fadtk\" 
target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Link https:\/\/github.com\/microsoft\/fadtk\">https:\/\/github.com\/microsoft\/fadtk<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>As generative music models become more powerful and popular, there is a growing need for robust objective metrics of music quality that correlates with human perception. The Frechet Audio Distance (FAD) is a commonly used metric for this purpose. However, its performance may be hampered by issues including sample size bias, limitations of the underlying [&hellip;]<\/p>\n","protected":false},"featured_media":969855,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr_hide_image_in_river":0,"footnotes":""},"research-area":[243062],"msr-video-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-session-type":[],"msr-impact-theme":[],"msr-pillar":[],"msr-episode":[],"msr-research-theme":[],"class_list":["post-969852","msr-video","type-msr-video","status-publish","has-post-thumbnail","hentry","msr-research-area-audio-acoustics","msr-locale-en_us"],"msr_download_urls":"","msr_external_url":"https:\/\/youtu.be\/7Z4bIQHvW5w","msr_secondary_video_url":"","msr_video_file":"","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/969852","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-video"}],"version-history":[{"count":5,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/969852\/revisions"}],"predecessor-version":[{"id":984420,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/969852\/revisions\/984420"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microso
ft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/969855"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=969852"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=969852"},{"taxonomy":"msr-video-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video-type?post=969852"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=969852"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=969852"},{"taxonomy":"msr-session-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-session-type?post=969852"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=969852"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=969852"},{"taxonomy":"msr-episode","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-episode?post=969852"},{"taxonomy":"msr-research-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-theme?post=969852"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}