{"id":182722,"date":"2008-01-11T00:00:00","date_gmt":"2009-10-31T09:57:07","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/msr-research-item\/design-and-development-of-a-content-based-music-search-engine\/"},"modified":"2016-09-09T09:59:49","modified_gmt":"2016-09-09T16:59:49","slug":"design-and-development-of-a-content-based-music-search-engine","status":"publish","type":"msr-video","link":"https:\/\/www.microsoft.com\/en-us\/research\/video\/design-and-development-of-a-content-based-music-search-engine\/","title":{"rendered":"Design and development of a content-based music search engine"},"content":{"rendered":"<div class=\"asset-content\">\n<p>If you go to Amazon.com or the Apple Itunes store, your ability to search for<br \/>\nnew music will largely be limited by the `query-by-metadata&#8217;<br \/>\nparadigm: search by song, artist or album name. However, when we talk<br \/>\nor write about music, we use a rich vocabulary of semantic concepts<br \/>\nto convey our listening experience. If we can model a relationship<br \/>\nbetween these concepts and the audio content, then we can produce a<br \/>\nmore flexible music search engine based on a &#8216;query-by-semantic-<br \/>\ndescription&#8217; paradigm.<\/p>\n<p>In this talk, I will present a computer audition system that can both<br \/>\nannotate novel audio tracks with semantically meaningful words and<br \/>\nretrieve relevant tracks from a database of unlabeled audio content<br \/>\ngiven a text-base query. I consider the related tasks of content-<br \/>\nbased audio annotation and retrieval as one supervised multi-class,<br \/>\nmulti-label problem in which we model the joint probability of<br \/>\nacoustic features and words. For each word in a vocabulary, we use an<br \/>\nannotated corpus of songs to train a Gaussian mixture model (GMM)<br \/>\nover an audio feature space. We estimate the parameters of the model<br \/>\nusing the weighted mixture hierarchies Expectation Maximization<br \/>\nalgorithm. This algorithm is more scalable to large data sets and<br \/>\nproduces better density estimates than standard parameter estimation<br \/>\ntechniques. The quality of the music annotations produced by our<br \/>\nsystem is comparable with the performance of humans on the same task.<br \/>\nOur `query-by-semantic-description&#8217; system can retrieve appropriate<br \/>\nsongs for a large number of musically relevant words. I also show<br \/>\nthat our audition system is general by learning a model that can<br \/>\nannotate and retrieve sound effects. <\/p>\n<p>Lastly, I will discuss three techniques for collecting the semantic<br \/>\nannotations of music that are needed to train such a computer<br \/>\naudition system. They include text-mining web documents, conducting<br \/>\nsurveys, and deploying human computation games.<\/p>\n<\/div>\n<p><!-- .asset-content --><\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you go to Amazon.com or the Apple Itunes store, your ability to search for new music will largely be limited by the `query-by-metadata&#8217; paradigm: search by song, artist or album name. However, when we talk or write about music, we use a rich vocabulary of semantic concepts to convey our listening experience. 