Design and development of a content-based music search engine

If you go to an online music store such as the Apple iTunes Store,
your ability to search for
new music will largely be limited by the ‘query-by-metadata’
paradigm: search by song, artist or album name. However, when we talk
or write about music, we use a rich vocabulary of semantic concepts
to convey our listening experience. If we can model a relationship
between these concepts and the audio content, then we can produce a
more flexible music search engine based on a ‘query-by-semantic-
description’ paradigm.

In this talk, I will present a computer audition system that can both
annotate novel audio tracks with semantically meaningful words and
retrieve relevant tracks from a database of unlabeled audio content
given a text-based query. I consider the related tasks of content-
based audio annotation and retrieval as one supervised multi-class,
multi-label problem in which we model the joint probability of
acoustic features and words. For each word in a vocabulary, we use an
annotated corpus of songs to train a Gaussian mixture model (GMM)
over an audio feature space. We estimate the parameters of the model
using the weighted mixture hierarchies expectation-maximization (EM)
algorithm. This algorithm is more scalable to large data sets and
produces better density estimates than standard parameter estimation
techniques. The quality of the music annotations produced by our
system is comparable with the performance of humans on the same task.
Our ‘query-by-semantic-description’ system can retrieve appropriate
songs for a large number of musically relevant words. I also show
that our audition system is general by learning a model that can
annotate and retrieve sound effects.
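
As a rough illustration of how one GMM per word can serve both
annotation and retrieval, the sketch below scores a song's feature
frames under each word's model and normalizes the scores into a
distribution over words. It is a minimal sketch under stated
assumptions, not the system presented in the talk: the model
parameters are hand-picked toy values rather than the output of the
weighted mixture hierarchies EM algorithm, the features are
two-dimensional placeholders, a uniform prior over words is assumed,
and all names (`word_models`, `songs`, `annotate`, `retrieve`) are
hypothetical.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) triples."""
    logs = [math.log(w) + diag_gaussian_logpdf(x, m, v) for w, m, v in gmm]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))  # log-sum-exp

def song_score(frames, gmm):
    """Average per-frame log-likelihood of a song under one word's GMM."""
    return sum(gmm_loglik(x, gmm) for x in frames) / len(frames)

def word_posterior(frames, word_models):
    """Normalize per-word scores into a distribution (uniform word prior assumed)."""
    scores = {w: song_score(frames, g) for w, g in word_models.items()}
    top = max(scores.values())
    unnorm = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(unnorm.values())
    return {w: u / total for w, u in unnorm.items()}

def annotate(frames, word_models, top_k=1):
    """Annotation: return the top_k most probable words for one song."""
    post = word_posterior(frames, word_models)
    return sorted(post, key=post.get, reverse=True)[:top_k]

def retrieve(word, songs, word_models):
    """Retrieval: rank songs by the probability they assign to the query word."""
    return sorted(
        songs,
        key=lambda name: word_posterior(songs[name], word_models)[word],
        reverse=True,
    )

# Toy, hand-picked models and features (assumptions, for illustration only).
word_models = {
    "calm": [(1.0, (0.0, 0.0), (1.0, 1.0))],
    "loud": [(0.5, (4.0, 4.0), (1.0, 1.0)), (0.5, (6.0, 6.0), (1.0, 1.0))],
}
songs = {
    "song_a": [(0.1, -0.2), (0.3, 0.1)],
    "song_b": [(4.2, 3.9), (5.8, 6.1)],
}

print(annotate(songs["song_a"], word_models))
print(retrieve("loud", songs, word_models))
```

The same per-word scores drive both tasks: annotation sorts words for
a fixed song, while retrieval sorts songs for a fixed word.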

Lastly, I will discuss three techniques for collecting the semantic
annotations of music that are needed to train such a computer
audition system. They include text-mining web documents, conducting
surveys, and deploying human computation games.

Speaker Details

Doug is currently a sixth-year graduate student and NSF IGERT fellow at UC San Diego. His research focuses on computer audition, machine learning, and music information retrieval. During his undergraduate studies at Princeton University, he worked with Perry Cook and George Tzanetakis on a music analysis software package called MARSYAS. Recently, he studied in Japan as an NSF EAPSI fellow, working with Masataka Goto and Elias Pampalk on music segmentation. While at UCSD, he co-founded the Computer Audition Lab with Gert Lanckriet, Lawrence Saul, and Luke Barrington.

Doug Turnbull
UC San Diego