{"id":326648,"date":"2016-11-23T14:01:47","date_gmt":"2016-11-23T22:01:47","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&#038;p=326648"},"modified":"2021-05-09T12:03:48","modified_gmt":"2021-05-09T19:03:48","slug":"user-specific-training-vocal-melody-transcription","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/user-specific-training-vocal-melody-transcription\/","title":{"rendered":"User-Specific Training for Vocal Melody Transcription"},"content":{"rendered":"<p class=\"pdescription\"><b>Overview<\/b><br \/>\nThis page contains supplementary material for our AAAI 2010 paper: \u201cUser-Specific Learning for Recognizing a Singer\u2019s Intended Pitch\u201d. The full citation for our paper follows, along with a link to the paper itself:<\/p>\n<p class=\"pdescription\">Guillory A, Basu S, and Morris D. <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/11\/AAAI2010Guillory.pdf\">User-Specific Learning for Recognizing a Singer\u2019s Intended Pitch<\/a>. Proceedings of AAAI 2010, July 2010.<\/p>\n<p class=\"pdescription\">For more information about this work, contact Dan Morris (<a href=\"mailto:dan@microsoft.com\">dan@microsoft.com<\/a>) and Sumit Basu (<a href=\"mailto:sumitb@microsoft.com\">sumitb@microsoft.com<\/a>).<\/p>\n<p class=\"pdescription\"><b>Abstract<\/b><br \/>\nWe consider the problem of automatic vocal melody transcription: translating an audio recording of a sung melody into a musical score. While previous work has focused on finding the closest notes to the singer\u2019s tracked pitch, we instead seek to recover the melody the singer intended to sing. Often, the melody a singer intended to sing differs from what they actually sang; our hypothesis is that this occurs in a singer-specific way. 
For example, a given singer may often be flat in certain parts of her range, or another may have difficulty with certain intervals. We thus pursue methods for singer-specific training which use learning to combine different methods for pitch prediction. In our experiments with human subjects, we show that via a short training procedure we can learn a singer-specific pitch predictor and significantly improve transcription of intended pitch over other methods. For an average user, our method gives a 20 to 30 percent reduction in pitch classification errors with respect to a baseline method comparable to commercial voice transcription tools. For some users, we achieve even more dramatic reductions. Our best results come from a combination of singer-specific learning with non-singer-specific feature selection. We are also making our experimental data available to allow others to replicate or extend our results, and we discuss the implications of our work for training more general control signals.<\/p>\n<p class=\"pdescription\"><b>Supplementary Material<\/b><br \/>\nThe primary purpose of this page is to host the data used in our experiments, which consist of:<\/p>\n<ol>\n<li class=\"pdescription\">A series of MIDI tracks, each a short melody or scale, several measures long<\/li>\n<li class=\"pdescription\">Recordings of 22 participants singing those melodies along with a drum beat, time-sync&#8217;d to the original MIDI tracks<\/li>\n<\/ol>\n<p class=\"pdescription\">We hope these recordings can serve as the beginning of a larger data repository, and as a benchmark data set for user-specific training or vocal melody transcription for environments with fixed tempos (an important feature of these recordings is that they were created by asking users to sing along with a drum beat).<\/p>\n<p class=\"pdescription\">Our complete experimental procedure is described in detail in our paper, and the instructions displayed to participants are included at the end of this page. 
We note that only 22 recordings are included here, which is smaller than the total number collected: not all participants consented to having their recordings publicly released. However, the data set posted on this page is not systematically biased and is appropriate for testing alternate methods and understanding our experiments.<\/p>\n<p class=\"pdescription\">Our data archive can be downloaded as a single zipfile:<\/p>\n<p class=\"pdescription\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/rockicon.net\/pitchtracking\/public_pitch_data.zip\">public_pitch_data.zip<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (560MB)<\/p>\n<p class=\"pdescription\">The archive contains two directories:<\/p>\n<div>\n<p class=\"pdescription\"><b><i>input_data<\/i><\/b><\/p>\n<p class=\"pdescription\">This directory contains the ground truth files in both MIDI and audio format. Each file is numbered and named to correspond to the recordings for each participant (described below). In some cases files are labeled as \u201cmale\u201d or \u201cfemale\u201d; these represent the same melody in slightly different ranges to allow for reasonable reproduction by participants of both genders. 
Each example is included as a MIDI sequence (.mid), an \u201cexample\u201d sequence (.wma) (this is what participants heard before they were asked to sing back each example), and an \u201caccompaniment\u201d sequence (this is what participants heard while they were singing back each example: just a drum beat, an initial cue to set the key, and a count-in voiceover).<\/p>\n<p class=\"pdescription\">For example, melody 2 is represented in this directory as six files (all of which are live links on this page, as examples):<\/p>\n<p class=\"pdescription\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/rockicon.net\/pitchtracking\/02TwinkleFemale.accompaniment.wma\">02TwinkleFemale.accompaniment.wma<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><br \/>\n<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/rockicon.net\/pitchtracking\/02TwinkleFemale.example.wma\">02TwinkleFemale.example.wma<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><br \/>\n<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/rockicon.net\/pitchtracking\/02TwinkleFemale.mid\">02TwinkleFemale.mid<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><br \/>\n<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/rockicon.net\/pitchtracking\/02TwinkleMale.accompaniment.wma\">02TwinkleMale.accompaniment.wma<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><br \/>\n<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" 
href=\"http:\/\/rockicon.net\/pitchtracking\/02TwinkleMale.example.wma\">02TwinkleMale.example.wma<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><br \/>\n<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/rockicon.net\/pitchtracking\/02TwinkleMale.mid\">02TwinkleMale.mid<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n<p class=\"pdescription\"><b><i>public<\/i><\/b><\/p>\n<p class=\"pdescription\">This directory contains recordings from 22 participants singing along with each of our 21 melodies. For example, the directory called \u201cP004\u201d contains participant 4\u2019s recordings of all 21 melodies, so files are named:<\/p>\n<p class=\"pdescription\">00Easy1.training.wav<br \/>\n01Easy2.training.wav<br \/>\n02TwinkleMale.training.wav<br \/>\n03Diddle2Male.training.wav<br \/>\n&#8230;<br \/>\n21OdeToJoyMale.training.wav<\/p>\n<\/div>\n<p class=\"pdescription\">The remainder of this page contains a brief video example of our data collection application, along with the instructions provided to participants, to demonstrate the process used for data collection. 
Please contact Dan Morris (<a href=\"mailto:dan@microsoft.com\">dan@microsoft.com<\/a>) and Sumit Basu (<a href=\"mailto:sumitb@microsoft.com\">sumitb@microsoft.com<\/a>) if you have questions about our procedure, are interested in implementing a competing method, or are interested in adding to our data repository!<\/p>\n<p><strong>Data Collection App Video Clip<\/strong><\/p>\n<div class=\"yt-consent-placeholder\" role=\"region\" aria-label=\"Video playback requires cookie consent\" data-video-id=\"cbOt5Fl_Sbw\" data-poster=\"https:\/\/img.youtube.com\/vi\/cbOt5Fl_Sbw\/maxresdefault.jpg\"><iframe aria-hidden=\"true\" tabindex=\"-1\" title=\"User-Specific Pitch Tracking (data collection example)\" width=\"500\" height=\"375\" data-src=\"https:\/\/www.youtube-nocookie.com\/embed\/cbOt5Fl_Sbw?feature=oembed&rel=0&enablejsapi=1\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<div class=\"yt-consent-placeholder__overlay\"><button class=\"yt-consent-placeholder__play\"><svg width=\"42\" height=\"42\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" aria-hidden=\"true\" focusable=\"false\"><g fill=\"none\" fill-rule=\"evenodd\"><circle fill=\"#000\" opacity=\".556\" cx=\"21\" cy=\"21\" r=\"21\"\/><path stroke=\"#FFF\" d=\"M27.5 22l-12 8.5v-17z\"\/><\/g><\/svg><span class=\"yt-consent-placeholder__label\">Video playback requires cookie consent<\/span><\/button><\/div>\n<\/div>\n<div style=\"text-align: left\">\n<p><strong>Singing Experiment Instructions<\/strong><\/p>\n<p>This program will play and ask you to sing back a series of scales and melodies. We will use the data you record in our research into new pitch tracking methods that adapt to a singer&#8217;s voice. To participate, you will need a microphone. 
<b>If possible, use headphones while recording and use an external microphone.<\/b> If you don&#8217;t have a microphone or only have the microphone built into your computer, contact us and we can make alternate arrangements. As soon as you begin, you will be shown a sequence of notes in \u201cpiano roll\u201d format. This will be used as a rough visual cue for the audio you will hear. We refer to each sequence of notes as an example. At the start of each example, you will hear the root note of the scale or melody and a 4-beat countdown. After the countdown, the notes on screen will play, accompanied by a simple drum beat. Below is the interface you will see.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-327491\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/11\/PlaybackScreen-300x225.jpg\" alt=\"playbackscreen\" width=\"300\" height=\"225\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/11\/PlaybackScreen.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/11\/PlaybackScreen-80x60.jpg 80w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<p>At the bottom of the screen is a volume meter which shows the volume level from your microphone. Blowing into the microphone should cause this meter to fill up all the way. <b>When you\u2019re singing, the meter should fill about 1\/4 to 3\/4 of the way.<\/b> Check the volume settings for your microphone if you don\u2019t see the meter move or if your recording level is too high or low. After the example has played, we will ask you to sing back what you just heard. The same root note, countdown, and drum beat will play back immediately, and we ask that you begin singing along with the drum beat after the countdown. There are no lyrics to sing in any of the examples. Instead, sing \u201cDoo\u201d for each note as you might when singing backup in an a cappella group. 
When you are recording, the volume meter will turn red.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-327494\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/11\/RecordingScreen-300x225.jpg\" alt=\"recordingscreen\" width=\"300\" height=\"225\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/11\/RecordingScreen.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/11\/RecordingScreen-80x60.jpg 80w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<p>Recording will stop automatically. After you have finished recording, you will have the option to repeat the example or move on to the next example. You will sing about 20 examples, and each example should take less than a minute to record. Optionally, after each example we will show you a visualization of the notes you sang and a score of how many notes you sang on pitch. Turn this option off if you find the visualization is taking too much time.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-327497\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/11\/ResultsScreen-300x225.jpg\" alt=\"resultsscreen\" width=\"300\" height=\"225\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/11\/ResultsScreen.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/11\/ResultsScreen-80x60.jpg 80w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<p>Try your best, but it\u2019s OK if many of the notes you sing are scored \u201cwrong\u201d. Automatic voice transcription is a difficult problem, and the algorithm we are using is unforgiving. This is in fact what we hope to improve using your data! When you\u2019re done, we\u2019ll create a zipfile of your data, and we\u2019ll automatically transfer that zipfile to our experimental database. 
If you have questions, email us.<\/p>\n<p>When you are ready to begin, indicate your gender, enter your email address below, and click \u201cStart\u201d. Thank you for participating!<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Overview This page contains supplementary material for our AAAI 2010 paper: \u201cUser-Specific Learning for Recognizing a Singer\u2019s Intended Pitch\u201d. The full citation for our paper follows, along with a link to the paper itself: Guillory A, Basu S, and Morris D. User-Specific Learning for Recognizing a Singer\u2019s Intended Pitch. Proceedings of AAAI 2010, July 2010. [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556,243062,13554],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-326648","msr-project","type-msr-project","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-audio-acoustics","msr-research-area-human-computer-interaction","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"","related-publications":[318008],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"guest","display_name":"Andrew Guillory","user_id":434208,"people_section":"Group 1","alias":""},{"type":"user_nicename","display_name":"Sumit Basu","user_id":33754,"people_section":"Group 
1","alias":"sumitb"}],"msr_research_lab":[],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/326648","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":5,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/326648\/revisions"}],"predecessor-version":[{"id":744970,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/326648\/revisions\/744970"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=326648"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=326648"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=326648"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=326648"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=326648"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}