Music transcription has many uses, ranging from music information retrieval to better education tools. An important component of automated transcription is the identification and labeling of vocal effects such as vibrato, glides, and riffs. In Indian Classical Music, such effects are particularly important, since a raga is often established and identified by their correct use. It is important not only to classify an effect automatically from audio recordings, but also to identify when it starts and ends in a vocal rendition. Examples of such effects that are key to Indian music include Meend (vocal glides) and Andolan (very slow vibrato).

In this paper, we present an algorithm for the automatic transcription and effect identification of vocal renditions, with specific application to North Indian Classical Music. Using expert human annotation as the ground truth, we evaluate this algorithm and compare it with two machine learning approaches. Our results show that the algorithm identifies the effects and transcribes vocal music with 85% accuracy. As part of this effort, we have created a corpus of 35 voice recordings from 6 vocalists of varying levels of expertise, of which 12 recordings are manually annotated by experts. We intend to make this corpus publicly available.