Abstract

The Bleek and Lloyd collection contains 19th-century handwritten notebooks that document the language and culture of the |Xam-speaking people who lived in Southern Africa. Transcriptions of the text could make this rich data more accessible; however, the complex diacritics used in the notebooks make transcription difficult. Machine learning techniques could be used to automate this transcription, but it is not known which techniques would produce the best results. This paper therefore compares three popular techniques applied to this problem: artificial neural networks (ANNs), hidden Markov models (HMMs) and support vector machines (SVMs). An SVM-based classifier using histograms of oriented gradients as features achieved the best word recognition accuracy, at 58.4%. Furthermore, most feature extraction parameters did not have a large effect on recognition accuracy, and the SVM-based recognisers outperformed both the ANN- and HMM-based recognisers.