Frameworks for Distributed Machine Learning


February 27, 2014


This talk is in three parts. The first deals with an aspect of the Weka project that has received little attention, namely the use of machine learning in agricultural applications. I will outline our experiences in this field and present an application development framework that is a direct result of this activity. In particular, one project has met one of the challenges proposed by Kiri Wagstaff at ICML 2012. Second, I will talk about our work in data stream mining, with a focus on classification within the Massive Online Analysis (MOA) framework. After a quick overview of what is in MOA, I will present two recent results that indicate a need for caution, and give a statement of what constitutes the state of the art in data stream classification for practitioners. I will also discuss attempts to produce a distributed version of MOA called SAMOA, a platform for data stream mining in a cluster/cloud environment. Its architecture allows it to run on several distributed stream processing engines, such as S4 and Storm. Finally, I will present the idea of experiment databases, a framework for machine learning experimentation that saves effort and offers opportunities for meta-learning and hypothesis generation.