Data Clustering for Developers


October 16, 2013


James McCaffrey


MSR Redmond


Data clustering is the process of grouping data items so that similar items belong to the same group. Although researchers have studied data clustering for decades, relatively little practical information is available on how to actually implement clustering algorithms. This talk explains data clustering from a developer's point of view, with an emphasis on how to code clustering methods in the C# programming language. Topics covered include clustering numeric data, clustering categorical data, key data structures, and determining the optimal number of clusters. Complete C# clustering source code will be presented that can be used as-is or modified for specialized clustering scenarios.


James McCaffrey

James McCaffrey is an RSDE in the Advanced Development Team of Microsoft Research in Redmond, WA. James holds degrees from the University of California at Irvine, California State University at Fullerton, and Hawaii Pacific University, and a doctorate from the University of Southern California. James is also the Senior Contributing Editor for MSDN Magazine, Microsoft's technical journal for the software development community.