Constrained K-Means Clustering

  • K.P. Bennett
  • P.S. Bradley
  • A. Demiriz

MSR-TR-2000-65 |

We consider practical methods for adding constraints to the K-Means Clustering algorithm in order to avoid local solutions with empty clusters or clusters having very few points. We often observe this phenomena when applying K-Means to datasets where the number of dimensions is n > 10 and the number of desired clusters is k > 20. We propose explicitly adding k constraints to the underlying clustering optimization problem requiring that each cluster have at least a minimum number of points in it. We then investigate the resulting cluster assignment step. Preliminary numerical tests on real datasets indicate the the constrained approach is less prone to poor local solutions, producing a better summary of the underlying data.