Challenges Of The Email Domain For Text Classification

  • J. Brutlag
  • Chris Meek

Proceedings of the Seventeenth International Conference on Machine Learning

Interactive classification of email into a user-defined hierarchy of folders is a natural domain for the application of text classification methods. This domain presents several challenges. First, the user’s changing mail-filing habits mandate that classification technology adapt in a dynamic environment. Second, the classification technology needs to handle heterogeneity in folder content and folder size; performance when a folder contains only a small number of messages is especially important. Third, methods must meet the processing and memory requirements of a software implementation. We study three promising methods and present an analysis of their behavior with respect to these domain-specific challenges.