Challenges Of The Email Domain For Text Classification
- J. Brutlag
- Chris Meek
Proceedings of the Seventeenth International Conference on Machine Learning
Interactive classification of email into a user-defined hierarchy of folders is a natural domain for the application of text classification methods. This domain presents several challenges. First, the user’s changing mail-filing habits require the classification technology to adapt in a dynamic environment. Second, the classification technology must handle heterogeneity in folder content and folder size; performance when a folder contains only a small number of messages is especially important. Third, methods must meet the processing and memory requirements of a software implementation. We study three promising methods and present an analysis of their behavior with respect to these domain-specific challenges.