For Better or for Worse, Transformers Seek Patterns for Memorization
- Madhur Panwar,
- Gail Weiss,
- Navin Goyal,
- Antoine Bosselut
Memorization in language models is a critical yet poorly understood phenomenon. In this work, we investigate memorization in transformer-based language models by analyzing their training dynamics over multiple epochs. We find that memorization is neither a constant accumulation of sequences nor simply dictated by the recency of exposure to these sequences. Instead, much like generalization, memorization appears to be driven by pattern recognition. Tracking memorization dynamics in mixed datasets, we observe that models memorize different sub-datasets in distinct bursts, suggesting that each subset is associated with unique underlying patterns, and that the model prefers to learn these patterns in a predictable order. While easily learnable patterns tend to support generalization on unseen data, more complex patterns do not. Furthermore, in datasets with weak or absent patterns, models may delay memorization while seeking them, a behavior we term *overthinking*. Our results show that the subset of sequences memorized by a model over time is not arbitrary, and give insights into the internal processes a model goes through during training.
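As a rough illustration of what tracking memorization dynamics per sub-dataset could look like, the sketch below probes a model after each epoch and records the fraction of memorized sequences in each named subset. It is a minimal sketch under assumptions not stated in the abstract: memorization is operationalized here as exact greedy reproduction of a sequence's held-out suffix, and the names `generate_fn`, `is_memorized`, and `memorization_by_subset` are hypothetical, not the paper's API.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption (not from the paper): a sequence counts as "memorized" if the
# model's greedy continuation of its prefix reproduces the true suffix exactly.

def is_memorized(
    generate_fn: Callable[[Sequence[int], int], List[int]],
    prefix: Sequence[int],
    suffix: Sequence[int],
) -> bool:
    """Greedy-decode len(suffix) tokens from `prefix` and compare to `suffix`."""
    return generate_fn(prefix, len(suffix)) == list(suffix)

def memorization_by_subset(
    generate_fn: Callable[[Sequence[int], int], List[int]],
    subsets: Dict[str, List[Tuple[Sequence[int], Sequence[int]]]],
) -> Dict[str, float]:
    """Fraction of (prefix, suffix) pairs memorized, per named sub-dataset."""
    fractions = {}
    for name, pairs in subsets.items():
        hits = sum(is_memorized(generate_fn, p, s) for p, s in pairs)
        fractions[name] = hits / max(len(pairs), 1)
    return fractions

if __name__ == "__main__":
    # Toy stand-in for a trained model: it always continues with zeros, so it
    # "memorizes" only sequences whose suffix is all zeros.
    toy_generate = lambda prefix, n: [0] * n
    subsets = {
        "patterned": [([1, 2, 3], [0, 0]), ([4, 5], [0])],
        "unpatterned": [([7, 8], [9, 1]), ([2, 2], [3])],
    }
    print(memorization_by_subset(toy_generate, subsets))
```

Recomputing these per-subset fractions after every epoch yields memorization curves over training; the "distinct bursts" described above would appear as sharp jumps occurring at different epochs for different subsets.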