Why do transformers have outliers?

Modern machine learning models are trained with a very large number of parameters, often more than the task strictly requires. This overparameterization is useful during training because it gives the model a vast search space in which to encode rich representations of the data...