Pre-training transforms forgetful language models into retentive ones, with retention shaped by the relevance and diversification of the knowledge encountered.