Efficient Training of Language Models through Text Quality-Based Pruning
A method for numerically scoring text quality in large unlabelled NLP datasets, used to identify and remove low-quality text instances and thereby improve the training efficiency of Language Models.
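The idea above can be sketched as a simple filter: assign each text a numeric quality score and keep only those above a threshold. The scoring function below (`quality_score`) is a hypothetical stand-in, since the paper's actual metric is not given here; any learned or heuristic scorer could be substituted.

```python
# Minimal sketch of quality-based dataset pruning.
# NOTE: quality_score is a toy heuristic (fraction of alphabetic/space
# characters, damped for very short texts), NOT the paper's metric.

def quality_score(text: str) -> float:
    """Toy quality proxy in [0, 1]; hypothetical, for illustration only."""
    if not text:
        return 0.0
    alpha_frac = sum(c.isalpha() or c.isspace() for c in text) / len(text)
    # Penalize very short fragments, which are often low quality.
    length_factor = min(len(text.split()) / 20.0, 1.0)
    return alpha_frac * length_factor

def prune_dataset(texts, threshold=0.5):
    """Keep only texts whose quality score meets the threshold."""
    return [t for t in texts if quality_score(t) >= threshold]

corpus = [
    "The model was trained on a curated corpus of news articles and books.",
    "@@## 8f3a buy now $$$ click http",
]
kept = prune_dataset(corpus, threshold=0.5)
# The clean sentence is kept; the spam-like fragment is pruned.
```

In practice the scorer would be applied once over the corpus before training, so the cost of scoring is amortized against the reduced number of training tokens.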