Core Concepts
Multi-Task Learning (MTL) leverages useful information from related tasks to improve performance on all of them simultaneously, helping to mitigate overfitting and data scarcity in Natural Language Processing (NLP).
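As a rough illustration, MTL is often realized through hard parameter sharing: one encoder is shared across tasks while each task keeps its own output head. The sketch below is a minimal, hypothetical PyTorch version; the LSTM encoder, model sizes, and task label counts are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class SharedMTLModel(nn.Module):
    """Hard parameter sharing: one shared encoder, one head per task."""

    def __init__(self, vocab_size=30000, hidden=256, task_num_labels=(2, 5)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        # Task-specific classification heads; every task reuses the encoder.
        self.heads = nn.ModuleList(nn.Linear(hidden, n) for n in task_num_labels)

    def forward(self, token_ids, task_id):
        x = self.embed(token_ids)            # (batch, seq, hidden)
        _, (h, _) = self.encoder(x)          # final hidden state per sequence
        return self.heads[task_id](h[-1])    # logits for the selected task
```

Because the encoder's parameters receive gradients from every task, training samples from all tasks jointly shape the shared representation, which is the mechanism behind the knowledge sharing described above.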
Abstract
This paper provides an overview of the use of MTL in NLP. It first reviews the MTL architectures used in NLP, grouping them into parallel, hierarchical, modular, and generative adversarial architectures. It then presents optimization techniques for training MTL models, including loss construction, gradient regularization, data sampling, and task scheduling.
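To make two of those optimization techniques concrete, here is a hedged sketch of (a) loss construction as a weighted sum of per-task losses and (b) temperature-based data sampling over task datasets. The function names, weighting scheme, and temperature value are illustrative assumptions, not methods prescribed by the survey.

```python
import random

def mtl_loss(task_losses, weights=None):
    """Loss construction as a weighted sum: L = sum_t w_t * L_t.
    Uses uniform weights when none are given; the weights themselves
    are a design choice (fixed, tuned, or learned)."""
    if weights is None:
        weights = [1.0] * len(task_losses)
    return sum(w * l for w, l in zip(weights, task_losses))

def sample_task(dataset_sizes, temperature=0.5):
    """Temperature-based data sampling: p_t proportional to |D_t|**T.
    T = 1 gives proportional sampling; T -> 0 approaches uniform."""
    probs = [size ** temperature for size in dataset_sizes]
    return random.choices(range(len(dataset_sizes)), weights=probs, k=1)[0]
```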
Next, the paper discusses applications of MTL across a variety of NLP tasks, categorized into auxiliary MTL (where auxiliary tasks are introduced to improve the performance of primary tasks) and joint MTL (where multiple tasks are treated as equally important). Commonly used benchmark datasets for MTL in NLP are also introduced.
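In practice, the auxiliary/joint distinction mostly shows up in how tasks are weighted and scheduled during training. Below is a hypothetical joint-training loop that reuses the SharedMTLModel sketch above, with a round-robin task schedule and random stand-in data; for auxiliary MTL one would instead down-weight the auxiliary tasks' losses.

```python
import torch

# Joint MTL: both tasks are equally important, alternated round-robin.
model = SharedMTLModel(task_num_labels=(2, 5))   # from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

for step in range(100):
    task_id = step % 2                            # simple task schedule
    tokens = torch.randint(0, 30000, (8, 16))     # stand-in token-id batch
    labels = torch.randint(0, (2, 5)[task_id], (8,))
    loss = loss_fn(model(tokens, task_id), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```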
Finally, the paper concludes with a discussion of promising research directions in this field.
Stats
"In recent years, data-driven neural models have achieved great success in machine learning problems."
"MTL naturally aggregates training samples from datasets of multiple tasks and alleviates the data scarcity problem."
"Through implicit knowledge sharing during the training process, MTL models could match or even exceed the performance of their single-task counterparts using much less training samples."
Quotes
"Learning from multiple tasks makes it possible for models to capture generalized and complementary knowledge from the tasks at hand besides task-specific features."
"MTL provides additional performance gain compared to data augmentation approaches, due to its ability to learn common knowledge shared by different tasks."
"Contemporary LLMs set new state-of-the-art on a variety of tasks and demonstrate an impressive ability in adapting to new tasks under few-shot and zero-shot settings, highlighting the instrumental role of multi-task learning in building strong models for natural language processing."