The paper proposes IMO (Invariant features Masks for Out-of-Distribution text classification), a method for achieving out-of-distribution (OOD) generalization in text classification. The key idea is to learn sparse, domain-invariant representations from pre-trained transformer-based language models in a greedy layer-wise manner.
During training, IMO learns sparse mask layers that remove features irrelevant to prediction, so that the remaining features are invariant across domains. In addition, IMO employs a token-level attention mechanism to focus on the tokens most useful for prediction.
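A minimal sketch of how such a mask layer and token-level attention could be wired up in PyTorch is shown below. The class and parameter names (SparseMaskLayer, TokenAttentionPooling), the sigmoid-relaxed mask, and the L1-style sparsity penalty are illustrative assumptions for exposition, not the authors' actual implementation.

```python
import torch
import torch.nn as nn


class SparseMaskLayer(nn.Module):
    """Illustrative sparse mask over one layer's hidden features.

    A learnable logit per feature dimension is passed through a sigmoid;
    dimensions whose mask value is driven toward 0 are effectively removed,
    and an L1-style penalty on the mask encourages sparsity.
    """

    def __init__(self, hidden_size: int):
        super().__init__()
        self.mask_logits = nn.Parameter(torch.zeros(hidden_size))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size)
        mask = torch.sigmoid(self.mask_logits)  # soft mask values in (0, 1)
        return hidden_states * mask             # suppress "irrelevant" dimensions

    def sparsity_penalty(self) -> torch.Tensor:
        # Added to the training loss to push most mask values toward 0.
        return torch.sigmoid(self.mask_logits).sum()


class TokenAttentionPooling(nn.Module):
    """Illustrative token-level attention: score each token, then pool."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor,
                attention_mask: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden), attention_mask: (batch, seq_len)
        scores = self.scorer(hidden_states).squeeze(-1)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
        return (weights * hidden_states).sum(dim=1)  # (batch, hidden_size)


# Example wiring (illustrative): mask a frozen encoder layer's output, then pool
# into a sequence representation that a linear classifier head could consume.
hidden = torch.randn(2, 16, 768)                  # e.g. hidden states of a PLM layer
attn_mask = torch.ones(2, 16, dtype=torch.long)   # all tokens are real (no padding)
pooled = TokenAttentionPooling(768)(SparseMaskLayer(768)(hidden), attn_mask)
print(pooled.shape)  # torch.Size([2, 768])
```

Under the top-down greedy layer-wise scheme described above, a mask of this kind would presumably be learned for one transformer layer at a time (with previously learned masks held fixed), with the classification loss and the sparsity penalty jointly pushing the surviving feature dimensions toward the domain-invariant subset.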
The authors provide a theoretical analysis to elucidate the relationship between domain-invariant features and causal features, and explain how IMO learns the invariant features.
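As background for why invariance matters here, the standard assumption that analyses of this kind typically build on (stated generically below, not as the paper's exact formulation) is that the label's conditional distribution given its causal features is stable across domains, while its conditional given spurious features can shift:

```latex
% Generic invariance assumption (illustrative; X_c = causal features of Y,
% X_s = spurious, domain-specific features, e and e' = two domains):
P^{e}(Y \mid X_c) = P^{e'}(Y \mid X_c) \quad \text{for all domains } e, e',
\qquad \text{whereas in general} \qquad
P^{e}(Y \mid X_s) \neq P^{e'}(Y \mid X_s).
```

Under this view, features that support a predictor of Y that is stable across training domains are natural candidates for the causal features, which is the intuition behind keeping only the invariant dimensions in the sketch above.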
Comprehensive experiments show that IMO significantly outperforms strong baselines, including prompt-based methods and large language models, across a range of evaluation metrics and settings for both binary sentiment analysis and multi-class classification tasks. IMO also performs better when the amount of training data is limited, indicating its effectiveness in low-resource scenarios.
The authors also conduct ablation studies to validate the top-down greedy search strategy and the contribution of IMO's individual components, such as the mask layers and the attention mechanism.
Source: Tao Feng et al., arxiv.org, 04-23-2024. https://arxiv.org/pdf/2404.13504.pdf