
Small Models Outperform Large Language Models in Cross-Domain Argument Extraction


Core Concepts
Small models trained on appropriate source ontologies can outperform large language models like GPT-3.5 and GPT-4 in zero-shot cross-domain argument extraction.
Abstract
This paper investigates the effectiveness of two prominent methods for event argument extraction (EAE) transfer learning, question answering (QA) and template infilling (TI), across six datasets. The key findings are:

- Small Flan-T5 models trained with QA or TI can outperform massive language models like GPT-3.5 and GPT-4 in zero-shot cross-domain extraction.
- Neither QA nor TI consistently outperforms the other across domains, so both methods should be considered when attempting extraction on novel datasets and/or ontologies.
- The overlap between the event types that are hard or easy in-domain and those that are hard or easy under transfer varies greatly across datasets, with some (e.g. WikiEvents) showing little correlation.
- Augmenting training data with paraphrases of the QA questions and TI templates can provide modest gains in transfer performance, depending on the dataset.

The authors conclude that for EAE, small models trained on recast existing resources remain an effective first choice for extraction in new domains, far from being obsolete in the face of large language models.
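To make the two recasting strategies concrete, here is a minimal Python sketch of how a single EAE instance might be turned into a QA-style input and a TI-style input for a seq2seq model such as Flan-T5. The document, role question, and template below are invented for illustration and are not taken from the paper's ontologies or its exact prompt formats.

```python
# Illustrative sketch (not the authors' exact formats): recasting one
# event-argument-extraction instance as (a) a QA input and (b) a
# template-infilling (TI) input for a seq2seq model such as Flan-T5.

document = (
    "The company acquired the startup last year for $50 million, "
    "according to a press release."
)
trigger = "acquired"                      # event trigger assumed already given
event_type = "Transaction.Acquisition"    # hypothetical event type

# (a) QA-style: one question per role; the model answers with the argument span.
qa_question = "Who acquired something?"   # hypothetical role question
qa_input = f"question: {qa_question} context: {document}"

# (b) TI-style: a role-slotted template; the model fills the <arg> placeholders.
ti_template = "<arg1> acquired <arg2> from <arg3>"   # hypothetical template
ti_input = (
    f"event: {event_type} trigger: {trigger} "
    f"template: {ti_template} context: {document}"
)

print(qa_input)
print(ti_input)
```

The same document thus yields one input per role question under QA, but a single slotted input per event under TI, which is one source of the performance gaps between the two formulations.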
Stats
On the datasets used, the authors note: "We use six datasets, each with its own ontology. All but ACE are document-level, i.e., arguments may appear in sentences other than the one containing their trigger. Appendix A has summary statistics."
Quotes
"Effective ontology transfer has been a major goal of recent work on event argument extraction (EAE). Two methods in particular—question answering (QA) and template infilling (TI)—have emerged as promising approaches to this problem." "We find that for each target ontology, some Flan-T5 model (TI or QA) obtains zero-shot performance superior to that of GPT-3.5—often by wide margins. Remarkably, the same is also true w.r.t. GPT-4, with the lone exception of FAMuS." "We observe often sizable performance gaps between TI and QA models trained on the same source dataset, both in the in-domain evaluations and and in the zero-shot evaluations."

Key Insights Distilled From

by William Gant... at arxiv.org 04-15-2024

https://arxiv.org/pdf/2404.08579.pdf
Small Models Are (Still) Effective Cross-Domain Argument Extractors

Deeper Inquiries

What are some potential reasons why the overlap in hard/easy event types for in-domain vs. zero-shot transfer varies so greatly across datasets?

The variation in the overlap of hard/easy event types between in-domain and zero-shot transfer can be attributed to several factors. One is the diversity and complexity of the event types and roles in each ontology: datasets with more nuanced and diverse event types may exhibit greater differences in difficulty between in-domain evaluation and zero-shot transfer. Another is the similarity between the source and target ontologies: when a target ontology closely resembles the source, the event types that are hard or easy in-domain are more likely to remain hard or easy under transfer, whereas more distinct ontologies will tend to show lower overlap.

How might the findings of this study differ if the full event extraction task (including trigger detection) was considered, rather than just argument extraction?

Considering the full event extraction task, including trigger detection, could change the findings of this study. Trigger detection adds another layer of complexity and variability, since accurately identifying event triggers is a prerequisite for extracting the right arguments: any trigger errors propagate directly into argument extraction. The interplay between the two stages could therefore produce different patterns of transfer performance than those observed when triggers are given.
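As a minimal sketch of this interaction, the snippet below chains a naive prompt-based trigger detector into a QA-style argument extractor using a generic Flan-T5 checkpoint from Hugging Face. The prompts, the checkpoint choice, and the single-trigger assumption are all illustrative assumptions, not the paper's setup (which evaluates argument extraction with triggers already provided).

```python
# Minimal two-stage sketch (trigger detection, then argument extraction) to
# illustrate how stage-1 errors would propagate into stage 2.
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-base")
document = "Rebels attacked the convoy near the border on Tuesday."

# Stage 1: trigger detection via a naive prompt (hypothetical; the paper
# assumes gold triggers and evaluates argument extraction only).
trigger = generator(
    f"Identify the single word that triggers an event: {document}",
    max_new_tokens=8,
)[0]["generated_text"].strip()

# Stage 2: QA-style argument extraction conditioned on the predicted trigger.
answer = generator(
    f"question: Who carried out the '{trigger}' event? context: {document}",
    max_new_tokens=16,
)[0]["generated_text"].strip()

print(trigger, "->", answer)  # a wrong trigger in stage 1 derails stage 2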

Could the insights from this work on small model effectiveness be extended to other structured prediction tasks beyond event extraction?

The insights from this study on the effectiveness of small models for event extraction could potentially be extended to other structured prediction tasks. Tasks that involve extracting structured information from text, such as named entity recognition, relation extraction, and semantic role labeling, may benefit from similar approaches. By training smaller models on specific source ontologies and leveraging techniques like question answering and template infilling, it may be possible to achieve superior zero-shot performance compared to larger pre-trained models. However, the applicability of these insights to other tasks would depend on the nature of the task and the complexity of the structured information being extracted.
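As a hypothetical illustration of that extension, the same recasting idea can be applied to relation extraction. The sentence, question, and template below are invented examples, not drawn from the paper or any particular benchmark.

```python
# Hypothetical sketch: recasting a relation-extraction instance as a QA input
# and as a template-infilling input, mirroring the EAE recasting above.

sentence = "Marie Curie was born in Warsaw."
head_entity = "Marie Curie"

# QA-style recast: one question per relation type.
qa_input = f"question: Where was {head_entity} born? context: {sentence}"

# TI-style recast: a relation template with a slot for the tail entity.
ti_input = f"template: {head_entity} was born in <arg> context: {sentence}"

print(qa_input)
print(ti_input)
```

Whether such recasting transfers as well as it does for EAE would depend on how naturally the target task's label set maps onto questions or slotted templates.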