The authors propose CMNEE, a large-scale, document-level, open-source Chinese Military News Event Extraction dataset, to address data scarcity in the military domain. CMNEE contains 17,000 documents and 29,223 events, manually annotated according to a predefined military-domain schema of 8 event types and 11 argument role types.
To ensure the quality of CMNEE, the authors design a two-stage, multi-turn annotation strategy. They also reproduce several state-of-the-art event extraction models and evaluate them systematically. The results show that event extraction in the military domain poses unique challenges and calls for further research.
CMNEE is the first publicly available dataset for document-level event extraction in the military domain. The authors analyze several aspects of CMNEE, including the event type distribution, the multi-event distribution, event arguments, and the performance of baseline models. The analysis shows that CMNEE has a high proportion of overlapping events and long arguments, both of which make extraction more difficult.
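To make "multi-event" and "overlapping events" concrete, here is a minimal sketch of how such document-level statistics might be computed. It assumes a hypothetical annotation format: the `Event` and `Argument` classes, the character-offset spans, and the placeholder type and role names are illustrative, not CMNEE's release format. It also assumes that two events overlap when they share an identical trigger or argument span; the paper's exact definition may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Argument:
    role: str   # hypothetical placeholder for one of the schema's argument roles
    start: int  # character offset where the argument span begins
    end: int    # character offset one past the end of the span


@dataclass
class Event:
    event_type: str           # hypothetical placeholder for one of the schema's event types
    trigger: tuple[int, int]  # (start, end) character offsets of the trigger
    arguments: list[Argument] = field(default_factory=list)


def spans(event: Event) -> set[tuple[int, int]]:
    """Collect every span (trigger plus arguments) used by an event."""
    return {event.trigger} | {(a.start, a.end) for a in event.arguments}


def events_overlap(a: Event, b: Event) -> bool:
    """Assumed definition: two events overlap if they share any identical span."""
    return bool(spans(a) & spans(b))


def document_stats(events: list[Event]) -> dict[str, bool]:
    """Flag whether a document has several events and whether any pair overlaps."""
    return {
        "multi_event": len(events) > 1,
        "has_overlap": any(events_overlap(a, b) for a, b in combinations(events, 2)),
    }


def corpus_stats(docs: list[list[Event]]) -> dict[str, float]:
    """Fraction of documents that are multi-event or contain overlapping events."""
    if not docs:
        return {"multi_event_ratio": 0.0, "overlap_ratio": 0.0}
    stats = [document_stats(d) for d in docs]
    n = len(docs)
    return {
        "multi_event_ratio": sum(s["multi_event"] for s in stats) / n,
        "overlap_ratio": sum(s["has_overlap"] for s in stats) / n,
    }


# Toy usage with made-up data (not drawn from CMNEE):
# the first document has two events sharing the argument span (10, 14);
# the second document has a single event.
e1 = Event("TypeA", (0, 2), [Argument("RoleX", 10, 14)])
e2 = Event("TypeB", (20, 22), [Argument("RoleY", 10, 14)])
e3 = Event("TypeA", (5, 7), [Argument("RoleX", 30, 40)])
print(corpus_stats([[e1, e2], [e3]]))
```

Under this assumed definition, a shared argument span is enough to mark a document as containing overlapping events, which is one reason a per-document, rather than per-sentence, view matters for this kind of dataset.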
The authors also discuss the limitations of CMNEE, such as the limited number of event types and argument roles, and the choice of language and annotation methodology. They suggest expanding CMNEE to other languages and exploring new annotation techniques as directions for future work.
Source: arxiv.org