Constructing a Comprehensive Bilingual Information Extraction Corpus: IEPILE
IEPILE is a comprehensive bilingual (English and Chinese) information extraction (IE) instruction corpus of approximately 0.32B tokens. It was built by collecting and cleaning 33 existing IE datasets and applying a schema-based instruction generation strategy. Together, these steps address the main limitations of existing IE datasets, which tend to be small in scale, fragmented, and lacking a standardized schema.
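The summary does not spell out how schema-based instructions are generated. The following is a hypothetical sketch, not the IEPILE authors' implementation. It assumes that each instruction pairs an input text with a subset of schema labels, that some "negative" labels (types absent from the text) are included so a model learns to return empty results, and that the schema is split into fixed-size batches. The function name, instruction wording, and `batch_size` are illustrative assumptions.

```python
import random


def build_schema_instructions(text, positive_labels, all_labels, batch_size=4, seed=0):
    """Hypothetical sketch: pair one text with batches of schema labels.

    Assumption (not from the source): every positive label is queried,
    plus up to `batch_size` negative labels, and the combined schema is
    split into batches of at most `batch_size` labels each.
    """
    rng = random.Random(seed)  # seeded so the sampled negatives are reproducible
    # Negative labels are schema types that do not occur in this sample.
    negatives = [label for label in all_labels if label not in positive_labels]
    rng.shuffle(negatives)
    # Combine all positives with a few sampled negatives; sort for stable output.
    schema = sorted(set(positive_labels) | set(negatives[:batch_size]))
    instructions = []
    for start in range(0, len(schema), batch_size):
        batch = schema[start:start + batch_size]
        instructions.append({
            # Illustrative wording only; the real corpus templates may differ.
            "instruction": (
                "Extract entities of the given types from the input. "
                "Return an empty list for types that do not appear."
            ),
            "schema": batch,
            "input": text,
        })
    return instructions


if __name__ == "__main__":
    labels = ["person", "organization", "location", "date", "event", "product"]
    for item in build_schema_instructions(
        "Alice moved to Paris.", ["person", "location"], labels, batch_size=2
    ):
        print(item["schema"])
```

Batching the schema keeps each instruction short when a dataset has many label types, and mixing in absent labels discourages a model from assuming every queried type is present.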