Core Concepts
A German-language dataset for named entity recognition and relation extraction is used to analyze the relationships between financial entities and their service providers.
Abstract
1. Abstract:
Cyber mapping provides insights into relationships between financial entities and their service providers.
Dataset designed for named entity recognition and relation extraction.
Contains 5,969 named entity annotations across four entity types and 4,102 relation annotations.
2. Introduction:
Outsourced processes expose financial entities to cyber incidents, which pose risks to financial stability.
Concept of "cyber mapping" links financial network with cyber network.
Fund prospectuses provide information on outsourced services in Germany.
3. Corpus Creation:
Corpus created from publicly available fund prospectuses in Germany.
Sentences extracted using the Apache PDFBox text stripper routine.
Sentences annotated by three experts with named entities and relations.
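To make the annotation layers concrete, the following is a minimal sketch of how one annotated sentence might be represented in memory. It is not the dataset's actual file format: the class names, field names, the helper `span`, the example sentence, and the relation label `"provided_by"` are all hypothetical placeholders. Only the four entity type names come from the summary.

```python
from dataclasses import dataclass, field

# The four entity types named in the summary (label spelling is an assumption).
ENTITY_TYPES = {"Outsourced Service", "Company", "Location", "Software"}


@dataclass(frozen=True)
class Entity:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str

    def __post_init__(self):
        if self.label not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.label!r}")
        if not 0 <= self.start < self.end:
            raise ValueError("entity span must be non-empty and non-negative")


@dataclass(frozen=True)
class Relation:
    head: int   # index into AnnotatedSentence.entities
    tail: int   # index into AnnotatedSentence.entities
    label: str  # hypothetical label; the real relation inventory is not given here


@dataclass
class AnnotatedSentence:
    text: str
    entities: list[Entity] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def surface(self, entity: Entity) -> str:
        """Return the text covered by an entity span."""
        return self.text[entity.start:entity.end]


def span(text: str, phrase: str, label: str) -> Entity:
    """Hypothetical helper: locate the first occurrence of a phrase and wrap it as an Entity."""
    start = text.index(phrase)
    return Entity(start, start + len(phrase), label)


# Invented example sentence, not taken from the corpus.
text = "Die Beispiel GmbH in Frankfurt übernimmt die Fondsbuchhaltung."
sentence = AnnotatedSentence(
    text=text,
    entities=[
        span(text, "Beispiel GmbH", "Company"),
        span(text, "Frankfurt", "Location"),
        span(text, "Fondsbuchhaltung", "Outsourced Service"),
    ],
)
# Link the outsourced service (index 2) to the company (index 0).
sentence.relations.append(Relation(head=2, tail=0, label="provided_by"))
```

Storing relations as index pairs into the entity list keeps each relation tied to specific annotated spans, so a sentence that mentions the same company twice remains unambiguous.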
4. Experiments:
Named entity recognition (NER) methods include CRF and BERT models.
RoBERTa model used for relation extraction (RE).
Evaluation based on precision, recall, and F1-score.
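The three metrics can be sketched as follows. This is a minimal illustration that assumes exact-match scoring over sets of annotations; the summary does not state which matching scheme the authors used. The function name `precision_recall_f1` and the example spans are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match precision, recall, and F1 over two collections of hashable annotations."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Invented (start, end, label) spans for a single sentence.
gold = {
    (0, 3, "Company"),
    (5, 7, "Location"),
    (9, 12, "Outsourced Service"),
}
predicted = {
    (0, 3, "Company"),
    (5, 7, "Company"),             # right span, wrong label: counts as an error
    (9, 12, "Outsourced Service"),
    (14, 15, "Software"),          # spurious prediction
}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Two of the four predictions match the gold annotations, so precision is 2/4, recall is 2/3, and F1 is their harmonic mean, 4/7. The same function applies to relation extraction if each annotation is a (head, tail, label) triple.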
5. Conclusion:
CO-Fun dataset contains 948 sentences with named entity annotations.
Promising performance of NER and RE models on the dataset.
Stats
The CO-Fun dataset consists of 948 sentences with 5,969 named entity annotations, including 2,340 Outsourced Services, 2,024 Companies, 1,594 Locations, and 11 Software annotations.
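As a quick consistency check, the per-type counts can be summed and compared against the reported total. The sketch below uses only the figures stated above; the variable names are illustrative.

```python
# Counts as reported in the summary.
entity_counts = {
    "Outsourced Service": 2340,
    "Company": 2024,
    "Location": 1594,
    "Software": 11,
}
reported_total = 5969
sentence_count = 948

total = sum(entity_counts.values())
assert total == reported_total, "per-type counts should add up to the reported total"

# Share of each entity type and average annotation density per sentence.
shares = {label: count / total for label, count in entity_counts.items()}
entities_per_sentence = total / sentence_count

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"entities per sentence: {entities_per_sentence:.2f}")
```

The four counts add up to 5,969, so the breakdown is internally consistent. Software accounts for well under 1% of annotations, which suggests that per-type scores for that class rest on very few examples.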