CO-Fun: German Dataset on Company Outsourcing in Fund Prospectuses

Core Concepts
CO-Fun is a German-language dataset for named entity recognition (NER) and relation extraction (RE), built to support cyber mapping in the financial domain.
1. Abstract: Dataset for named entity recognition and relation extraction tasks. Annotated by three experts on 948 sentences. Contains annotations for four entity types and relations.
2. Introduction: Cyber incidents pose risks to financial stability. Cyber mapping links financial entities with service providers. Fund prospectuses provide insights into outsourced services.
3. Corpus Creation: Created from 1,054 fund prospectuses in Germany. Plain texts extracted using PDFBox text stripper routine. Sentences annotated by three experts with named entities and relations.
4. Experiments: Named Entity Recognition (NER): Used CRF and BERT models for NER methods. CRF model showed better performance than BERT on the test set. Relation Extraction (RE): Utilized RoBERTa model for RE method. Achieved an F1-score of 86.3% on the test set.
5. Conclusion: Introduced CO-Fun dataset for NER and RE tasks in company outsourcing. Contains annotations for entities and relations, publicly available under MIT license.
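To make the reported F1-score concrete, here is a minimal sketch of how precision, recall, and F1 can be computed over relation triples. It is not the evaluation code used for CO-Fun: the function name `precision_recall_f1`, the relation labels, and the entity names are all hypothetical and chosen only for illustration.

```python
from typing import Set, Tuple

# A relation is modeled here as (head entity, relation label, tail entity).
# This representation is an assumption made for the sketch.
Triple = Tuple[str, str, str]


def precision_recall_f1(gold: Set[Triple], predicted: Set[Triple]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for exact-match relation triples."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold and predicted relations (not taken from the dataset).
gold = {
    ("Fonds A", "outsources_to", "Firma X"),
    ("Firma X", "located_in", "Frankfurt"),
    ("Fonds A", "outsources_to", "Firma Y"),
}
predicted = {
    ("Fonds A", "outsources_to", "Firma X"),
    ("Firma X", "located_in", "Frankfurt"),
    ("Fonds A", "outsources_to", "Firma Z"),  # wrong tail entity
}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Two of the three predicted triples match the gold set, so precision, recall, and F1 all equal 2/3 in this toy case.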
The corpus consists of 948 sentences with 5,969 named entity annotations and 4,102 relation annotations.
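As a quick check on annotation density, the figures above can be turned into per-sentence averages. The variable names are illustrative; only the three counts come from the text.

```python
# Counts as stated in the summary above.
sentences = 948
entity_annotations = 5969
relation_annotations = 4102

entities_per_sentence = entity_annotations / sentences
relations_per_sentence = relation_annotations / sentences

print(f"{entities_per_sentence:.2f} entity annotations per sentence")      # 6.30
print(f"{relations_per_sentence:.2f} relation annotations per sentence")   # 4.33
```

On average, each sentence carries roughly six entity annotations and four relation annotations.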

Deeper Inquiries

How can the CO-Fun dataset be expanded to include more diverse data sources?

To expand the CO-Fun dataset and include more diverse data sources, several strategies can be implemented. Firstly, incorporating data from a wider range of financial documents such as annual reports, shareholder communications, or regulatory filings can provide a broader perspective on company outsourcing practices. Additionally, integrating information from news articles, social media platforms, and industry-specific websites can offer real-time insights into current trends and developments in company outsourcing within the financial sector. Collaborating with other institutions or organizations to access their datasets related to financial entities and service providers can also enrich the dataset with varied sources of information. Furthermore, including multilingual data sources can enhance the dataset's applicability across different regions and languages.

What are the potential ethical implications of using datasets like CO-Fun in real-world applications?

The utilization of datasets like CO-Fun in real-world applications raises important ethical considerations that need to be addressed. One potential ethical implication is privacy concerns related to sensitive information about companies' outsourcing practices being exposed without proper consent or anonymization. Ensuring data security and confidentiality through robust encryption methods is crucial to protect the identities of individuals and organizations mentioned in the dataset. Moreover, maintaining transparency about data collection methods, usage policies, and potential biases in annotation processes is essential for building trust with stakeholders who interact with the dataset. It is imperative to adhere to legal regulations such as GDPR (General Data Protection Regulation) guidelines when handling personal or confidential data within datasets like CO-Fun.

How can the findings from this research contribute to improving cybersecurity measures in the financial sector?

The findings from this research on company outsourcing practices in fund prospectuses have direct implications for cybersecurity in the financial sector. Named Entity Recognition (NER) models trained on datasets like CO-Fun can identify the outsourced services and companies named in prospectuses, which helps financial institutions assess cyber risks associated with third-party service providers. Relation Extraction (RE) models trained on the same data link the entities involved in outsourcing relationships, which is vital for detecting vulnerabilities or dependencies that could pose cybersecurity risks. Together, these NER and RE outputs can strengthen cybersecurity protocols by supporting proactive monitoring of outsourced services against predefined benchmarks, so that anomalies or suspicious activities that may indicate cyber threats to financial systems are detected early.
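The dependency analysis described here can be sketched as a small mapping from service providers to the companies that outsource to them. This is an illustrative sketch only: the function names `providers_by_dependents` and `shared_providers`, the pair-based input format, and all company names are hypothetical and do not come from the CO-Fun dataset or its tooling.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Each pair is (outsourcing company, service provider), as an RE model
# might emit it. The format is an assumption made for this sketch.
OutsourcingLink = Tuple[str, str]


def providers_by_dependents(links: Iterable[OutsourcingLink]) -> Dict[str, Set[str]]:
    """Group outsourcing companies by the provider they depend on."""
    dependents: Dict[str, Set[str]] = defaultdict(set)
    for outsourcing_company, provider in links:
        dependents[provider].add(outsourcing_company)
    return dict(dependents)


def shared_providers(dependents: Dict[str, Set[str]], min_dependents: int = 2) -> List[str]:
    """Return providers that at least `min_dependents` companies rely on."""
    return sorted(
        provider
        for provider, companies in dependents.items()
        if len(companies) >= min_dependents
    )


# Hypothetical extracted links (invented names).
links = [
    ("KVG Alpha", "IT-Dienst Eins"),
    ("KVG Beta", "IT-Dienst Eins"),
    ("KVG Beta", "Depot Zwei"),
]

mapping = providers_by_dependents(links)
print(shared_providers(mapping))  # ['IT-Dienst Eins']
```

A provider that several companies depend on is a candidate concentration risk: an incident at that provider could affect all of its dependents at once.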