Core Concepts
AnnoCTR is a new publicly available dataset of 400 cybersecurity threat reports, 120 of which are annotated with named entities, temporal expressions, and cybersecurity-specific concepts including tactics and techniques from the MITRE ATT&CK taxonomy. The dataset enables research on advanced natural language processing techniques for managing and analyzing unstructured cybersecurity information.
Abstract
The AnnoCTR dataset consists of 400 cybersecurity threat reports obtained from commercial threat intelligence vendors. Of these, 120 have been annotated by a domain expert with a variety of named entities, including locations, organizations, industry sectors, and cybersecurity-specific concepts such as malware, hacker groups, and techniques and tactics from the MITRE ATT&CK taxonomy. The entities are linked to external knowledge bases such as Wikipedia and MITRE ATT&CK.
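Each annotation in a dataset like this pairs a character span with an entity type and, optionally, a knowledge-base link. The sketch below models one such annotation as a Python dataclass. The class name `EntityMention`, the label strings (`MALWARE`, `TIMEX`, `TACTIC`), and the slice-style offset convention are assumptions for illustration, not AnnoCTR's actual file format. Linking "exfiltration" to the ATT&CK tactic TA0010 (Exfiltration) is a plausible example annotation, not one copied from the dataset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EntityMention:
    """One annotated span (hypothetical schema, not AnnoCTR's actual format)."""

    start: int                   # offset of the first character of the span
    end: int                     # offset one past the last character (Python slice style)
    label: str                   # entity type; these label names are hypothetical
    kb_id: Optional[str] = None  # optional link, e.g. a MITRE ATT&CK ID or a Wikipedia title

    def surface(self, text: str) -> str:
        """Return the substring of the report text covered by this mention."""
        return text[self.start:self.end]


report = "VJWorm has also been seen recently with different techniques for exfiltration."
mentions: List[EntityMention] = [
    EntityMention(0, 6, "MALWARE"),                   # malware name, no KB link given here
    EntityMention(26, 34, "TIMEX"),                   # temporal expression "recently"
    EntityMention(65, 77, "TACTIC", kb_id="TA0010"),  # ATT&CK tactic "Exfiltration"
]

for m in mentions:
    print(m.label, repr(m.surface(report)), m.kb_id)
```

Storing offsets rather than only surface strings keeps repeated mentions of the same name distinguishable and lets the same records drive both recognition and linking tasks.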
The authors propose several NLP tasks based on the dataset, including named entity recognition, temporal expression extraction and normalization, and entity and concept disambiguation. They provide experimental results using state-of-the-art neural models for these tasks, demonstrating the challenges and opportunities in applying advanced text understanding techniques to the cybersecurity domain.
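Results for span-level tasks such as named entity recognition are commonly reported as exact-match precision, recall, and F1 over (start, end, label) triples. The function below sketches that common scheme. The name `span_prf` is hypothetical, and this summary does not state which evaluation variant the authors used.

```python
from typing import Iterable, Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity label)


def span_prf(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(pred)
    tp = len(gold_set & pred_set)  # a prediction counts only if offsets AND label all match
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [(0, 6, "MALWARE"), (26, 34, "TIMEX"), (65, 77, "TACTIC")]
pred = [(0, 6, "MALWARE"), (26, 34, "TIMEX"), (61, 77, "TACTIC")]  # wrong start on last span
print(span_prf(gold, pred))  # each of P, R, F1 is 2/3 (about 0.667)
```

Under strict matching, the misplaced boundary on the last span counts as both a false positive and a false negative; lenient, overlap-based variants also exist but are not shown here.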
The authors find that while general-purpose named entity recognition models perform reasonably well, specialized models are required for accurately identifying and linking cybersecurity-specific concepts, especially for implicitly mentioned techniques and tactics. They show that data augmentation using the textual descriptions from the MITRE ATT&CK knowledge base can be an effective strategy in this few-shot learning scenario.
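One simple way to realize description-based augmentation is to treat each ATT&CK technique's textual description as an extra labelled training sentence for that technique. The sketch below assumes a sentence-level technique classifier trained on (text, technique ID) pairs. The helper name `augment_with_descriptions`, the data layout, and the annotated example sentence are hypothetical, and the authors' actual augmentation procedure may differ. The two descriptions are the ATT&CK texts quoted in this summary (T1606.001 Web Cookies, T1606.002 SAML Tokens).

```python
import random
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (text, ATT&CK technique ID)


def augment_with_descriptions(
    annotated: List[Example],
    descriptions: Dict[str, str],
    seed: int = 0,
) -> List[Example]:
    """Return the annotated examples plus one synthetic example per KB description.

    Each description is labelled with its own technique ID, so that techniques
    with few or no annotated sentences still get at least one training instance.
    The combined list is shuffled with a fixed seed for reproducibility.
    """
    synthetic = [(text, tech_id) for tech_id, text in descriptions.items()]
    combined = list(annotated) + synthetic
    random.Random(seed).shuffle(combined)
    return combined


# Hypothetical annotated sentence (not from AnnoCTR).
annotated = [("The actors forged authentication cookies to reach the victim's webmail.", "T1606.001")]
descriptions = {
    "T1606.001": "Adversaries may forge web cookies that can be used to gain access to web applications or Internet services.",
    "T1606.002": "An adversary may forge SAML tokens with any permissions claims and lifetimes if they possess a valid SAML token-signing certificate.",
}
train = augment_with_descriptions(annotated, descriptions)
print(len(train), sorted({label for _, label in train}))
```

Here T1606.002 has no annotated sentence at all, so the description is its only training signal, which is the few-shot situation the augmentation targets.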
Overall, the AnnoCTR dataset, together with the proposed NLP tasks and models, lays the foundation for more sophisticated natural language processing tools that help cybersecurity professionals manage and analyze large volumes of unstructured threat intelligence.
Example Sentences
The attack happened yesterday.
They usually use different types of url shorteners in their mailings.
VJWorm has also been seen recently with different techniques for exfiltration.
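Normalizing a temporal expression such as "yesterday" or "recently" requires an anchor, typically the report's publication date. The toy rule-based sketch below covers only a handful of expressions and is not necessarily the approach evaluated by the authors. The rule tables, the function name `normalize`, the publication date, and the use of TIMEX3-style reference values (`PAST_REF`, `PRESENT_REF`, `FUTURE_REF`) for vague expressions are all assumptions.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Hypothetical rule tables: relative day expressions map to a day offset ...
DAY_OFFSETS = {"today": 0, "yesterday": -1, "tomorrow": 1}
# ... while vague expressions map to TIMEX3-style reference values instead of a date.
VAGUE = {"recently": "PAST_REF", "now": "PRESENT_REF", "soon": "FUTURE_REF"}


def normalize(expr: str, doc_date: date) -> Optional[str]:
    """Map a temporal expression to an ISO 8601 date or a reference value,
    anchored at the document date. Returns None if no rule applies."""
    e = expr.strip().lower()
    if e in DAY_OFFSETS:
        return (doc_date + timedelta(days=DAY_OFFSETS[e])).isoformat()
    if e in VAGUE:
        return VAGUE[e]
    m = re.fullmatch(r"(\d+) days? ago", e)  # e.g. "3 days ago", "1 day ago"
    if m:
        return (doc_date - timedelta(days=int(m.group(1)))).isoformat()
    return None


doc_date = date(2021, 3, 1)  # hypothetical publication date of the report
print(normalize("yesterday", doc_date))   # 2021-02-28
print(normalize("recently", doc_date))    # PAST_REF
print(normalize("3 days ago", doc_date))  # 2021-02-26
```

The same phrase yields different values for reports published on different dates, which is why the anchor date has to be part of the input rather than assumed to be "today".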
Quotes
"Adversaries may forge credential materials that can be used to gain access to web applications or Internet services."
"Adversaries may forge web cookies that can be used to gain access to web applications or Internet services."
"An adversary may forge SAML tokens with any permissions claims and lifetimes if they possess a valid SAML token-signing certificate."
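Concept disambiguation links a mention to the correct knowledge-base entry. As a deliberately simple baseline (a swapped-in lexical technique, not the neural models the authors evaluate), the sketch below ranks the three ATT&CK descriptions quoted above by Jaccard token overlap with a mention's context. The IDs T1606, T1606.001, and T1606.002 are the ATT&CK entries Forge Web Credentials, Web Cookies, and SAML Tokens; the function names `tokens` and `rank` are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

# The three ATT&CK descriptions quoted above, keyed by technique ID.
ATTACK: Dict[str, str] = {
    "T1606": "Adversaries may forge credential materials that can be used to gain access to web applications or Internet services.",
    "T1606.001": "Adversaries may forge web cookies that can be used to gain access to web applications or Internet services.",
    "T1606.002": "An adversary may forge SAML tokens with any permissions claims and lifetimes if they possess a valid SAML token-signing certificate.",
}


def tokens(text: str) -> Set[str]:
    """Lower-cased alphanumeric tokens (no stemming, no stop-word removal)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank(context: str, kb: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank KB entries by Jaccard similarity between the context and each description."""
    q = tokens(context)
    scores = []
    for kb_id, desc in kb.items():
        d = tokens(desc)
        union = q | d
        scores.append((kb_id, len(q & d) / len(union) if union else 0.0))
    return sorted(scores, key=lambda s: s[1], reverse=True)


ctx = "The group forged SAML tokens using a stolen token-signing certificate."
print(rank(ctx, ATTACK)[0][0])  # T1606.002
```

Because there is no stemming, "forged" does not match "forge", and the parent technique and its cookie sub-technique differ by only a few tokens. Weaknesses like these are one reason lexical baselines struggle with implicitly mentioned techniques.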