Core Concepts
Malicious open-source software packages play a central role in software supply chain attacks. This study analyzes a large-scale dataset of 23,425 malicious packages collected from various online sources to understand their diversity, attack campaigns, and evolution.
Abstract
The paper presents a comprehensive analysis of malicious open-source software (OSS) packages, which are a key component of software supply chain (SSC) attacks. The authors first built and curated the largest dataset of 23,425 malicious packages from scattered online sources, including open-source datasets, commercial websites, social networks, and individual blogs.
The key findings include:
- Industry-wide and industry-academia collaboration is lacking: there is little data overlap between different sources, which limits the data quality of malicious package collections.
- Despite the large volume of malicious packages, their diversity evolves slowly and remains relatively stable, suggesting that today's defense tools can detect most malicious packages because these packages rely on old, known attack behaviors.
- Malicious OSS packages have a distinct life cycle of {changing→release→detection→removal}, in which attackers repeatedly change the package (e.g., giving it a different name) to launch subsequent attack attempts.
- Malicious packages often lack context about how and by whom they were released, but security reports supply the missing information about the corresponding SSC attack campaigns.
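The low data overlap between sources noted in the first finding can be made concrete with a pairwise set comparison. The sketch below is illustrative only: the source names, the package tuples, and the choice of Jaccard similarity as the overlap measure are assumptions, not the authors' data or method.

```python
from itertools import combinations

# Hypothetical sample data: each source maps to a set of
# (ecosystem, package name, version) tuples. Not taken from the paper.
sources = {
    "open_dataset": {("npm", "evil-pkg", "1.0.0"), ("pypi", "reqests", "0.1")},
    "vendor_blog": {("npm", "evil-pkg", "1.0.0"), ("npm", "colorz", "2.3.1")},
    "social_post": {("pypi", "djanga", "0.0.2")},
}


def jaccard(a, b):
    """Return |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(sources):
    """Compute the Jaccard overlap for every unordered pair of sources."""
    return {
        (x, y): jaccard(sources[x], sources[y])
        for x, y in combinations(sorted(sources), 2)
    }


overlap = pairwise_overlap(sources)
for pair, score in overlap.items():
    print(pair, round(score, 3))
```

Scores near zero for most pairs would correspond to the "little data overlap" situation described above. Matching on exact (ecosystem, name, version) tuples is a simplifying assumption; real sources may name or version the same package differently.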
The authors also propose a knowledge graph, MALGRAPH, to represent the relationships between malicious packages, including duplicated, dependency, similar, and co-existing relationships. MALGRAPH is used to analyze the diversity, attack campaigns, and evolution of malicious packages.
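A typed graph of this kind can be sketched as follows. This is a minimal illustration, not the authors' MALGRAPH implementation: the class and method names, the package identifiers, and the decision to store every edge as undirected are all assumptions made for the example. Grouping packages connected by similar or duplicated edges is shown as one hypothetical way such a graph could surface candidate campaigns.

```python
from collections import defaultdict
from enum import Enum


class Relation(Enum):
    """The four relationship types named in the summary."""
    DUPLICATED = "duplicated"
    DEPENDENCY = "dependency"
    SIMILAR = "similar"
    COEXISTING = "co-existing"


class PackageGraph:
    """Toy typed graph; nodes are hypothetical package identifier strings."""

    def __init__(self):
        self.nodes = set()
        # adjacency[relation][node] -> set of neighbouring nodes
        self.adjacency = defaultdict(lambda: defaultdict(set))

    def add_edge(self, a, b, relation):
        # Assumption: edges are stored as undirected for simplicity,
        # even though a dependency edge is naturally directed.
        self.nodes.update((a, b))
        self.adjacency[relation][a].add(b)
        self.adjacency[relation][b].add(a)

    def groups(self, relations):
        """Return connected components using only the given relation types."""
        seen, result = set(), []
        for start in sorted(self.nodes):
            if start in seen:
                continue
            component, stack = set(), [start]
            while stack:
                node = stack.pop()
                if node in component:
                    continue
                component.add(node)
                for rel in relations:
                    stack.extend(self.adjacency[rel][node] - component)
            seen |= component
            result.append(component)
        return result


# Hypothetical packages, invented for illustration.
g = PackageGraph()
g.add_edge("npm:evil-pkg@1.0.0", "npm:evil-pkgs@1.0.0", Relation.SIMILAR)
g.add_edge("npm:evil-pkgs@1.0.0", "npm:evil-pkg2@1.0.1", Relation.DUPLICATED)
g.add_edge("npm:evil-pkg2@1.0.1", "npm:helper-lib@0.3.0", Relation.DEPENDENCY)

clusters = g.groups([Relation.SIMILAR, Relation.DUPLICATED])
for cluster in clusters:
    print(sorted(cluster))
```

Here the three renamed or duplicated variants fall into one group, while the package reached only through a dependency edge stays separate because dependency edges were excluded from the grouping.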
Stats
The dataset contains 23,425 malicious packages, of which 9,141 are available and 14,589 are unavailable (only package names/versions are known).
Quotes
"Malicious packages initially deceive developers and users to download and install them and then execute subsequent behaviors, such as implanting backdoors [2], stealing sensitive information [3], and downloading and executing payloads without user permission (e.g., cryptominers [4])."
"Despite an increasing number of OSS malicious packages, their diversity evolution is slow and relatively stable. This suggests that today's defense tools work well because malicious packages use old and known attack behaviors."
"OSS malicious package has its distinct life cycle, denoted as {changing→release→detection→removal}. An SSC attack campaign is a repeating process in which attackers change malicious packages to launch several attack attempts."