Key Concept
Open-source research artifacts have become far more common in the software engineering community, but their current status and trends remain unclear, which warrants further investigation to improve their quality and maintenance.
Abstract
This study presents a comprehensive empirical analysis of research artifacts in software engineering publications from 2017 to 2022, focusing on four key aspects:
Common Practices:
The majority (64.2% in 2022) of researchers choose to upload their artifacts on GitHub, despite recommendations to use dedicated platforms like Zenodo.
Python has overtaken Java as the most widely used programming language in artifacts, accounting for 61.1% in 2022.
About half (52.3%) of the publications with artifacts place the URL in the abstract or introduction, making them more discoverable.
Maintenance:
The proportion of link rot (unavailable URLs) grows with the age of the publication, from 4.8% for 2022 to 29.8% for 2017.
Link rot is more prevalent in temporary drives (32.6%) and personal homepages (11.8%) compared to GitHub (6.4%) and dedicated artifact platforms (7.1%).
Over 90% of artifacts are updated after the submission deadline, but the update ratio drops significantly after subsequent milestones, indicating a lack of long-term maintenance.
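The per-platform link-rot comparison above can be illustrated with a minimal sketch. It assumes each artifact URL has already been checked for reachability; the host-to-category mapping and the names `HOST_CATEGORIES`, `categorize`, and `link_rot_by_category` are hypothetical and do not reproduce the study's own tooling or categorization rules.

```python
from urllib.parse import urlparse

# Hypothetical mapping from host name to storage category.
# The study's actual categorization rules are not given in this summary.
HOST_CATEGORIES = {
    "github.com": "github",
    "zenodo.org": "artifact_service",
    "figshare.com": "artifact_service",
    "drive.google.com": "temporary_drive",
    "dropbox.com": "temporary_drive",
}


def categorize(url):
    """Return the storage category for a URL, or "other" if the host is unknown."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return HOST_CATEGORIES.get(host, "other")


def link_rot_by_category(records):
    """Compute the fraction of unreachable URLs per category.

    records: iterable of (url, is_reachable) pairs, where is_reachable
    is the result of an availability check done elsewhere.
    """
    totals = {}
    dead = {}
    for url, is_reachable in records:
        category = categorize(url)
        totals[category] = totals.get(category, 0) + 1
        if not is_reachable:
            dead[category] = dead.get(category, 0) + 1
    return {category: dead.get(category, 0) / count
            for category, count in totals.items()}
```

Keeping the reachability check separate from the aggregation lets the same function be reused on results collected at different dates, which is one way to observe how link rot changes over time.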
Popularity:
Most GitHub artifacts receive limited attention, with 65.0% attracting no more than 10 stars, suggesting limited real-world impact.
The 33 top-starred artifacts (>100 stars) are well-documented, maintained, and have been integrated into large-scale industrial projects, demonstrating their significant influence.
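As a rough illustration of how a popularity figure such as "no more than 10 stars" can be derived, the sketch below computes the share of repositories at or below a star threshold. The function name `share_at_most` and the sample star counts are hypothetical and are not data from the study.

```python
def share_at_most(star_counts, threshold=10):
    """Return the fraction of repositories with at most `threshold` stars."""
    if not star_counts:
        raise ValueError("star_counts must not be empty")
    low = sum(1 for stars in star_counts if stars <= threshold)
    return low / len(star_counts)


# Made-up star counts for five repositories, for illustration only.
sample = [0, 3, 10, 11, 250]
print(share_at_most(sample))        # share with <= 10 stars
print(share_at_most(sample, 100))   # share with <= 100 stars
```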
Quality:
Over 96% of artifacts trigger code smell alerts mainly for code convention rather than functional issues, indicating that existing code smell detection approaches may be insufficient for accurately assessing artifact quality.
The documentation quality of artifacts is generally high, with most top-starred artifacts providing comprehensive information on usage, examples, and licenses.
The findings provide valuable insights for different stakeholders to enhance the practices, maintenance, popularity, and quality of research artifacts in the software engineering community.
Statistics
65.9% of publications from 2017 to 2022 provided research artifacts.
64.2% of artifacts in 2022 were stored on GitHub, while 16.0% were on Zenodo.
61.1% of artifacts in 2022 were primarily written in Python, surpassing Java at 15.7%.
4.8% of artifact URLs from 2022 were invalid, compared with 29.8% of those from 2017.
65.0% of GitHub artifacts had 10 or fewer stars, indicating limited popularity.
Over 96% of Python and Java artifacts triggered code smell alerts for code convention issues.
Quotes
"The majority (64.2% in 2022) of publications still upload their artifacts on GitHub, even though services for version control systems are not recommended in some conferences."
"Python has overtaken Java and become the most widely used language in SE artifacts and is getting more and more adoption, with its ratio increasing from 15.2% in 2017 to 61.1% in 2022."
"Link rot is more prevalent in temporary drives (32.6%) and personal homepages (11.8%) than on GitHub (6.4%) and artifact service platforms (7.1%)."
"Over 96% of artifacts trigger code smell alerts mainly for code convention rather than functional issues, indicating that code smell detection seems insufficient to accurately assess code quality for artifacts."