Research Artifacts in Software Engineering Publications: Current Status and Trends
Core Concepts
The software engineering community has seen a significant increase in the prevalence of open-source research artifacts, but the current status and trends of these artifacts remain unclear, warranting further investigation to improve their quality and maintenance.
Abstract
This study presents a comprehensive empirical analysis of research artifacts in software engineering publications from 2017 to 2022, focusing on four key aspects:
Common Practices:
- The majority (64.2% in 2022) of researchers choose to upload their artifacts on GitHub, despite recommendations to use dedicated platforms like Zenodo.
- Python has overtaken Java as the most widely used programming language in artifacts, accounting for 61.1% in 2022.
- About half (52.3%) of the publications with artifacts place the URL in the abstract or introduction, making them more discoverable.
Maintenance:
- Link rot (unavailable artifact URLs) accumulates as publications age: only 4.8% of URLs from 2022 publications are dead, versus 29.8% from 2017 publications (a minimal availability check is sketched after this list).
- Link rot is more prevalent in temporary drives (32.6%) and personal homepages (11.8%) compared to GitHub (6.4%) and dedicated artifact platforms (7.1%).
- Over 90% of artifacts are updated after the submission deadline, but the update ratio drops significantly after subsequent milestones, indicating a lack of long-term maintenance.
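The link-rot figures above come from checking whether artifact URLs still resolve. The paper does not spell out its exact crawler here, so the following is only an illustrative sketch using the `requests` library; the URLs in the snippet are placeholders.
```python
import requests

def is_rotten(url: str, timeout: float = 10.0) -> bool:
    """Return True if the artifact URL appears to be dead (link rot)."""
    try:
        # Some hosts reject HEAD requests, so fall back to a lightweight GET.
        resp = requests.head(url, allow_redirects=True, timeout=timeout)
        if resp.status_code == 405:
            resp = requests.get(url, allow_redirects=True, timeout=timeout, stream=True)
        return resp.status_code >= 400
    except requests.RequestException:
        # DNS failures, connection errors, and timeouts also count as rot.
        return True

# Placeholder URLs standing in for artifact links collected from publications.
urls = ["https://github.com/example/artifact", "https://example.org/~author/tool"]
rot_ratio = sum(is_rotten(u) for u in urls) / len(urls)
print(f"link rot: {rot_ratio:.1%}")
```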
Popularity:
- Most GitHub artifacts receive limited attention, with 65.0% attracting no more than 10 stars, suggesting a lack of real-world impact.
- The 33 top-starred artifacts (>100 stars) are well-documented, maintained, and have been integrated into large-scale industrial projects, demonstrating their significant influence.
Quality:
- Over 96% of artifacts trigger code smell alerts mainly for code convention rather than functional issues, indicating that existing code smell detection approaches may be insufficient for accurately assessing artifact quality (a minimal linter-based categorization is sketched after this list).
- The documentation quality of artifacts is generally high, with most top-starred artifacts providing comprehensive information on usage, examples, and licenses.
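The convention-versus-functional distinction above maps naturally onto the message categories of common linters. Below is a minimal sketch assuming Pylint is run on a Python artifact; the study's actual tooling and configuration may differ.
```python
import json
import subprocess
from collections import Counter

def smell_summary(path: str) -> Counter:
    """Run pylint on a project and count alerts per message category."""
    # Pylint categories: convention (C), refactor (R), warning (W), error (E), fatal (F).
    result = subprocess.run(
        ["pylint", path, "--output-format=json", "--exit-zero"],
        capture_output=True, text=True,
    )
    messages = json.loads(result.stdout or "[]")
    return Counter(msg["type"] for msg in messages)

# "path/to/artifact" is a placeholder for a downloaded artifact directory.
summary = smell_summary("path/to/artifact")
print(summary)  # e.g. Counter({'convention': 120, 'warning': 15, 'refactor': 8, 'error': 1})
```
A convention-dominated count like the example output is consistent with the finding that most alerts concern style rather than functional defects.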
The findings provide valuable insights for different stakeholders to enhance the practices, maintenance, popularity, and quality of research artifacts in the software engineering community.
Research Artifacts in Software Engineering Publications
Stats
65.9% of publications from 2017 to 2022 provided research artifacts.
64.2% of artifacts in 2022 were stored on GitHub, while 16.0% were on Zenodo.
61.1% of artifacts in 2022 were primarily written in Python, surpassing Java at 15.7%.
4.8% of artifact URLs from 2022 publications were invalid, compared with 29.8% from 2017 publications, reflecting link rot accumulating with age.
65.0% of GitHub artifacts had 10 or fewer stars, indicating limited popularity.
Over 96% of Python and Java artifacts triggered code smell alerts for code convention issues.
Quotes
"The majority (64.2% in 2022) of publications still upload their artifacts on GitHub, even though services for version control systems are not recommended in some conferences."
"Python has overtaken Java and become the most widely used language in SE artifacts and is getting more and more adoption, with its ratio increasing from 15.2% in 2017 to 61.1% in 2022."
"Link rot is more prevalent in temporary drives (32.6%) and personal homepages (11.8%) than on GitHub (6.4%) and artifact service platforms (7.1%)."
"Over 96% of artifacts trigger code smell alerts mainly for code convention rather than functional issues, indicating that code smell detection seems insufficient to accurately assess code quality for artifacts."
Deeper Inquiries
How can the software engineering community incentivize researchers to maintain their artifacts beyond the initial publication?
To incentivize researchers to maintain their artifacts beyond the initial publication, the software engineering community can implement several strategies:
Recognition and Visibility: Acknowledge and highlight researchers who maintain their artifacts by featuring them in conferences, journals, or on community platforms. This recognition can motivate researchers to continue updating and improving their artifacts.
Citation and Impact: Emphasize the importance of artifacts in citations and impact assessments. Researchers who maintain their artifacts and keep them up-to-date could receive higher recognition and impact scores, encouraging them to prioritize artifact maintenance.
Community Support: Create a supportive community where researchers can collaborate, share best practices, and seek help when facing challenges in maintaining their artifacts. Peer support and feedback can be valuable in sustaining artifact maintenance efforts.
Funding and Grants: Provide funding opportunities or grants specifically for artifact maintenance. Researchers could apply for financial support to ensure the longevity and quality of their artifacts.
Workshops and Training: Organize workshops and training sessions on artifact maintenance best practices, version control, and documentation. Equipping researchers with stronger artifact-management skills makes them more likely to invest time and effort in maintaining their artifacts.
What alternative metrics, beyond star counts, could be used to better assess the real-world impact and popularity of research artifacts?
In addition to star counts, alternative metrics that could be used to assess the real-world impact and popularity of research artifacts include the following (a sketch that pulls several of these signals from the GitHub API appears after the list):
Forks: The number of times an artifact has been forked can indicate its popularity and the level of community engagement. More forks suggest that the artifact is being used, modified, and potentially improved by others.
Issues: Tracking the number of issues raised against an artifact can provide insights into its usability, functionality, and potential areas for improvement. A higher number of issues may indicate active usage and community involvement.
Downloads: Monitoring the number of downloads or clones of an artifact can give an indication of its reach and adoption within the community. Higher download numbers suggest a wider audience and potential impact.
Contributors: The number of contributors to an artifact can demonstrate the level of collaboration and engagement around the artifact. More contributors often signify a vibrant and active community around the artifact.
Citations: Tracking how many times an artifact has been cited in other research papers or projects can indicate its influence and relevance in the academic and industry domains. Citations reflect the impact and recognition of the artifact in the broader research community.
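For GitHub-hosted artifacts, several of these signals can be pulled from the public REST API. The sketch below is a minimal, unauthenticated example (and therefore rate-limited); the repository name is a placeholder, and download/clone counts and citations are not covered because they require separate services or repository access.
```python
import requests

def repo_metrics(full_name: str) -> dict:
    """Collect popularity signals beyond stars for a GitHub-hosted artifact."""
    base = f"https://api.github.com/repos/{full_name}"
    repo = requests.get(base, timeout=10).json()
    contributors = requests.get(f"{base}/contributors?per_page=100", timeout=10).json()
    return {
        "stars": repo.get("stargazers_count", 0),
        "forks": repo.get("forks_count", 0),
        # Note: open_issues_count also includes open pull requests.
        "open_issues": repo.get("open_issues_count", 0),
        "contributors": len(contributors) if isinstance(contributors, list) else 0,
    }

print(repo_metrics("octocat/Hello-World"))  # placeholder repository
```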
How can code quality assessment approaches be tailored specifically for research artifacts to provide more accurate and actionable insights?
To tailor code quality assessment approaches for research artifacts and provide more accurate and actionable insights, the following strategies can be implemented (a small example of an artifact-specific check appears after the list):
Customized Metrics: Develop specific code quality metrics tailored to the characteristics and requirements of research artifacts. These metrics should focus on factors relevant to reproducibility, readability, maintainability, and extensibility of the artifacts.
Domain-Specific Analysis: Conduct domain-specific analysis to identify common code smells, patterns, or anti-patterns that are prevalent in research artifacts. Understanding the unique challenges and requirements of the research domain can help in designing targeted quality assessments.
Automation and Tooling: Implement automated code analysis tools that are customized for research artifacts. These tools can detect specific issues, enforce coding standards, and provide actionable feedback to researchers on improving the quality of their artifacts.
Peer Review and Collaboration: Encourage peer review and collaboration among researchers to assess the code quality of artifacts. Peer feedback and collaboration can offer diverse perspectives and insights, leading to enhanced code quality and better research outcomes.
Continuous Improvement: Promote a culture of continuous improvement in code quality by providing resources, training, and guidelines for researchers to enhance their coding practices. Regular code reviews, refactoring sessions, and quality assurance processes can help maintain high standards in research artifacts.
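As a concrete illustration of a reproducibility-oriented check that generic code smell detectors do not cover, the sketch below inspects an artifact directory for documentation, licensing, and dependency pinning. The file names, checks, and scoring are illustrative assumptions, not metrics from the study.
```python
from pathlib import Path

# Reproducibility-oriented checks; names and weighting are illustrative only.
CHECKS = {
    "has_readme": lambda p: any(p.glob("README*")),
    "has_license": lambda p: any(p.glob("LICENSE*")),
    "pins_dependencies": lambda p: (p / "requirements.txt").exists()
                                   or (p / "environment.yml").exists(),
    "documents_usage": lambda p: any("usage" in f.read_text(errors="ignore").lower()
                                     for f in p.glob("README*")),
}

def artifact_score(path: str) -> float:
    """Fraction of reproducibility checks an artifact satisfies (0.0 to 1.0)."""
    root = Path(path)
    return sum(check(root) for check in CHECKS.values()) / len(CHECKS)

# "path/to/artifact" is a placeholder for a downloaded artifact directory.
print(f"reproducibility score: {artifact_score('path/to/artifact'):.2f}")
```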