
Software Engineering Researchers Publish Hundreds of GitHub Repositories, but Many Struggle with Maintenance and Community Engagement


Core Concepts
Software engineering researchers frequently publish their research artifacts, including tools and datasets, on GitHub, but many struggle to maintain these repositories and engage with the broader developer community.
Abstract
The study examines how software engineering researchers use GitHub, analyzing over 10,000 publications from top venues and 3,449 associated GitHub links. Key findings include:
Only about one-third of authors choose to release artifacts specifically to GitHub and reference it in their paper, suggesting a deliberate choice to cite it as a contribution.
Authors use GitHub repositories for a mix of purposes, including releasing general-purpose tools, replication packages, and datasets.
GitHub has been a common subject in SE research repositories since the mid-2010s, largely tracking its popularity among software developers.
Virtually all publication-tied repositories are still available, but the first handful of repository disappearances are a harbinger of the consequences of relying exclusively on a non-archival platform.
GitHub adoption has been fairly universal across publication venues and has largely increased as a share of papers in recent years.
Popularity varies tremendously among repositories, reminiscent of "rich-get-richer" effects on other social platforms. Surprisingly often, popularity is related to publication venue and repository purpose.
Author responsiveness to issues is generally low and slow: less than half of issues receive a response from the owner, and those responses often take more than three weeks.
While causality is impossible to establish, statistics from recent years support a correlation between GitHub inclusion and citation count, which is especially strong when the repository is also popular. Popular repositories have the potential to greatly facilitate (and even inspire) subsequent research, which feeds back into the paper's impact.
The findings suggest a need to rethink the research incentives and reward structure around research products that require sustained contributions on platforms like GitHub.
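The responsiveness finding above rests on two quantities: the share of issues that receive any reply from the repository owner, and the delay until the first such reply. The following is a minimal sketch of how these could be computed, not the study's actual method. The record layout (`created_at`, `comments`, `author`), the function names, and the sample data are all hypothetical and do not mirror the GitHub API schema; the sketch also assumes that issues opened by the owner have already been filtered out.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


def first_owner_response(issue: dict, owner: str) -> Optional[timedelta]:
    """Delay until the owner's first comment, or None if the owner never replied."""
    opened = datetime.fromisoformat(issue["created_at"])
    replies = [
        datetime.fromisoformat(comment["created_at"])
        for comment in issue["comments"]
        if comment["author"] == owner
    ]
    return min(replies) - opened if replies else None


def responsiveness(issues: list, owner: str) -> tuple:
    """Return (share of issues with an owner reply, median delay in days among those)."""
    delays = [
        delay
        for delay in (first_owner_response(issue, owner) for issue in issues)
        if delay is not None
    ]
    rate = len(delays) / len(issues) if issues else 0.0
    median_days = median(d.total_seconds() / 86400 for d in delays) if delays else None
    return rate, median_days


# Hypothetical sample data: four issues on a repository owned by "alice".
issues = [
    {"created_at": "2023-01-01T00:00:00",
     "comments": [{"author": "alice", "created_at": "2023-01-22T00:00:00"}]},
    {"created_at": "2023-02-01T00:00:00",
     "comments": [{"author": "bob", "created_at": "2023-02-02T00:00:00"}]},
    {"created_at": "2023-03-01T00:00:00", "comments": []},
    {"created_at": "2023-04-01T00:00:00",
     "comments": [{"author": "bob", "created_at": "2023-04-02T00:00:00"},
                  {"author": "alice", "created_at": "2023-05-01T00:00:00"}]},
]

rate, median_days = responsiveness(issues, "alice")
print(f"owner response rate: {rate:.0%}, median delay: {median_days} days")
```

In this made-up sample the owner answers two of four issues, after 21 and 30 days. Reporting the median rather than the mean is a design choice made here so that a few very late replies do not dominate the figure.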
Stats
"We find a wide distribution in popularity and impact, some strongly correlated with publication venue."
"These were often heavily informed by the authors' investment in terms of timely responsiveness and upkeep, which was often remarkably subpar by GitHub's standards, if not absent altogether."
"Popular repositories often go hand-in-hand with well-cited papers and achieve broad impact."
Quotes
"Virtually all of these repositories are still present: only 4 out of the 309 links we collected were unavailable."
"GitHub has become a common subject in SE research repositories starting in the mid 2010s, largely tracking its popularity among software developers."
"Popularity varies tremendously among repositories, reminiscent of 'rich-get-richer' effects on other social platforms."
"Author responsiveness to issues is generally low and slow: less than half of issues receive a response from the owner, and those often take more than three weeks."
"While cause and causality are impossible to establish, statistics from recent years support a correlation between GitHub inclusion and citation count, which is especially strong when the repository is also popular."

Deeper Inquiries

Incentivizing Maintenance of GitHub Repositories

To encourage more sustained maintenance of, and engagement with, publication-associated GitHub repositories, the research community could implement several incentives and policies:
Recognition and Reward System: Establishing a formal recognition system that acknowledges researchers who actively maintain and engage with their repositories. This recognition could take the form of awards, mentions in academic publications, or inclusion in tenure and promotion evaluations.
Funding Opportunities: Providing funding opportunities specifically for maintaining and updating research repositories. Research grants could include a component dedicated to repository upkeep, ensuring that researchers have the resources to sustain their projects.
Collaborative Platforms: Creating collaborative platforms where researchers can work together on maintaining repositories. This could involve setting up forums, workshops, or hackathons focused on repository maintenance.
Training and Support: Offering training sessions and resources on best practices for maintaining GitHub repositories. This could include workshops on documentation, issue management, and responding to community feedback.
Community Engagement: Encouraging community engagement with repositories by promoting them in academic circles, sharing success stories of impactful repositories, and fostering a culture of collaboration and feedback.

Evolution of Research Evaluation Process

To better account for the value and impact of research artifacts beyond just the associated publication, the research evaluation process may need to evolve in the following ways:
Artifact Evaluation Tracks: Introducing artifact evaluation tracks in conferences and journals where researchers can submit their repositories for review. This would ensure that the quality and impact of research artifacts are assessed alongside the paper.
Citation Metrics for Repositories: Developing citation metrics specifically for GitHub repositories to measure their impact and influence in the research community. These could be integrated into existing citation indices to provide a comprehensive view of a researcher's contributions.
Peer Review of Repositories: Incorporating peer review of repositories into the publication process, where experts in the field evaluate the quality, usability, and significance of the research artifacts. This would add credibility to the repositories and encourage researchers to maintain them.
Altmetrics for Repositories: Utilizing altmetrics to track online attention to and engagement with research repositories. This could include mentions on social media, downloads, and interactions on GitHub to capture the broader impact of the artifacts.
Long-Term Impact Assessment: Implementing mechanisms to track the long-term impact of research artifacts, including how they are reused, extended, and cited over time. This would provide a more comprehensive understanding of the value of repositories beyond immediate citations.

Lessons from Successful Open-Source Projects

The software engineering research community can learn valuable lessons from successful open-source projects on GitHub to foster vibrant communities around research artifacts:
Transparency and Collaboration: Emphasize transparency in the development process, encourage collaboration among researchers, and welcome contributions from the community to enhance the quality and impact of research artifacts.
Documentation and Accessibility: Prioritize thorough documentation of research repositories, making them accessible and user-friendly for a wide audience. Clear instructions, examples, and explanations can attract more users and contributors.
Community Engagement: Actively engage with the community through discussions, issue tracking, and feedback mechanisms. Respond promptly to queries, address concerns, and foster a supportive and interactive environment around the research artifacts.
Continuous Improvement: Strive for continuous improvement and evolution of research artifacts based on community feedback and emerging trends. Regular updates, bug fixes, and feature enhancements can keep the repositories relevant and valuable.
Promotion and Recognition: Promote the research artifacts through various channels, highlight their impact and significance, and recognize the contributions of researchers who maintain and enhance the repositories. This can attract more users and contributors, leading to a vibrant and sustainable research community.