Core Concepts
Deep learning frameworks follow distinct long-term trends in code clone evolution, and these trends differ in cloned code size, bug-proneness, and community involvement. Short-term code cloning practices within a release also shape the long-term clone trends. Cross-framework code clones reveal functional and architectural adaptations across deep learning frameworks.
Abstract
The study investigates the evolution and impact of code clones in nine popular deep learning (DL) frameworks: TensorFlow, Paddle, PyTorch, Aesara, Ray, MXNet, Keras, JAX, and BentoML. The key findings are:
Long-term Trends:
Four distinct long-term code clone trends are identified: "Serpentine", "Rise and Fall", "Decreasing", and "Stable".
The "Decreasing" and "Rise and Fall" trends exhibit a reduction in cloned code size over time, attributed to code refactoring, use of third-party libraries, and removal of clones due to feature elimination.
The "Serpentine" trend is more susceptible to bugs than the other trends. Across all trends, over 50% of bug-fixing commits occur in "thick" clones.
Bug-fixing is a persistent activity throughout the framework lifespans, but is more prevalent in the "Serpentine" trend.
Within-release Patterns:
Three within-release code cloning patterns are observed: "Ascending", "Descending", and "Steady", which impact the long-term clone trends.
The "Ascending" pattern is associated with decreased committer involvement, suggesting fewer committers may lead to increased cloned code size.
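The three within-release patterns can be illustrated with a minimal sketch. The summary does not state how the study assigns a pattern, so the rule below (comparing the first and last cloned-code-size measurements in a release against a relative tolerance) is an assumption, and the function name `within_release_pattern` and the `tolerance` value are hypothetical.

```python
def within_release_pattern(clone_sizes, tolerance=0.05):
    """Label cloned-code sizes measured over one release.

    ASSUMPTION: the labeling rule and the 5% tolerance are illustrative
    only; they are not taken from the study.
    """
    if len(clone_sizes) < 2:
        raise ValueError("need at least two measurements")

    first, last = clone_sizes[0], clone_sizes[-1]
    if first == 0:
        # Avoid division by zero: any growth from zero counts as a rise.
        change = float("inf") if last > 0 else 0.0
    else:
        change = (last - first) / first

    if change > tolerance:
        return "Ascending"
    if change < -tolerance:
        return "Descending"
    return "Steady"


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    print(within_release_pattern([100, 120, 150]))
    print(within_release_pattern([150, 120, 100]))
    print(within_release_pattern([100, 101, 102]))
```

A rule that looks only at the endpoints is deliberately simple; a fitted slope over all measurements would be less sensitive to a single noisy commit.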
Cross-framework Clones:
Cross-framework file-level code clones exist, falling into two categories: functional and architectural adaptation clones.
Cross-framework clones gradually disappear over time due to functionality evolution, code divergence, function deprecation, and framework restructuring.
The findings provide insights to enhance the efficiency and maintainability of DL frameworks, foster collaborative efforts, and mitigate the risks associated with code clones.
Stats
The median cloned code size in BentoML decreased from 982 to 174 lines between the first and last releases.
Between the first and last releases, clone coverage decreased from 7% to 5% in Aesara, from 9% to 4% in Keras, and from 26% to 2% in PyTorch.
Over 50% of bug-fixing commits occur in "thick" clones across all long-term code clone trends.
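The size and coverage figures above can be put in context with a short sketch. It assumes the common definition of clone coverage as the percentage of code lines that belong to a clone; the summary does not spell out the study's exact definition, and both helper names are hypothetical.

```python
def clone_coverage(cloned_lines, total_lines):
    """Percentage of code lines that belong to a clone.

    ASSUMPTION: this is the usual definition of clone coverage; the
    study's exact formula is not given in this summary.
    """
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    if not 0 <= cloned_lines <= total_lines:
        raise ValueError("cloned_lines must lie between 0 and total_lines")
    return 100.0 * cloned_lines / total_lines


def relative_change(first, last):
    """Relative change from first to last, in percent."""
    if first == 0:
        raise ValueError("first must be non-zero")
    return 100.0 * (last - first) / first


if __name__ == "__main__":
    # Made-up line counts, for illustration only.
    print(clone_coverage(50, 1000))
    # BentoML median cloned code size, first vs. last release (from the Stats).
    print(round(relative_change(982, 174), 1))
```

Applied to the reported BentoML medians, the relative change is a drop of roughly 82%. The PyTorch coverage change from 26% to 2% is a drop of 24 percentage points.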
Quotes
"The decline in cloned code size can be attributed to code refactoring, third-party library reuse, and code clone removal associated with feature elimination."
"The 'Serpentine' trend is more susceptible to bugs, with over 50% of the releases having more than 50% bug-fixing commits in clones."
"Bug-fixing is a persistent activity consistently occurring throughout the lifespan of frameworks, among all the code cloning trends."