Core Concept
A novel transformer-based system that combines retrieval and classification models to achieve a balanced trade-off between time efficiency and accuracy in detecting duplicate bug reports.
Summary
The paper proposes a transformer-based framework for the task of Duplicate Bug Report Detection (DBRD) that integrates retrieval and classification models to strike a balance between efficiency and accuracy.
Key highlights:
- Introduces a cluster-based dataset partition mechanism to address the issue of data leakage in previous studies.
- Conducts a comprehensive comparison of transformer-based models (Sentence-BERT, BERT, ALBERT, RoBERTa) against baseline retrieval (GloVe, FastText) and classification (Bi-LSTM, DC-CNN) models.
- The transformer-based retrieval model (Sentence-BERT) outperforms the baseline retrieval models, while the transformer-based classification models (RoBERTa, BERT, ALBERT) outperform the baseline classification models.
- The proposed hybrid system leverages the strengths of both stages: it matches the classification model's accuracy while being significantly faster, and is only slightly slower than the pure retrieval model.
- Evaluates the system's performance in two real-world scenarios (One vs All and All vs All) and demonstrates its ability to balance the trade-off between speed and accuracy.
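The retrieve-then-classify design described in the highlights above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `embed` stands in for a Sentence-BERT encoder and `classify_pair` for a fine-tuned transformer pair classifier (both are assumptions), but the two-stage shape — cheap retrieval narrowing the repository to k candidates, expensive classification run only on those — is the system the paper describes.

```python
from collections import Counter
import math

def embed(text):
    # Stand-in for a Sentence-BERT encoder: a bag-of-words vector (assumption).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(query, corpus, k):
    # Stage 1: fast retrieval narrows the candidate set to k reports.
    q = embed(query)
    ranked = sorted(corpus, key=lambda r: cosine(q, embed(r)), reverse=True)
    return ranked[:k]

def classify_pair(query, candidate):
    # Stand-in for a fine-tuned pair classifier (e.g. RoBERTa): here a
    # simple Jaccard-overlap heuristic, used only to show the pipeline shape.
    q, c = set(query.lower().split()), set(candidate.lower().split())
    return len(q & c) / len(q | c) > 0.5

def detect_duplicates(query, corpus, k=100):
    # Stage 2: run the expensive classifier only on the k retrieved candidates.
    return [c for c in retrieve_top_k(query, corpus, k) if classify_pair(query, c)]
```

The key design point is that the classifier's cost no longer scales with the full repository size, only with k.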
Statistics
The average recall@100 of the Sentence-BERT retrieval model is 96.98%, outperforming GloVe (86.08%) and FastText (83.04%).
The average F1 score of the RoBERTa classification model is 86.66%, outperforming Bi-LSTM (48.24%) and DC-CNN (76.60%).
In the One vs All scenario, the proposed hybrid system achieves a recall of 92% and precision of 85% when k=100, compared to the classification model's recall of 92% and precision of 75%.
In the All vs All scenario, the proposed hybrid system takes 1.2 seconds per bug on average, compared to 60 seconds for the classification model.
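The metrics quoted above follow the standard definitions: recall@k measures how many true duplicates appear among the top-k retrieved reports, and F1 is the harmonic mean of precision and recall. A straightforward sketch of these formulas (not code from the paper):

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    # Fraction of the true duplicates found among the top-k retrieved reports.
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids) if relevant_ids else 0.0

def f1_score(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```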
Quotes
"Our system makes a trade-off by sacrificing some running time in order to maintain robust performance in terms of recall, precision, and accuracy."
"Selecting the appropriate value for k requires careful consideration. While a smaller k value may improve the time efficiency, it may also lead to a degradation in model performance. On the other hand, choosing a larger k value may result in increased time consumption."
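The trade-off in the second quote can be made concrete with a simple cost model (an assumption for illustration, not the paper's analysis): hybrid latency is one retrieval pass plus k classifier calls, while exhaustive classification pays one call per report in the repository.

```python
def hybrid_latency(retrieval_s, pair_s, k):
    # Assumed cost model: one retrieval pass over the repository,
    # then k invocations of the pair classifier.
    return retrieval_s + pair_s * k

def exhaustive_latency(pair_s, n):
    # Classifying the query against every one of the n stored reports.
    return pair_s * n
```

Under this model, increasing k walks the system from retrieval-level speed toward exhaustive-classification cost, which is exactly the tension the quote warns about: k must be large enough that the true duplicate survives retrieval, but small enough to keep the classification stage cheap.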