Core Concepts
Neural network models can effectively predict the quality of questions on StackOverflow, outperforming traditional text mining classification models.
Abstract
The paper evaluates the use of neural network models to predict the quality of questions on the StackOverflow question-answering platform. The authors used a dataset of 60,000 StackOverflow questions classified into high-quality, low-quality edited, and low-quality closed categories.
Key highlights:
The authors performed text preprocessing, including removing stop words and converting the questions into a bag-of-words representation (a preprocessing sketch follows this list).
They developed two sequential neural network models, one with three dense layers and one with two dense layers, and compared their performance to baseline models such as Naive Bayes, Support Vector Machine, and Decision Tree (a model-comparison sketch also follows this list).
The neural network models achieved an accuracy of around 80%, outperforming the baseline models.
The authors found that the number of layers in the neural network model can significantly impact its performance, with the two-layer model slightly outperforming the three-layer model.
The results demonstrate the effectiveness of deep learning models in text classification tasks compared to traditional text mining approaches.
The authors also discuss the limitations of their study, including the potential for overfitting in the neural network models, and suggest future work to address this issue.
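A minimal sketch of the preprocessing step described in the highlights, assuming scikit-learn. The file name and the column names ("Title", "Body", "Y") follow the public Stack Overflow question-quality dataset and are assumptions, not details confirmed by the paper.

```python
# Sketch of the preprocessing described above: stop-word removal and a
# bag-of-words representation. File and column names are assumptions based
# on the public Stack Overflow question-quality dataset.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

df = pd.read_csv("questions.csv")                      # hypothetical file name
texts = (df["Title"] + " " + df["Body"]).astype(str)   # combine title and body text
labels = df["Y"].map({"HQ": 0, "LQ_EDIT": 1, "LQ_CLOSE": 2}).to_numpy()

# Bag-of-words with English stop words removed; the vocabulary is capped so
# the dense input layer of the neural networks stays a manageable size.
vectorizer = CountVectorizer(stop_words="english", max_features=5_000)
X = vectorizer.fit_transform(texts)

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42, stratify=labels
)
```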
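And a minimal sketch of the model comparison, continuing from the preprocessing sketch above (it reuses X_train, X_test, y_train, y_test). It assumes TensorFlow/Keras and scikit-learn; the layer widths, optimizer, training settings, and the reading of "two/three dense layers" as hidden layers before a softmax output are all assumptions, since the summary only gives the layer counts.

```python
# Sketch of the compared models: two sequential dense networks plus the
# traditional text-mining baselines. Only the layer counts and the choice of
# baselines come from the summary; everything else is an assumption.
from tensorflow import keras
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score


def build_dense_model(n_features, hidden_units):
    """Sequential model with the given hidden dense layers and a 3-way
    softmax output for HQ / LQ Edit / LQ Close."""
    model = keras.Sequential([keras.Input(shape=(n_features,))])
    for units in hidden_units:
        model.add(keras.layers.Dense(units, activation="relu"))
    model.add(keras.layers.Dense(3, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


n_features = X_train.shape[1]
models = {
    "2-layer NN": build_dense_model(n_features, hidden_units=[128, 64]),
    "3-layer NN": build_dense_model(n_features, hidden_units=[256, 128, 64]),
}

# Keras dense layers need dense float input, so the sparse bag-of-words
# matrices are converted here.
X_train_dense = X_train.astype("float32").toarray()
X_test_dense = X_test.astype("float32").toarray()

for name, model in models.items():
    model.fit(X_train_dense, y_train, epochs=10, batch_size=128,
              validation_split=0.1, verbose=0)
    _, acc = model.evaluate(X_test_dense, y_test, verbose=0)
    print(f"{name}: test accuracy = {acc:.4f}")

# Traditional baselines trained on the same (sparse) bag-of-words features.
baselines = {
    "Naive Bayes": MultinomialNB(),
    "SVM": LinearSVC(),
    "Decision Tree": DecisionTreeClassifier(),
}
for name, clf in baselines.items():
    clf.fit(X_train, y_train)
    print(f"{name}: test accuracy = {accuracy_score(y_test, clf.predict(X_test)):.4f}")
```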
Stats
The dataset contains 60,000 StackOverflow questions from 2016-2020 classified into three categories: high quality (HQ), low quality edited (LQ Edit), and low quality closed (LQ Close).
Quotes
"Results showed that the best model achieved an accuracy of 81.22% which outperformed exiting results according to the literature."
"Results showed that increasing the pre-training of bidirectional encoder representations from transformers model as well as finetuning the question and answers can help improve the performance quality prediction and achieve a prediction accuracy higher than 80%."
"Results from the experiments showed that the proposed multilayer convolutional neural network achieved an F1-score of 98% for the best case."