Essential concepts
Crowdsourcing platforms require systematic methods to evaluate data quality and detect spamming behavior, both to improve analysis performance and to reduce bias in downstream machine learning tasks.
Summary
The paper introduces a framework to assess the consistency and credibility of crowdsourced data. It proposes the following key elements:
- Consistency Metric:
  - Spammer Index: A variance-ratio metric that uses generalized linear random effects models to capture the variance components contributed by workers, tasks, and their interactions. A higher Spammer Index indicates lower consistency and potential data contamination.
- Credibility Metrics:
  - Spamming Behavior Classification: Identifies three typical spamming behaviors: Primary Choice, Repeated Pattern, and Random Guessing.
  - Markov Chains and KL Divergence: Models each worker's response sequence as a Markov chain and computes the average Kullback-Leibler divergence (aKLD) between the observed transition pattern and target spamming-behavior patterns to flag potential spammers.
  - Deviance Distance: Applies deletion analysis to a generalized linear random effects model to identify workers whose responses significantly affect the overall model fit.
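The Markov chain plus aKLD idea can be illustrated with a minimal sketch. This is not the paper's exact estimator: the target pattern, the smoothing constant, and the toy response sequences below are all illustrative assumptions.

```python
import numpy as np

def transition_matrix(responses, n_states, eps=1e-6):
    """Estimate a first-order Markov transition matrix from a label sequence,
    with additive smoothing so every row is a valid distribution."""
    counts = np.full((n_states, n_states), eps)
    for a, b in zip(responses[:-1], responses[1:]):
        counts[a, b] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

def akld(p, q):
    """Average row-wise Kullback-Leibler divergence between two transition matrices."""
    return float(np.mean(np.sum(p * np.log(p / q), axis=1)))

n_states = 3
# Hypothetical target pattern for "Primary Choice" spamming: always answer option 0.
target = transition_matrix([0] * 50, n_states)

spammy = [0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0]      # mostly one answer
random_ish = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]  # cycles through the options

d_spam = akld(transition_matrix(spammy, n_states), target)
d_rand = akld(transition_matrix(random_ish, n_states), target)
# A smaller aKLD means the observed behavior is closer to the spamming pattern,
# so here d_spam < d_rand and the first worker looks more like a Primary Choice spammer.
```

In practice one would compare each worker's aKLD against each target spamming pattern and flag workers whose divergence falls below a chosen threshold.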
The proposed methods are validated through simulation studies and applied to real-world crowdsourcing data collected from Amazon Mechanical Turk, Prolific, and in-person experiments with transportation security officers. The results demonstrate the effectiveness of the framework in assessing data quality and detecting spamming behaviors.
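As a concrete illustration of the variance-ratio idea behind the Spammer Index, the sketch below decomposes a complete workers × tasks rating matrix with a two-way ANOVA and reports the share of variance not explained by tasks. This is one plausible reading, not the paper's estimator: the paper fits a generalized linear random effects model, and the index definition here is an assumption for illustration.

```python
import numpy as np

def spammer_index(y):
    """Method-of-moments variance components for a workers x tasks matrix.

    Returns the share of total variance attributed to workers and to the
    worker-task interaction/residual; a higher value means workers agree
    less on the tasks, i.e. lower consistency.
    """
    n_w, n_t = y.shape
    grand = y.mean()
    worker_means = y.mean(axis=1)
    task_means = y.mean(axis=0)
    ms_worker = n_t * np.sum((worker_means - grand) ** 2) / (n_w - 1)
    ms_task = n_w * np.sum((task_means - grand) ** 2) / (n_t - 1)
    resid = y - worker_means[:, None] - task_means[None, :] + grand
    ms_resid = np.sum(resid ** 2) / ((n_w - 1) * (n_t - 1))
    var_worker = max((ms_worker - ms_resid) / n_t, 0.0)
    var_task = max((ms_task - ms_resid) / n_w, 0.0)
    total = var_worker + var_task + ms_resid
    return (var_worker + ms_resid) / total

rng = np.random.default_rng(0)
task_effect = np.linspace(-1.0, 1.0, 6)
consistent = task_effect + rng.normal(0.0, 0.05, size=(8, 6))  # workers track the tasks
noise_only = rng.normal(0.0, 1.0, size=(8, 6))                 # responses ignore the tasks

low = spammer_index(consistent)   # near 0: task signal dominates
high = spammer_index(noise_only)  # near 1: little task-driven agreement
```

The qualitative behavior matches the reported interpretation: a higher index flags lower consistency across workers.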
Statistics
The Spammer Index for the MTurk dataset is 0.166, for the Prolific dataset is 0.065, and for the airport dataset is 0.079.
In the MTurk dataset, 22 out of 29 detected spammers (76%) had accuracies lower than the mean, and 11 out of 29 (38%) had accuracies lower than 1 standard deviation below the mean.
In the Prolific dataset, 5 out of 10 detected spammers (50%) had accuracies lower than the mean, and 3 out of 10 (30%) had accuracies lower than 1 standard deviation below the mean.
In the airport dataset, 4 out of 13 detected spammers (31%) had accuracies lower than the mean, and 4 out of 13 (31%) had accuracies lower than 1 standard deviation below the mean.
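The proportions above follow directly from the stated counts; a quick arithmetic check:

```python
# Recompute the reported proportions from the counts given in the text:
# (detected spammers, accuracies below mean, accuracies below mean - 1 SD).
datasets = {
    "MTurk": (29, 22, 11),
    "Prolific": (10, 5, 3),
    "Airport": (13, 4, 4),
}
for name, (n, below_mean, below_1sd) in datasets.items():
    print(f"{name}: {below_mean / n:.0%} below mean, {below_1sd / n:.0%} below mean - 1 SD")
# MTurk: 76% below mean, 38% below mean - 1 SD
# Prolific: 50% below mean, 30% below mean - 1 SD
# Airport: 31% below mean, 31% below mean - 1 SD
```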
Quotes
"Crowdsourcing involves engaging web-based (or crowdsourcing) workers to voluntarily undertake a range of tasks, from simple surveys to complex digital experiments, leveraging collective human intelligence to test research hypotheses or to perform manual labeling."
"Data variability can lead to a decline in the performance of ML models trained on such data."
"Unlike the simple scenarios where Kappa coefficient and intraclass correlation coefficient usually can apply, online crowdsourcing requires dealing with more complex situations."