Core Concept
A crowdsourced dataset of Quranic recitations from non-Arabic speakers was created and annotated to support the development of AI models for learning Quranic recitation.
Abstract
The researchers explored the feasibility of crowdsourcing a carefully annotated Quranic audio dataset that can be used to build AI models to simplify the learning process for non-Arabic speakers. They used a volunteer-based crowdsourcing approach to collect audio recitations from 1287 participants across 11 non-Arabic countries, resulting in around 7000 Quranic recitations.
To annotate the collected audio data, the researchers developed a crowdsourcing platform called "Quran Voice". Proficient reciters were asked to listen to the audio recordings and provide labels on the correctness of the recitation, including categories like "Correct", "Has Mistakes", "Incomplete Verse", "Different Verse", "Multiple Verses", and "Empty/Not Related".
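As a rough illustration of how the six annotation categories above might be represented and combined, here is a minimal Python sketch. The `Label` enum and the `majority_label` helper are hypothetical names, and majority voting is an assumption: the summary does not say how Quran Voice aggregates annotator votes.

```python
from collections import Counter
from enum import Enum


class Label(Enum):
    """The six annotation categories named in the summary."""
    CORRECT = "Correct"
    HAS_MISTAKES = "Has Mistakes"
    INCOMPLETE_VERSE = "Incomplete Verse"
    DIFFERENT_VERSE = "Different Verse"
    MULTIPLE_VERSES = "Multiple Verses"
    EMPTY_OR_NOT_RELATED = "Empty/Not Related"


def majority_label(votes):
    """Return the most frequent label, or None if there are no votes or a tie.

    Majority voting is an assumed aggregation rule, not the platform's
    documented behavior.
    """
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: such items could be routed to an expert
    return ranked[0][0]


# Hypothetical votes from three annotators for one recording.
votes = [Label.CORRECT, Label.HAS_MISTAKES, Label.CORRECT]
print(majority_label(votes).value)  # Correct
```

Returning `None` on a tie keeps ambiguous recordings out of the labeled set instead of picking a category arbitrarily.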
The annotation process achieved a crowd accuracy of 0.77, an inter-rater agreement of 0.63 among the annotators, and an agreement of 0.89 between the algorithm-assigned labels and expert judgments. The researchers discuss the challenges of recruiting proficient reciters for the annotation task and propose solutions to improve the process in the future.
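To make the reported figures more concrete, the sketch below shows one way such metrics could be computed. It is illustrative only: the summary does not state which agreement statistic produced the 0.63 value, so Fleiss' kappa is used here as an assumed, commonly used choice for multiple annotators. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def crowd_accuracy(crowd_labels, expert_labels):
    """Fraction of items where the crowd label matches the expert label."""
    if len(crowd_labels) != len(expert_labels) or not expert_labels:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(c == e for c, e in zip(crowd_labels, expert_labels))
    return matches / len(expert_labels)


def fleiss_kappa(ratings):
    """Fleiss' kappa for items that each received the same number of ratings.

    `ratings` is a list of items; each item is a list of category labels,
    one per annotator. This is an assumed statistic, not necessarily the
    one used in the study.
    """
    if not ratings:
        raise ValueError("need at least one rated item")
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    category_totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Per-item agreement: share of annotator pairs that agree.
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )

    p_observed = agreement_sum / n_items
    p_expected = sum(
        (total / (n_items * n_raters)) ** 2 for total in category_totals.values()
    )
    if p_expected == 1.0:
        return 1.0  # every rating fell into a single category
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data, not taken from the study.
crowd = ["Correct", "Has Mistakes", "Correct", "Correct"]
expert = ["Correct", "Correct", "Correct", "Correct"]
print(crowd_accuracy(crowd, expert))  # 0.75
```

Accuracy compares one aggregated crowd label per recording against an expert label, while kappa measures how consistently the annotators agree with one another after correcting for chance.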
Statistics
Around 7000 Quranic recitations were collected from 1287 participants across 11 non-Arabic countries.
1166 recitations were annotated with 6 categories by 71 participants.
The crowd accuracy was 0.77, the inter-rater agreement was 0.63, and the agreement between the algorithm and expert labels was 0.89.
Quotes
"A dataset of Quranic recitation audios can be crowdsourced from beginner learners via a recitation app."
"The collected dataset can be labeled through a dedicated crowdsourcing tool."