Core Concepts
DPCSpell is a novel detector-purificator-corrector framework, based on denoising transformers, that corrects spelling errors in Bangla and other resource-scarce Indic languages.
Abstract
The paper proposes DPCSpell, a transformer-based framework for spelling error correction in Bangla and other resource-scarce Indic languages such as Hindi and Telugu.
The key highlights are:
DPCSpell consists of three modules: a detector, a purificator, and a corrector. The detector identifies the positions of erroneous characters, the purificator refines those detections, and the corrector generates the final corrections.
Unlike previous methods that correct all characters in a word regardless of their correctness, DPCSpell selectively corrects only the erroneous portions, leading to improved performance.
The authors also introduce a method for creating a large-scale parallel corpus for Bangla spelling error correction, overcoming the resource scarcity issue for this language. This corpus is made publicly available.
Extensive experiments show that DPCSpell outperforms previous state-of-the-art methods for Bangla spelling error correction, achieving an exact match score of 94.78%.
The authors also provide a comprehensive comparison of rule-based, RNN-based, convolution-based, and transformer-based methods for the spelling error correction task.
Overall, the paper contributes both an effective transformer-based framework for spelling error correction in Bangla and other resource-scarce Indic languages and a method for building a large-scale corpus to address the data scarcity problem.
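The detector, purificator, and corrector data flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: in DPCSpell each stage is a transformer model, whereas here each stage is a hand-written toy function, and every name (run_pipeline, toy_detector, toy_purificator, toy_corrector) is hypothetical. The sketch also assumes substitution errors only, so the word length never changes. It shows only the idea that a refined mask decides which characters the corrector is allowed to rewrite.

```python
from typing import Callable, List

Mask = List[bool]  # True marks a character position flagged as erroneous


def run_pipeline(
    word: str,
    detector: Callable[[str], Mask],
    purificator: Callable[[str, Mask], Mask],
    corrector: Callable[[str, Mask], List[str]],
) -> str:
    """Chain the three stages; only flagged positions are rewritten."""
    mask = detector(word)              # stage 1: flag suspicious positions
    mask = purificator(word, mask)     # stage 2: refine the flags
    proposals = corrector(word, mask)  # stage 3: one proposed character per position
    # Unflagged characters pass through unchanged, whatever the corrector proposed.
    return "".join(
        proposals[i] if flagged else ch
        for i, (ch, flagged) in enumerate(zip(word, mask))
    )


# --- Toy stand-ins (assumptions for illustration, not learned models) ---

def toy_detector(word: str) -> Mask:
    # Deliberately over-eager: flags every digit and every 'x'.
    return [ch.isdigit() or ch == "x" for ch in word]


def toy_purificator(word: str, mask: Mask) -> Mask:
    # Clears the flag on 'x', treating it as a false alarm.
    return [flagged and ch != "x" for ch, flagged in zip(word, mask)]


def toy_corrector(word: str, mask: Mask) -> List[str]:
    # Proposes a replacement for every position, flagged or not.
    table = {"3": "e", "0": "o", "x": "s"}
    return [table.get(ch, ch) for ch in word]


def keep_mask(word: str, mask: Mask) -> Mask:
    # Identity purificator, used to show what happens without refinement.
    return mask


with_purificator = run_pipeline("t3xt", toy_detector, toy_purificator, toy_corrector)
without_purificator = run_pipeline("t3xt", toy_detector, keep_mask, toy_corrector)
```

With the toy purificator the result is "text": the digit is corrected and the legitimate "x" is left alone. With the identity purificator the over-eager flag on "x" survives and the word becomes "test", which is the kind of unnecessary rewrite that selective correction is meant to avoid.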
Stats
"Exact Match (EM) score of 94.78%"
"Precision score of 0.9487"
"Recall score of 0.9478"
"F1 score of 0.948"
"F0.5 score of 0.9483"
"Modified Accuracy (MA) score of 95.16%"
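For readers unfamiliar with these metrics, the sketch below shows the standard definitions of exact match and the F-beta score. It is an illustration under assumptions: the function names are hypothetical, the paper's exact averaging scheme is not stated in this summary, and Modified Accuracy is omitted because its definition is not given here.

```python
from typing import Sequence


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predicted words that equal their reference exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta: beta < 1 weights precision more, beta > 1 weights recall more."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


# Plugging in the precision and recall listed above.
f1 = f_beta(0.9487, 0.9478)             # about 0.9482
f05 = f_beta(0.9487, 0.9478, beta=0.5)  # about 0.9485
```

Applied to the listed precision and recall, the formula gives about 0.9482 for F1, consistent with the listed 0.948, and about 0.9485 for F0.5, slightly above the listed 0.9483. A gap of that size is expected if the reported scores are averaged per class or computed from unrounded values, which this summary does not specify.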
Quotes
"Unlike previous methods that correct all characters in a word regardless of their correctness, DPCSpell selectively corrects only the erroneous portions, leading to improved performance."
"The authors also introduce a method for creating a large-scale parallel corpus for Bangla spelling error correction, overcoming the resource scarcity issue for this language."