This study investigates the impact of integrating explicit n-gram language models with modern neural network architectures, namely PyLaia and DAN, for handwritten text recognition. The authors explore different strategies for incorporating n-gram models and identify the optimal language-modeling parameters: tokenization level, n-gram order, weight, and smoothing method.
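To make the parameters concrete, here is a minimal sketch of the general technique: a character-level n-gram model with additive smoothing whose log-probability is combined with a neural recognizer's score through a tunable weight (shallow fusion). The class and function names are hypothetical illustrations, not the paper's actual implementation, which relies on trained ATR models and standard LM toolkits.

```python
import math
from collections import Counter

class CharNgramLM:
    """Illustrative character n-gram LM with add-k smoothing (hypothetical, not the paper's code)."""

    def __init__(self, order=3, k=0.1):
        self.order, self.k = order, k          # n-gram order and smoothing constant
        self.ngrams = Counter()                # counts of (context + char)
        self.contexts = Counter()              # counts of context alone
        self.vocab = set()

    def train(self, texts):
        pad = "#" * (self.order - 1)           # pad so every char has a full context
        for text in texts:
            padded = pad + text
            for i in range(len(text)):
                ctx = padded[i : i + self.order - 1]
                ch = padded[i + self.order - 1]
                self.ngrams[ctx + ch] += 1
                self.contexts[ctx] += 1
                self.vocab.add(ch)

    def log_prob(self, context, char):
        # Add-k smoothing avoids zero probability for unseen n-grams.
        ctx = ("#" * (self.order - 1) + context)[-(self.order - 1):]
        num = self.ngrams[ctx + char] + self.k
        den = self.contexts[ctx] + self.k * len(self.vocab)
        return math.log(num / den)

def fused_score(neural_logp, lm_logp, lm_weight=0.5):
    # Shallow fusion: the LM weight controls how strongly the
    # explicit language model steers the neural model's hypothesis.
    return neural_logp + lm_weight * lm_logp
```

Tuning `order`, `k`, and `lm_weight` on a validation set mirrors the parameter search the study describes; switching training input from characters to subword units changes the tokenization level.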
The results show that incorporating character or subword n-gram models significantly improves the performance of automatic text recognition (ATR) models on three diverse datasets: IAM, RIMES, and NorHand v2. The combination of DAN with a character language model outperforms current benchmarks, confirming the value of hybrid approaches in modern document analysis systems.
The authors establish new state-of-the-art results on the NorHand v2 dataset and demonstrate that explicit language modeling can further enhance the performance of transformer-based models like DAN, which have shown impressive implicit language modeling capabilities. The study challenges the notion that deep learning models alone are sufficient for optimal handwritten text recognition performance and highlights the continued importance of explicit language modeling in this domain.