Simple architectural and training modifications, such as retaining the input image's aspect ratio, using max-pooling, and adding a CTC shortcut, can significantly improve the performance of basic convolutional-recurrent handwritten text recognition systems.
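As a concrete illustration of the first modification, the sketch below computes the target shape of a text-line image when its height is normalized but the aspect ratio is retained rather than stretching to a fixed width. The target height, width cap, and function name are illustrative assumptions, not taken from the source.

```python
def resize_keep_aspect(height: int, width: int,
                       target_height: int = 64,
                       max_width: int = 1024) -> tuple[int, int]:
    """Return the (height, width) a line image would be scaled to when
    the height is fixed and the aspect ratio is preserved.

    The width is capped at max_width (hypothetical batching constraint);
    in practice images would then be padded to a common width.
    """
    scale = target_height / height
    new_width = min(max_width, max(1, round(width * scale)))
    return target_height, new_width


# A 32x256 line scales to 64x512; an extremely wide line hits the cap.
print(resize_keep_aspect(32, 256))    # (64, 512)
print(resize_keep_aspect(64, 2000))   # (64, 1024)
```

Retaining the aspect ratio this way avoids the character distortion introduced by resizing every line to one fixed width.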
Incorporating explicit n-gram language models still yields significant gains on top of state-of-the-art deep learning architectures for handwritten text recognition, challenging the notion that deep learning models alone are sufficient for optimal performance.
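One common way such an n-gram model is combined with a recognizer is shallow fusion: each hypothesis is scored as the recognizer's log-probability plus a weighted language-model log-probability. The following self-contained sketch rescores an n-best list with a Laplace-smoothed character bigram model; the corpus, weight, and function names are illustrative assumptions, not the source's actual method.

```python
import math
from collections import defaultdict


def train_char_bigram(corpus, smoothing=1.0):
    """Laplace-smoothed character bigram LM; returns a log-prob function.
    '^' and '$' mark word start and end."""
    counts = defaultdict(lambda: defaultdict(float))
    vocab = set("^$")
    for word in corpus:
        seq = "^" + word + "$"
        vocab.update(seq)
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    vocab_size = len(vocab)

    def logprob(word):
        seq = "^" + word + "$"
        lp = 0.0
        for a, b in zip(seq, seq[1:]):
            total = sum(counts[a].values())
            lp += math.log((counts[a][b] + smoothing)
                           / (total + smoothing * vocab_size))
        return lp

    return logprob


def rescore(nbest, lm_logprob, alpha=0.5):
    """Shallow fusion: pick argmax of recognizer log-prob + alpha * LM log-prob.
    nbest is a list of (transcription, recognizer_logprob) pairs."""
    return max(nbest, key=lambda h: h[1] + alpha * lm_logprob(h[0]))


# Toy example: the recognizer slightly prefers the garbled "tne",
# but the LM pushes the fused score toward the valid word "the".
lm = train_char_bigram(["the", "then", "they"])
nbest = [("tne", -1.0), ("the", -1.2)]
print(rescore(nbest, lm)[0])  # "the"
```

Without the language-model term, the recognizer alone would output "tne"; the fused score corrects it, which is the effect the finding above describes.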