Pseudo-OCR, a novel open-vocabulary text recognition framework, addresses out-of-vocabulary (OOV) words by generating pseudo training data from real-world images. The approach includes a quality-aware margin loss to improve training on both real and pseudo data.
E2STR, a scene text recognition model trained with context-rich sequences, can rapidly adapt to diverse scenarios in a training-free manner by leveraging in-context prompts.
VL-Reader, a novel scene text recognition approach, leverages masked visual-linguistic reconstruction to learn strong cross-modal feature representation, enabling robust performance across a wide range of challenging scenarios.
This paper introduces Stratified Domain Adaptation (StrDA), a novel self-training approach that improves scene text recognition models by progressively adapting them from labeled synthetic data to unlabeled real-world data.
This paper introduces IPAD, a novel scene text recognition (STR) model that uses an iterative, parallel, and diffusion-based decoding approach to achieve state-of-the-art accuracy and efficiency by balancing the strengths of autoregressive and non-autoregressive methods.
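
To make the idea of iterative, parallel decoding in the last summary more concrete, here is a minimal sketch of mask-predict style refinement, a generic technique swapped in for illustration. It is not IPAD's actual decoder, and it omits the diffusion component entirely. The names `iterative_parallel_decode`, `toy_predict`, and `MASK` are hypothetical, and the toy predictor stands in for a real recognition model.

```python
from typing import Callable, List, Tuple

MASK = "_"  # placeholder for a position that is not yet committed

# Hypothetical predictor type: given the current partial sequence, return a
# (character, confidence) guess for every position in one parallel call.
Predictor = Callable[[List[str]], List[Tuple[str, float]]]


def iterative_parallel_decode(predict: Predictor, length: int, steps: int) -> str:
    """Mask-predict style decoding: commit the most confident positions each round."""
    seq = [MASK] * length
    for step in range(1, steps + 1):
        guesses = predict(seq)  # all positions are predicted at once
        # Number of positions that should be committed after this round.
        target = -(-length * step // steps)  # ceil(length * step / steps)
        masked = [i for i, ch in enumerate(seq) if ch == MASK]
        need = target - (length - len(masked))
        # Commit the highest-confidence masked positions; the rest stay masked
        # and are re-predicted in the next round with more context available.
        for i in sorted(masked, key=lambda i: guesses[i][1], reverse=True)[:need]:
            seq[i] = guesses[i][0]
    return "".join(seq)


def toy_predict(seq: List[str]) -> List[Tuple[str, float]]:
    """Toy stand-in for a model (assumption): it always guesses the right
    character and is more confident when neighbouring positions are committed."""
    word = "street"
    out = []
    for i, ch in enumerate(word):
        neighbours = [seq[j] for j in (i - 1, i + 1) if 0 <= j < len(word)]
        known = sum(1 for n in neighbours if n != MASK)
        out.append((ch, 0.5 + 0.2 * known - 0.01 * i))
    return out


result = iterative_parallel_decode(toy_predict, length=6, steps=3)
```

With `steps=1` the loop reduces to fully non-autoregressive decoding (every position committed in one pass), and with `steps=length` it commits one position per round, which is closer to autoregressive behaviour. The number of rounds is the knob that trades accuracy against speed in this kind of scheme.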