Bridging Text Spotting introduces a novel approach that resolves the error accumulation and suboptimal performance of two-step text spotting methods while retaining their modularity.
TextFormer, a novel query-based end-to-end text spotter, uses a multi-task model design and mixed-supervision training to achieve state-of-the-art performance on scene text detection and recognition tasks.
An ensemble learning framework that combines multiple state-of-the-art scene text detection and recognition models significantly improves the performance of Vietnamese scene text spotting in complex urban settings.