A Query-based End-to-End Text Spotter with Mixed Supervision for Improved Scene Text Detection and Recognition
TextFormer is a query-based end-to-end text spotter that combines a multi-task model design with mixed-supervision training to achieve state-of-the-art performance on scene text detection and recognition.