Core Concepts
DeepSolo++ introduces a simple DETR-like baseline for multilingual text spotting that achieves better training efficiency and outperforms previous methods.
Abstract
DeepSolo++ is a novel approach for multilingual text spotting. It addresses the limitations of existing Transformer-based methods by using a single decoder with explicit point queries to perform detection, recognition, and script identification simultaneously. The method offers superior extensibility, a simple structure and training pipeline, and efficient performance on various benchmarks. Extensive experiments demonstrate state-of-the-art performance compared with previous models.
Abstract:
End-to-end text spotting integrates detection and recognition efficiently.
Transformer-based methods face challenges in synergy between sub-tasks.
DeepSolo++ simplifies the pipeline with a single decoder for multilingual tasks.
Introduction:
Challenges in handling the relationship between detection and recognition.
Existing methods focus on specific languages without a unified model.
Methodology:
Proposal of Bezier center curve representation for scene text.
Explicit point queries used for encoding text semantics and locations.
Results:
Achieves better training efficiency compared to Transformer-based models.
Outperforms state-of-the-art on ICDAR 2019 ReCTS benchmark.
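The Bezier center curve representation in the Methodology above can be illustrated with a minimal sketch: a cubic Bezier curve is fitted along the center line of a text instance, and a fixed number of points are sampled on it to serve as explicit point queries. The function name, control-point layout, and number of sampled points here are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def sample_bezier_center_curve(control_points, num_points=25):
    """Sample points uniformly in t along a cubic Bezier center curve.

    control_points: four 2D control points (illustrative layout; one
    such curve would run along the center line of a text instance).
    Returns an (num_points, 2) array of sampled point locations.
    """
    p = np.asarray(control_points, dtype=float)      # shape (4, 2)
    t = np.linspace(0.0, 1.0, num_points)[:, None]   # shape (N, 1)
    # Cubic Bernstein basis, evaluated for all t at once via broadcasting.
    curve = ((1 - t) ** 3 * p[0]
             + 3 * (1 - t) ** 2 * t * p[1]
             + 3 * (1 - t) * t ** 2 * p[2]
             + t ** 3 * p[3])
    return curve

# Example: a gently arched center line sampled at 5 points.
pts = sample_bezier_center_curve([[0, 0], [1, 1], [2, 1], [3, 0]],
                                 num_points=5)
```

Each sampled point can then anchor a query that jointly encodes position and text semantics, which is what lets one decoder serve detection, recognition, and script identification.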
Stats
DeepSolo++ improves the 1-NED metric on ICDAR 2019 ReCTS to 78.3%.
On ICDAR 2019 MLT, DeepSolo++ achieves a 5.5% H-mean improvement in detection and a 2.7% H-mean gain in end-to-end spotting.
Quotes
"Extensive experiments demonstrate that our simple approach achieves better training efficiency compared with Transformer-based models."