Core Concept
This paper proposes Detect-Order-Construct, a tree-construction-based approach to hierarchical document structure analysis. The approach decomposes the task into three stages: detecting page objects and assigning each a logical role, predicting the reading order of the detected objects, and constructing the document's hierarchical structure, including the table of contents.
Abstract
The paper presents Detect-Order-Construct, a tree-construction-based approach to hierarchical document structure analysis. The approach consists of three stages:
Detect Stage:
- Identifies individual page objects within the document rendering and assigns a logical role to each detected page object.
- Employs a hybrid method that combines a top-down graphical page object detection model and a bottom-up text region detection model.
- The bottom-up text region detection model uses a multi-modal feature extraction and enhancement module, an intra-region reading order relation prediction head, and a logical role classification head.
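As a rough illustration of how the two bottom-up heads could be combined, the sketch below chains text lines into text regions by following intra-region successor links and then gives each region a logical role by majority vote. The `TextLine` fields, the successor-index encoding, and the majority-vote rule are assumptions made for this example; the summary does not specify how the model's outputs are decoded.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextLine:
    text: str
    role: str                 # hypothetical per-line logical role label
    successor: Optional[int]  # index of the next line in the same region, or None


def group_lines_into_regions(lines: List[TextLine]) -> List[List[int]]:
    """Chain line indices into regions by following predicted successor links."""
    targets = {ln.successor for ln in lines if ln.successor is not None}
    heads = [i for i in range(len(lines)) if i not in targets]
    visited = set()
    regions = []
    # Start from lines with no predecessor, then sweep up anything left over
    # (for example, lines caught in an accidental cycle of predictions).
    for start in heads + list(range(len(lines))):
        if start in visited:
            continue
        chain = []
        cur = start
        while cur is not None and cur not in visited:
            visited.add(cur)
            chain.append(cur)
            cur = lines[cur].successor
        regions.append(chain)
    return regions


def region_role(lines: List[TextLine], region: List[int]) -> str:
    """Assumed rule: a region takes the most common role among its lines."""
    return Counter(lines[i].role for i in region).most_common(1)[0][0]


lines = [
    TextLine("1 Introduction", "heading", None),
    TextLine("Documents are often", "paragraph", 2),
    TextLine("organized hierarchically.", "paragraph", None),
]
regions = group_lines_into_regions(lines)
roles = [region_role(lines, r) for r in regions]
```

Here the heading forms its own region and the two paragraph lines are merged into one. A real system would also carry bounding boxes and confidence scores, which are omitted to keep the sketch short.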
Order Stage:
- Determines the reading sequence of the detected page objects and text regions.
- Utilizes a multi-modal, transformer-based relation prediction model to predict the inter-region reading order relationships.
- Adds an inter-region reading order relation classification head that predicts the type of each relation.
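One simple way to turn predicted inter-region relations into a reading sequence is a topological sort over "comes before" edges. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The `Relation` tuple and its `kind` labels are hypothetical placeholders, and topological sorting is a substitute decoding step chosen for illustration, not necessarily the procedure the paper uses.

```python
from graphlib import TopologicalSorter
from typing import List, NamedTuple


class Relation(NamedTuple):
    src: int   # index of the object that is read first
    dst: int   # index of the object that is read next
    kind: str  # hypothetical relation-type label from the classification head


def reading_order(num_objects: int, relations: List[Relation]) -> List[int]:
    """Order object indices so that every src precedes its dst.

    Raises graphlib.CycleError if the predicted relations contain a cycle.
    """
    # Register every object up front so isolated objects still appear.
    sorter = TopologicalSorter({i: set() for i in range(num_objects)})
    for rel in relations:
        sorter.add(rel.dst, rel.src)  # rel.src must come before rel.dst
    return list(sorter.static_order())


relations = [
    Relation(0, 2, "placeholder_type_a"),
    Relation(2, 1, "placeholder_type_a"),
]
order = reading_order(3, relations)
```

With a single unbroken chain of relations, as in this example, the order is fully determined. When predictions leave several objects unordered relative to each other, a tie-breaking rule such as page position would be needed.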
Construct Stage:
- Extracts the table of contents within the document to summarize the overall hierarchical structure.
- Employs a transformer-based model to predict the hierarchical relationships between section headings.
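The tree construction step can be pictured as follows: once each section heading has a predicted parent, the table of contents is assembled by attaching headings to their parents in reading order. In this minimal sketch, the parent-index encoding (`-1` for a top-level heading) and the `TocNode` class are assumptions for illustration; the summary only states that hierarchical relationships between headings are predicted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TocNode:
    title: str
    children: List["TocNode"] = field(default_factory=list)


def build_toc(titles: List[str], parents: List[int]) -> TocNode:
    """Build a tree from headings in reading order and predicted parent indices."""
    root = TocNode("ROOT")
    nodes = [TocNode(t) for t in titles]
    for i, p in enumerate(parents):
        # Assumed constraint: a parent heading appears earlier in reading order.
        if p >= i:
            raise ValueError(f"heading {i} has an invalid parent index {p}")
        (root if p < 0 else nodes[p]).children.append(nodes[i])
    return root


def outline(node: TocNode, depth: int = 0) -> List[str]:
    """Render the tree as indented lines, skipping the synthetic root."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + child.title)
        lines.extend(outline(child, depth + 1))
    return lines


titles = ["1 Intro", "2 Method", "2.1 Detect", "2.2 Order", "3 Results"]
parents = [-1, -1, 1, 1, -1]
toc_lines = outline(build_toc(titles, parents))
```

Because children are appended in reading order, the rendered outline preserves the sequence produced by the Order stage.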
The proposed end-to-end system achieves state-of-the-art performance on several document layout analysis and hierarchical document structure reconstruction benchmarks.