Core Concept
This paper proposes Detect-Order-Construct, a comprehensive tree-construction-based approach to hierarchical document structure analysis. The approach decomposes the task into three stages: detecting page objects and assigning each a logical role, predicting the reading order of the detected objects, and constructing the intended hierarchical structure, including the table of contents.
Summary
The paper presents Detect-Order-Construct, a tree-construction-based approach to hierarchical document structure analysis. It consists of three main stages:
Detect Stage:
- Identifies the individual page objects in the rendered document and assigns a logical role to each detected object.
- Employs a hybrid method that combines a top-down graphical page object detection model with a bottom-up text region detection model.
- The bottom-up text region detection model uses a multi-modal feature extraction and enhancement module, an intra-region reading order relation prediction head, and a logical role classification head.
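To illustrate how intra-region reading order relations can turn text lines into text regions, here is a minimal post-processing sketch. It assumes a hypothetical score matrix `succ_scores`, where `succ_scores[i, j]` is the probability that line `j` directly follows line `i` inside the same region; the greedy linking, the `threshold` value, and the function name are illustrative assumptions, not the paper's exact procedure.

```python
from typing import List, Optional

import numpy as np


def group_lines_into_regions(
    succ_scores: np.ndarray, threshold: float = 0.5
) -> List[List[int]]:
    """Group text lines into regions by following predicted successor links.

    succ_scores is a hypothetical (n, n) matrix of "line j follows line i"
    probabilities. Each line keeps at most one successor and one predecessor,
    and cycles are rejected, so the accepted links form disjoint chains.
    """
    n = succ_scores.shape[0]
    scores = succ_scores.astype(float).copy()
    np.fill_diagonal(scores, -np.inf)  # a line cannot follow itself

    succ: List[Optional[int]] = [None] * n
    has_pred = [False] * n
    root = list(range(n))  # union-find forest used to detect cycles

    def find(x: int) -> int:
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    # Accept the strongest links first; stop once scores fall below threshold.
    for flat in np.argsort(scores, axis=None)[::-1]:
        i, j = divmod(int(flat), n)
        if scores[i, j] < threshold:
            break
        if succ[i] is None and not has_pred[j] and find(i) != find(j):
            succ[i] = j
            has_pred[j] = True
            root[find(i)] = find(j)

    # Every chain starts at a line with no predecessor; walk it to the end.
    regions = []
    for start in range(n):
        if has_pred[start]:
            continue
        chain = [start]
        while succ[chain[-1]] is not None:
            chain.append(succ[chain[-1]])
        regions.append(chain)
    return regions
```

For example, with strong links 0→1 and 2→3, a weak link 1→2, and a competing back-link 1→0, the sketch returns `[[0, 1], [2, 3]]`: the weak link falls below the threshold and the back-link is rejected because it would close a cycle.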
Order Stage:
- Determines the reading sequence of the detected page objects and text regions.
- Uses a multi-modal, transformer-based relation prediction model to predict inter-region reading order relationships.
- Incorporates an additional inter-region reading order relation classification head to predict the relation types.
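Pairwise reading order predictions still have to be decoded into a single sequence. The sketch below shows one simple way to do that, swapped in for illustration and not taken from the paper: a Borda-style count over a hypothetical matrix `precede`, where `precede[i, j]` is the probability that region `i` is read before region `j`. The function name and the symmetrisation step are assumptions.

```python
import numpy as np


def decode_reading_order(precede: np.ndarray) -> list[int]:
    """Turn pairwise "i is read before j" probabilities into one sequence.

    precede is a hypothetical (n, n) model output in [0, 1]. Each region is
    scored by how strongly it precedes all others, then sorted by that score.
    """
    p = precede.astype(float).copy()
    np.fill_diagonal(p, 0.0)
    # Evidence that i precedes j also comes from 1 - precede[j, i].
    p = 0.5 * (p + (1.0 - p.T))
    np.fill_diagonal(p, 0.0)
    score = p.sum(axis=1)
    # Stable sort so tied regions keep their detection order.
    return [int(i) for i in np.argsort(-score, kind="stable")]
```

A global score like this always yields a total order, even when the pairwise predictions are inconsistent; a successor-link decoder would instead preserve local "next region" decisions more faithfully.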
Construct Stage:
- Extracts the document's table of contents to summarize its overall hierarchical structure.
- Employs a transformer-based model to predict the hierarchical relationships between section headings.
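Once hierarchical relationships between section headings have been predicted, assembling the table of contents is a small tree-building step. The sketch below assumes a hypothetical score matrix `parent_scores` over headings listed in reading order: column 0 scores "attach to the document root" and column `j + 1` scores "heading `j` is the parent". Restricting each heading's parent to the root or an earlier heading is an assumption made here because it guarantees a valid tree; `TocNode`, `build_toc`, and `render` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List

import numpy as np


@dataclass
class TocNode:
    title: str
    children: List["TocNode"] = field(default_factory=list)


def build_toc(titles: List[str], parent_scores: np.ndarray) -> TocNode:
    """Attach each heading to its best-scoring earlier heading or the root.

    parent_scores has shape (n, n + 1): column 0 is the virtual root and
    column j + 1 is heading j. Only earlier headings may be parents.
    """
    root = TocNode("<root>")
    nodes = [TocNode(t) for t in titles]
    for i, node in enumerate(nodes):
        allowed = parent_scores[i, : i + 1]  # root + headings 0..i-1
        best = int(np.argmax(allowed))
        parent = root if best == 0 else nodes[best - 1]
        parent.children.append(node)  # children stay in reading order
    return root


def render(node: TocNode, depth: int = 0) -> List[str]:
    """Flatten the tree into indented lines, skipping the virtual root."""
    lines = [] if depth == 0 else ["  " * (depth - 1) + node.title]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines
```

With headings "1 Intro", "1.1 Motivation", "2 Method", "2.1 Detect" and scores favouring the obvious parents, `render(build_toc(...))` produces the four titles with the two subsections indented under their sections.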
The proposed end-to-end system achieves state-of-the-art performance on several document layout analysis and hierarchical document structure reconstruction benchmarks.