Hierarchical Document Structure Analysis: A Tree Construction Approach for Detecting, Ordering, and Reconstructing Document Layouts
This paper proposes Detect-Order-Construct, a comprehensive tree-construction-based approach to hierarchical document structure analysis. The approach decomposes the task into three stages: detecting page objects and assigning their logical roles, predicting the reading order of the detected objects, and constructing the intended hierarchical structure, including the table of contents.
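To make the three-stage decomposition concrete, the following is a minimal illustrative sketch, not the paper's models. It assumes the Detect stage has already produced objects with a role and a heading level, replaces the Order stage with a simple sort by page and vertical position, and builds the tree with a stack keyed on heading level. All names (PageObject, Node, order_objects, construct_tree, table_of_contents) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PageObject:
    """Output of a hypothetical Detect stage: one page object with its logical role."""
    text: str
    role: str        # assumed role labels: "heading" or "paragraph"
    page: int
    y: float         # top coordinate on the page (smaller means higher up)
    level: int = 0   # heading depth (1 = top level); unused for non-headings


@dataclass
class Node:
    text: str
    role: str
    children: list = field(default_factory=list)


def order_objects(objects):
    # Stand-in for the Order stage (assumption): sort by page, then by
    # vertical position. A learned reading-order predictor would go here.
    return sorted(objects, key=lambda o: (o.page, o.y))


def construct_tree(ordered):
    # Construct stage sketch: attach each object to the nearest open heading
    # with a smaller level, tracked on a stack of (level, node) pairs.
    root = Node("ROOT", "root")
    stack = [(0, root)]
    for obj in ordered:
        node = Node(obj.text, obj.role)
        if obj.role == "heading":
            # Close headings at the same or a deeper level; never pop the root.
            while len(stack) > 1 and stack[-1][0] >= obj.level:
                stack.pop()
            stack[-1][1].children.append(node)
            stack.append((obj.level, node))
        else:
            stack[-1][1].children.append(node)
    return root


def table_of_contents(node, depth=0):
    # Collect (depth, title) pairs for headings in document order.
    entries = []
    for child in node.children:
        if child.role == "heading":
            entries.append((depth + 1, child.text))
            entries.extend(table_of_contents(child, depth + 1))
    return entries
```

Given a list of detected objects in arbitrary order, calling table_of_contents(construct_tree(order_objects(objects))) yields the nested heading outline. The stack-based construction is one simple design choice for this sketch; it relies on heading levels being known up front, which the paper's approach does not necessarily assume.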