READOC: A Unified Benchmark for Realistic and Comprehensive Document Structured Extraction
READOC is a novel benchmark that frames document structured extraction as a realistic, end-to-end task of converting unstructured PDFs into semantically rich Markdown text, enabling a unified evaluation of state-of-the-art approaches.