Basic Concepts
A unified semantic discourse structure improves headline generation by capturing a document's core semantics.
Summary
The article introduces a method that represents document semantics with a unified semantic discourse structure (S3), which combines Rhetorical Structure Theory (RST) trees and Abstract Meaning Representation (AMR) graphs. The structure characterizes semantic meaning through a hierarchical composition of sentences, clauses, and words. A headline generation framework uses S3 graphs as contextual features, with a dynamic pruning mechanism that filters redundant nodes. Experiments show that the method consistently outperforms existing methods on headline generation datasets.
Introduction
Headline generation aims to summarize documents concisely.
Research has shifted focus to truthfulness and attractiveness in headlines.
Existing methods overlook intrinsic document characteristics.
Related Work
Automatic headline generation has received significant research attention.
Methods are classified into extractive and abstractive paradigms.
Abstractive methods achieve state-of-the-art performance.
Discourse Structure Modeling
RST trees and AMR graphs are integrated into an S3 graph for document representation.
A hierarchical structure pruning mechanism enhances the efficacy of the discourse structure.
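To make the sentence, clause, and word hierarchy concrete, here is a minimal sketch of how a discourse tree and word-level meaning graphs might be merged into one typed graph. The class `S3Graph`, the function `build_s3`, and the dictionary input formats are hypothetical stand-ins chosen for illustration; real RST and AMR parser output is richer, and the article's actual construction may differ.

```python
from dataclasses import dataclass, field


@dataclass
class S3Graph:
    """Toy unified graph with typed nodes (sentence, clause, word)."""
    node_type: dict = field(default_factory=dict)  # node id -> node type
    edges: set = field(default_factory=set)        # (src, dst, label) triples

    def add_node(self, node_id, kind):
        self.node_type[node_id] = kind

    def add_edge(self, src, dst, label):
        self.edges.add((src, dst, label))


def build_s3(rst_tree, amr_graphs):
    """Merge a sentence-to-clause tree with per-clause word graphs.

    rst_tree:   {sentence_id: [(clause_id, relation_label), ...]}  (assumed format)
    amr_graphs: {clause_id: [(head_word, tail_word, role_label), ...]}  (assumed format)
    """
    g = S3Graph()
    for sent, clauses in rst_tree.items():
        g.add_node(sent, "sentence")
        for clause, relation in clauses:
            g.add_node(clause, "clause")
            g.add_edge(sent, clause, relation)
            for head, tail, role in amr_graphs.get(clause, []):
                # Namespace word nodes by clause so ids do not clash.
                h, t = f"{clause}/{head}", f"{clause}/{tail}"
                g.add_node(h, "word")
                g.add_node(t, "word")
                g.add_edge(clause, h, "contains")
                g.add_edge(clause, t, "contains")
                g.add_edge(h, t, role)
    return g


# Illustrative input: one sentence with two clauses.
rst_tree = {"s1": [("c1", "nucleus"), ("c2", "elaboration")]}
amr_graphs = {
    "c1": [("approve", "senate", "ARG0")],
    "c2": [("approve", "bill", "ARG1")],
}
graph = build_s3(rst_tree, amr_graphs)
print(len(graph.node_type), "nodes,", len(graph.edges), "edges")
```

Keeping the node type alongside each id is what lets a later step treat sentence, clause, and word nodes differently, for example when deciding which nodes to prune.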
Headline Generation Framework
A pre-trained language model (PLM) encodes the input document into contextual representations.
A graph attention network (GAT) models the S3 graph features to capture structure.
Dynamic structure pruning uses reinforcement learning to filter redundant nodes.
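The summary states only that pruning is driven by reinforcement learning. As one way to picture that, the sketch below samples a keep/drop decision per node and computes a REINFORCE-style gradient estimate. The choice of REINFORCE, the per-node Bernoulli policy, the scalar reward, and the baseline are all assumptions made for illustration, not details confirmed by the article; `sample_keep_mask` and `reinforce_grad` are hypothetical names.

```python
import math
import random


def sample_keep_mask(keep_probs, rng):
    """Sample a keep/drop decision per node; return (mask, log_prob).

    keep_probs must lie strictly between 0 and 1 so the logs are defined.
    """
    mask, log_prob = [], 0.0
    for p in keep_probs:
        keep = rng.random() < p
        mask.append(keep)
        log_prob += math.log(p if keep else 1.0 - p)
    return mask, log_prob


def reinforce_grad(keep_probs, mask, reward, baseline=0.0):
    """REINFORCE estimate of d(expected reward)/d(logit) for each node.

    With p = sigmoid(logit), d log pi / d logit is (1 - p) for a kept
    node and -p for a dropped node.
    """
    advantage = reward - baseline
    return [advantage * ((1.0 - p) if keep else -p)
            for p, keep in zip(keep_probs, mask)]


# Illustrative run: three nodes with assumed keep probabilities.
rng = random.Random(0)
keep_probs = [0.9, 0.2, 0.6]
mask, log_prob = sample_keep_mask(keep_probs, rng)
# The reward would come from headline quality; 1.0 is a placeholder.
grads = reinforce_grad(keep_probs, mask, reward=1.0, baseline=0.5)
print(mask, round(log_prob, 3), [round(x, 3) for x in grads])
```

Subtracting a baseline from the reward does not change the expected gradient but typically reduces its variance, which is why it appears as a parameter here.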
Experimental Settings
Experiments are conducted on the CNNDM-DH and DM-DHC datasets.
Comparisons with strong baseline models show superior performance across metrics.
Results and Discussion
Our model outperforms baselines consistently on headline generation tasks.
Human evaluation confirms that the generated headlines are of higher quality than those of the baselines.
Further Analyses
An analysis of document length shows our model's advantage on longer documents.
An analysis of node types in the S3 graph highlights the importance of the key information nodes that remain after pruning.
Statistics
"Experimental results demonstrate that our method outperforms existing state-of-art methods consistently."
Quotes
"Our work can be instructive for a broad range of document modeling tasks."
"Document texts consist of a considerable number of subordinate sentences or clauses, thus containing lengthy and mixed information."