The paper investigates the causes of the poor performance of classifiers that are trained on explicit discourse relation examples but applied to genuinely implicit examples. The key findings are:
Manual and empirical analyses show that removing connectives from explicit examples can lead to a change in the discourse relations expressed, a phenomenon called "label shift". This is because connectives play an important role in signaling the discourse relations.
The authors devise a metric to quantify the degree of label shift in each explicit example. They find that around 33% of explicit examples in PDTB 2.0 and 29.6% in PDTB 3.0 exhibit substantial label shift.
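The paper's exact metric is not reproduced here, but the idea of scoring label shift per example can be sketched as follows, under the assumption that we have a relation classifier returning a probability distribution over discourse relations: the score measures how much support for the gold relation drops once the connective is removed. The function names and the toy classifier are hypothetical, purely for illustration.

```python
def label_shift_score(classify, arg1, arg2, connective, gold_relation):
    """Illustrative label-shift score in [0, 1]; higher = stronger shift.

    `classify` is assumed to return a dict mapping relation labels to
    probabilities. This is a sketch, not the paper's actual metric.
    """
    p_with = classify(arg1, arg2, connective)[gold_relation]
    p_without = classify(arg1, arg2, None)[gold_relation]
    return max(0.0, p_with - p_without)

# Toy stand-in classifier for demonstration: it is confident in
# "Contrast" only when the connective "but" is present.
def toy_classify(arg1, arg2, connective):
    if connective == "but":
        return {"Contrast": 0.9, "Conjunction": 0.1}
    return {"Contrast": 0.3, "Conjunction": 0.7}

score = label_shift_score(
    toy_classify, "it rained", "we went out", "but", "Contrast"
)
```

An example like this, where the connective carries most of the evidence for the relation, would receive a high score and be flagged as label-shifted.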
The authors analyze four factors that contribute to the label shift: the syntactic role of the connective, the ambiguity of the connective, the status of the arguments (intra- or inter-sentential), and the length of the input. They find that the syntactic role of the connective is the most influential factor.
To mitigate the impact of label shift, the authors propose two strategies: (1) filtering out explicit examples with high label shift, and (2) joint learning to recover the discarded connective during training. Experiments on PDTB 2.0, PDTB 3.0, and the GUM dataset show that these strategies can effectively improve the performance of explicit-to-implicit discourse relation recognition.
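The first strategy, filtering, can be sketched in a few lines, assuming each explicit example carries a precomputed label-shift score; the threshold value and field names below are assumptions for illustration, not values from the paper.

```python
def filter_examples(examples, threshold=0.5):
    """Keep only explicit examples whose label-shift score is below
    `threshold`. `examples` is a list of dicts with a precomputed
    'shift' field; the 0.5 default is an illustrative choice."""
    return [ex for ex in examples if ex["shift"] < threshold]

data = [
    {"text": "ex1", "shift": 0.1},
    {"text": "ex2", "shift": 0.8},  # high shift: discarded
    {"text": "ex3", "shift": 0.4},
]
kept = filter_examples(data)
```

The second strategy, joint connective recovery, would instead keep all examples but add an auxiliary training objective of predicting the removed connective alongside the relation label, so the model retains the signal the connective carried.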
Key insights distilled from the paper by Wei Liu, Step... (arxiv.org, 04-02-2024): https://arxiv.org/pdf/2404.00999.pdf