Core Concepts
A decoding strategy based on domain-conditional pointwise mutual information (PMIDC) reduces hallucination in abstractive summarization by accounting for the domain of the source text.
Abstract
The paper proposes a decoding strategy called PMIDC (domain-conditional pointwise mutual information) to mitigate hallucination in abstractive summarization. Hallucination occurs when a model generates plausible but factually inconsistent text that is absent from the source text.
The key insights are:
The domain (or topic) of the source text can trigger the model to generate text that is highly probable in that domain, leading to hallucination.
PMIDC computes how much more likely a token becomes in the summary when conditioned on the input source text, compared to when the token is conditioned only on the domain of the source text. This effectively penalizes the model's tendency to fall back to domain-associated words when the model has high uncertainty about the generated token.
PMIDC is an extension of Conditional Pointwise Mutual Information (CPMI), which does not capture the importance of the source domain in summarization.
The authors use domain prompts, such as keywords, the first sentence, or a randomly selected sentence from the source text, to condition the generation probability of a token on the source domain.
Experiments on the XSUM dataset show that PMIDC achieves significant improvements in faithfulness and relevance to source texts compared to baselines, with only a marginal decrease in ROUGE and BERTScore.
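The scoring idea in the insights above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the penalty weight lam, the entropy threshold tau, and the toy probabilities are all assumptions made up for illustration. The sketch scores each candidate token by its log-probability given the source, and, only when the source-conditioned distribution is uncertain (high entropy), subtracts the token's log-probability given the domain prompt alone.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a distribution given as {token: prob}."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def pmi_dc_scores(p_given_source, p_given_domain, lam=1.0, tau=1.0):
    """Score candidate next tokens (illustrative sketch only).

    p_given_source: next-token probabilities conditioned on the source text
    p_given_domain: next-token probabilities conditioned only on a domain prompt
    lam, tau: hypothetical penalty weight and entropy threshold (assumptions)
    """
    # Apply the domain penalty only when the model is uncertain.
    uncertain = entropy(p_given_source) >= tau
    scores = {}
    for token, p_src in p_given_source.items():
        score = math.log(p_src)
        if uncertain:
            # PMI-style term: how much the source raises the token's
            # probability beyond what the domain alone already explains.
            score -= lam * math.log(p_given_domain[token])
        scores[token] = score
    return scores


# Toy distributions with invented numbers, not taken from the paper.
p_src = {"domain_word": 0.40, "source_word": 0.35, "filler": 0.25}
p_dom = {"domain_word": 0.60, "source_word": 0.10, "filler": 0.30}

plain_choice = max(p_src, key=p_src.get)
scores = pmi_dc_scores(p_src, p_dom)
penalized_choice = max(scores, key=scores.get)
print(plain_choice, penalized_choice)
```

In this toy case, plain likelihood prefers the token that the domain alone already makes probable, while the penalized score prefers the token whose probability rises most once the source text is seen. In a real decoder the two distributions would come from two forward passes of the summarization model, one with the full source and one with only the domain prompt.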
Stats
"Our latest economic data shows that many Scottish businesses will have a successful 2017..."
"The Scottish Chambers of Commerce has issued a warning about the outlook for the economy in 2017."
"The Scottish Chambers of Commerce has said it expects the economy to have a 'successful' year in 2017."