The paper systematically evaluates the ability of large language models (LLMs) to understand puns, a form of linguistic humor that exploits the double or multiple meanings of words. The authors focus on three key tasks: pun recognition, pun explanation, and pun generation.
For pun recognition, the authors design biased prompts to assess the LLMs' confidence and accuracy in distinguishing between puns and non-puns. They find that most LLMs are easily influenced by prompt bias, and some struggle to maintain consistency in their responses.
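The dual-biased setup described above can be sketched as follows. This is a minimal illustration, not the paper's actual prompts: the prompt wording and the `ask_model` callable are assumptions standing in for a real LLM call.

```python
# Sketch of dual-biased prompted asking: the same sentence is queried twice,
# once with a prompt nudging the model toward "pun" and once toward "not a pun".
# A model counts as consistent only if both verdicts agree.
# `ask_model` is a hypothetical stand-in for an actual LLM API call.

def make_biased_prompts(text: str) -> tuple[str, str]:
    """Build a positively and a negatively biased yes/no prompt for one sentence."""
    pos = f'The following sentence is probably a pun. Is it a pun? Answer yes or no.\n"{text}"'
    neg = f'The following sentence is probably not a pun. Is it a pun? Answer yes or no.\n"{text}"'
    return pos, neg

def consistent_recognition(text: str, ask_model) -> bool:
    """Return True if the model gives the same verdict under both biases."""
    pos_prompt, neg_prompt = make_biased_prompts(text)
    return ask_model(pos_prompt) == ask_model(neg_prompt)
```

A model that simply echoes the bias in the prompt (answering "yes" under the positive bias and "no" under the negative one) fails this consistency check, which is exactly the failure mode the authors report.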
In the pun explanation task, the authors employ both fine-grained punchline checks and coarse-grained pairwise comparisons to assess the LLMs' ability to identify the pun pair (the pun word and its alternative) and explain the humor. The results show that while LLMs can accurately identify the pun words, they often struggle to recognize the alternatives, especially in heterographic puns (where the pun word and its alternative have similar pronunciations but different spellings).
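A fine-grained punchline check of this kind can be approximated with simple string matching: does the model's explanation surface both halves of the pun pair? This is an illustrative simplification of the paper's method; the example pun pair below is a hypothetical choice for demonstration.

```python
# Sketch of a fine-grained punchline check: an explanation only "gets" the
# pun if it mentions both the pun word and its alternative. Real evaluation
# would need fuzzier matching (paraphrases, morphology); substring matching
# here is a deliberate simplification.

def punchline_check(explanation: str, pun_word: str, alternative: str) -> dict:
    """Check whether the explanation surfaces each half of the pun pair."""
    text = explanation.lower()
    return {
        "pun_word_found": pun_word.lower() in text,
        "alternative_found": alternative.lower() in text,
    }
```

For a heterographic pun like "Can you tune a fish? A tuna.", an explanation passes only if it names both "tuna" and "tune a"; the paper's finding is that models usually recover the first but miss the second.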
For pun generation, the authors explore two settings: generating puns with only the pun pair provided, and generating puns with both the pun pair and relevant contextual words. They find that some powerful LLMs, such as GPT-4-Turbo and Claude-3-Opus, can generate puns that surpass the quality of human-written puns. However, the authors also identify a "lazy pun generation" pattern, where LLMs tend to include multiple pun words in their generated puns, a behavior rarely seen in human-written puns.
The authors also introduce several novel evaluation methods and metrics, such as dual-biased prompted asking, punchline check, and an overlap indicator for assessing the originality of generated puns. These new approaches better adapt to the in-context learning paradigm of LLMs and align more closely with human cognitive processes.
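One plausible form for such an overlap indicator is word-level Jaccard similarity against a corpus of known human-written puns: high overlap with any existing pun suggests the generation is memorized rather than original. This is a hedged sketch under that assumption; the paper's exact indicator may be defined differently.

```python
# Sketch of an originality check via maximum word-level Jaccard overlap
# between a generated pun and a reference corpus of human-written puns.
# A score near 1.0 means the generation closely copies an existing pun;
# near 0.0 means little lexical overlap with any reference.

def overlap_indicator(generated: str, human_puns: list[str]) -> float:
    """Return the max Jaccard word overlap between `generated` and any reference pun."""
    gen = set(generated.lower().split())
    best = 0.0
    for pun in human_puns:
        ref = set(pun.lower().split())
        if gen | ref:  # guard against division by zero on empty inputs
            best = max(best, len(gen & ref) / len(gen | ref))
    return best
```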
Overall, the paper provides a comprehensive and in-depth analysis of LLMs' capabilities in understanding puns, highlighting their strengths and weaknesses. The findings offer valuable insights for future research in this area, particularly in enhancing LLMs' ability to comprehend and generate linguistic humor.
Key insights extracted from arxiv.org, by Zhijun Xu, Si..., 04-23-2024
https://arxiv.org/pdf/2404.13599.pdf