Core Concepts
Native Japanese speakers largely agree on whether an argument in a sentence should be omitted or retained, and the strength of that agreement depends on the linguistic factors involved.
Abstract
This study investigates the degree of consensus among native Japanese speakers on argument ellipsis judgments and the underlying linguistic factors that influence these judgments. The authors collected a large-scale dataset of over 2,000 annotations from native speakers on whether and why particular arguments should be omitted in Japanese sentences.
The key findings are:
Native speakers overall share common criteria for ellipsis judgments, with a high inter-annotator agreement (Krippendorff's alpha of 0.87). The degree of agreement varies depending on the associated linguistic factors.
The annotations closely parallel the ellipsis decisions made in the original corpus, suggesting that the author of the corpus text and its readers share the same criteria.
Experiments with language models, including BERT and GPT-4, reveal a gap between the systems' predictions and human judgments, particularly for judgments based on "soft" preferences rather than "hard" constraints.
The authors hope this fundamental resource will encourage further studies on natural human ellipsis judgment and its computational modeling.
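The agreement figure above is reported as Krippendorff's alpha. For readers unfamiliar with the statistic, the following is a minimal sketch of how alpha can be computed for nominal labels, assuming each item is judged by two or more annotators and that a missing judgment is marked with None. The function name, the "omit"/"keep" labels, and the toy data are illustrative assumptions, not the authors' code or data.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: iterable of lists, one list per judged item, holding the labels
    given by each annotator (None marks a missing judgment).
    """
    # Coincidence matrix: every ordered pair of labels given to the same
    # item by two different annotators, weighted by 1 / (m - 1).
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # items with fewer than two judgments carry no pairs
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and overall number of pairable judgments.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least two pairable judgments")

    # Nominal metric: any two different labels count as full disagreement.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Hypothetical toy judgments: three annotators, five arguments.
    toy = [
        ["omit", "omit", "omit"],
        ["keep", "keep", None],
        ["omit", "keep", "omit"],
        ["keep", "keep", "keep"],
        ["omit", "omit", None],
    ]
    print(f"alpha = {krippendorff_alpha_nominal(toy):.3f}")
```

An alpha of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance, which is why a value such as the one reported above is read as strong consensus.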
Stats
37% of arguments, such as subjects and objects, are omitted in the Japanese corpus.
The omission rate for nominative (NOM) arguments is around 50%.
Quotes
"Speakers sometimes omit certain arguments of a predicate in a sentence; such omission is especially frequent in pro-drop languages."
"The identification of human consensus on argument ellipsis judgments and its factors will contribute to linguistics and education: it can clarify the underlying consensus on the judgments among native speakers and be a helpful resource for writing assistance."