Key Concepts
We develop novel annotation guidelines for sentence-level subjectivity detection that can be applied across languages, and use them to create a high-quality corpus of English news articles annotated for subjective and objective sentences.
Abstract
The authors present a novel set of annotation guidelines for sentence-level subjectivity detection that are designed to be applicable across languages. They use these guidelines to create a corpus of 1,049 sentences from 23 English news articles, with 638 sentences labeled as objective and 411 as subjective.
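As a quick check on the class balance implied by these counts, the short sketch below computes each label's share and the accuracy of a trivial always-predict-the-majority classifier. Only the two counts come from the text; the variable names and the idea of a majority baseline are illustrative additions, not something the authors report here.

```python
# Label counts reported for the corpus in the text above.
counts = {"objective": 638, "subjective": 411}

total = sum(counts.values())  # should match the reported 1,049 sentences
shares = {label: n / total for label, n in counts.items()}

# Illustrative reference point (an assumption, not a result from the paper):
# accuracy of a classifier that always predicts the most frequent label.
majority_label = max(counts, key=counts.get)
majority_baseline = shares[majority_label]

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"majority baseline ({majority_label}): {majority_baseline:.1%}")
```

The corpus is moderately imbalanced toward objective sentences, so plain accuracy should be read against that baseline.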
The key highlights of the work are:
- The authors frame subjectivity detection as an information-retrieval process that distinguishes between sentences from which information can be directly extracted (objective) and sentences that require further processing (subjective).
- The annotation guidelines were developed following a prescriptive paradigm, with annotators discussing and resolving controversial cases to produce detailed guidelines that can be applied across languages.
- The authors evaluate state-of-the-art multilingual transformer-based models on the task in mono-, multi-, and cross-language settings, finding that models trained in the multilingual setting perform on par with or better than those trained monolingually.
- The authors demonstrate that their annotation guidelines can be transferred to other languages by re-annotating an existing Italian corpus and performing cross-lingual experiments.
- The resulting corpus, NewsSD-ENG, is a high-quality dataset for sentence-level subjectivity detection in English news articles, which the authors hope will foster research on subjectivity as a feature for tasks like opinion detection and fact-checking.
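The mono-, multi-, and cross-language settings named in the highlights can be pictured as three ways of assembling training data for a given target language. The sketch below is a minimal illustration under assumed definitions: "mono" trains on the target language only, "multi" trains on all available languages, and "cross" trains only on the other languages. The function `build_setting`, the data layout, and the placeholder sentences are all hypothetical; the authors' actual splits and model code may differ.

```python
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (sentence, label), label is "OBJ" or "SUBJ"
Corpus = Dict[str, Dict[str, List[Example]]]  # language -> split -> examples


def build_setting(data: Corpus, setting: str, target: str) -> Tuple[List[Example], List[Example]]:
    """Return (train, test) for `target` under an assumed reading of each setting."""
    if setting == "mono":
        # Train and test on the same language.
        train_langs = [target]
    elif setting == "multi":
        # Train on every available language, including the target.
        train_langs = sorted(data)
    elif setting == "cross":
        # Train only on the other languages; the target is unseen in training.
        train_langs = [lang for lang in sorted(data) if lang != target]
    else:
        raise ValueError(f"unknown setting: {setting!r}")

    train = [ex for lang in train_langs for ex in data[lang]["train"]]
    test = list(data[target]["test"])
    return train, test


# Placeholder sentences invented for illustration; not taken from any corpus.
toy: Corpus = {
    "en": {
        "train": [("placeholder en 1", "OBJ"), ("placeholder en 2", "SUBJ")],
        "test": [("placeholder en test", "OBJ")],
    },
    "it": {
        "train": [("placeholder it 1", "SUBJ")],
        "test": [("placeholder it test", "OBJ")],
    },
}

for setting in ("mono", "multi", "cross"):
    train, test = build_setting(toy, setting, target="en")
    print(setting, len(train), len(test))
```

In every setting the test set stays fixed to the target language, so differences in scores can be attributed to the composition of the training data.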
Statistics
"Subjectivity detection contributes to several Natural Language Processing applications, such as sentiment analysis, bias detection, and fact-checking."
"We manually curate NewsSD-ENG, a novel high-quality English corpus for SD concerning controversial topics from political affairs in news articles. The corpus contains 1,049 sentences extracted from 23 news articles, out of which 638 end up being objective and 411 subjective."
"The employed models achieve on-par or superior classification performance when trained in multilingual settings compared to monolingual ones."
Quotes
"We frame SD into an information-retrieval process, with the purpose of discriminating between sentences from which information can be directly extracted (objective) and sentences that must be further processed (subjective)."
"We hope that our corpus can foster the research on subjectivity as a feature for tasks like opinion detection and fact-checking."
"These results suggest that our annotation guidelines can be transferred to other languages."