
Annotated Dataset of Character Coreference in Korean Novels: KoCoNovel


Core Concepts
KoCoNovel is an annotated dataset for character coreference resolution in Korean literary texts, addressing the unique linguistic and cultural aspects of the Korean language.
Abstract
The KoCoNovel dataset is derived from 50 modern and contemporary Korean novels, comprising 178,957 tokens and 19,030 character mentions. It is the first Korean coreference resolution dataset based on literary texts, and it addresses the challenges posed by the nuances of Korean address terms and the lack of grammatical markers for proper nouns.

The key highlights of the KoCoNovel dataset include:

Revised annotation guidelines that accommodate the culture of address terms in Korea, where characters are frequently referred to by common nouns denoting social relationships and kinship rather than by personal names.

Four distinct versions of the dataset, offering annotations from the perspectives of the omniscient author and of the readers, as well as options for treating multiple entities as either separate or overlapping.

Detailed analysis showing that 24% of all character mentions in KoCoNovel are single common nouns, which highlights a distinctive linguistic characteristic of Korean.

Experiments with BERT-based coreference models that show notable performance improvements on KoCoNovel compared to models trained solely on the non-literary NIKL corpus, underscoring the dataset's potential to enhance coreference resolution in Korean.

KoCoNovel thus provides a comprehensive resource for exploring character coreference resolution in Korean literature, integrating the cultural and linguistic dynamics of the Korean language.
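To put the figures above in proportion, the short Python sketch below derives a few ratios from them. It uses only the numbers quoted in the abstract (50 novels, 178,957 tokens, 19,030 mentions, 24% single common nouns); the function name `summarize` and the dictionary keys are illustrative assumptions, not part of the dataset or the paper. Because the 24% share is a rounded figure, the derived mention count is approximate.

```python
# Figures quoted in the abstract above.
NUM_NOVELS = 50
NUM_TOKENS = 178_957
NUM_MENTIONS = 19_030
COMMON_NOUN_SHARE = 0.24  # "24% of all character mentions" (rounded in the source)


def summarize(num_novels, num_tokens, num_mentions, common_noun_share):
    """Derive rough ratios from the headline statistics (hypothetical helper)."""
    return {
        # Approximate count, since the share itself is rounded.
        "single_common_noun_mentions": round(num_mentions * common_noun_share),
        # How densely character mentions occur in the text.
        "mentions_per_100_tokens": 100 * num_mentions / num_tokens,
        # Simple averages; individual novels will vary in length.
        "avg_tokens_per_novel": num_tokens / num_novels,
        "avg_mentions_per_novel": num_mentions / num_novels,
    }


summary = summarize(NUM_NOVELS, NUM_TOKENS, NUM_MENTIONS, COMMON_NOUN_SHARE)

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions, roughly 4,570 mentions are single common nouns, and a character mention occurs about once every ten tokens on average.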
Stats
"Where did the eggs come from?" asked the grandfather. I told Sam-soon's mother that brother was coming, and she gave me two eggs to steam for him. Did Bobu's mother not come today either? Who is Bobu's mother?
Quotes
"Wise reader! Why hesitate? Surely this profound and critical issue awaits your awareness, wisdom, and strength?"

Key Insights Distilled From

by Kyuhee Kim,S... at arxiv.org 04-02-2024

https://arxiv.org/pdf/2404.01140.pdf
KoCoNovel

Deeper Inquiries

How might the dataset be expanded to include a wider range of literary genres beyond modern and contemporary novels?

Several steps could extend the dataset beyond modern and contemporary novels:

Inclusion of Classical Literature: Incorporating works from classical Korean literature, such as traditional poetry, historical texts, and folk tales, would provide a broader representation of Korean literary traditions.

Diversification of Genres: Adding genres such as historical fiction, science fiction, fantasy, and mystery would offer a more comprehensive view of character coreference across different storytelling styles.

Incorporating Plays and Screenplays: Including scripts from plays, movies, and television shows would add a new dimension to character coreference analysis, given the distinct characteristics of dialogue in these formats.

Collaboration with Authors and Publishers: Partnering with authors and publishers to access a variety of literary works, including niche genres and emerging literary trends, could help expand the dataset.

Crowdsourcing and Community Contributions: Engaging the literary community through crowdsourcing initiatives to annotate character coreference across a wide range of genres could facilitate dataset expansion.

How do the challenges and solutions presented in the KoCoNovel dataset compare to those encountered in coreference resolution for other languages and literary traditions?

The challenges and solutions presented in KoCoNovel can be compared with those encountered in other languages and literary traditions along several lines:

Cultural Nuances: As in Korean literature, other languages and literary traditions have their own cultural nuances that affect how characters are referred to. Understanding these nuances is crucial for accurate coreference resolution.

Address Terms: The use of address terms in Korean literature, which reflects social relationships and kinship, is similar to the use of honorifics and titles in other languages. Resolving coreferences involving such terms requires cultural and linguistic knowledge.

Genre-specific Challenges: Different literary genres present specific challenges for coreference resolution. For example, poetry may contain more ambiguous references than prose and may require specialized annotation guidelines.

Historical Context: Coreference resolution in historical literature may involve archaic language, outdated terms, and cultural references that differ from contemporary texts, necessitating adaptations in annotation guidelines.

Collaboration with Linguists and Literature Experts: As with KoCoNovel, datasets for other languages and literary traditions benefit from collaboration with linguists, literature experts, and native speakers to ensure accurate annotations and guidelines.

What insights can be gained by analyzing the relationship between the use of address terms and character development in Korean literature?

Analyzing the relationship between the use of address terms and character development in Korean literature can provide valuable insights:

Social Hierarchy and Relationships: Address terms reflect social hierarchy and relationships in Korean society. Analyzing their usage can reveal power dynamics, familial ties, and social status among characters.

Character Identity: The choice of address terms can signify character identity, personality traits, and emotional connections. Studying these terms can offer insights into character development and interactions.

Cultural Context: Address terms are deeply rooted in Korean culture and tradition. Understanding their nuances can shed light on cultural values, norms, and customs depicted in literature.

Narrative Dynamics: Changes in address terms throughout a story can indicate character growth, evolving relationships, or plot developments. Analyzing these changes can enhance the understanding of character arcs and narrative progression.

Reader Engagement: The use of address terms can influence reader engagement and emotional resonance with characters. Examining how readers perceive and interpret these terms can provide insights into the effectiveness of character portrayal in Korean literature.