Core Concepts
This paper introduces a web interface and a Python library that make the extensive linguistic information in the Sejong dictionary easier to access and manipulate, with a focus on Korean verb subcategorization frames.
Summary
The paper presents two tools for working with the linguistic data in the Sejong dictionary, a major language resource for Korean:
A web interface that provides intuitive access to verb information, including morphological, semantic, and syntactic details, as well as annotated sentence examples illustrating subcategorization frames.
A Python library (pySejongFrame) that enables efficient querying and processing of the Sejong dictionary data, supporting various loading methods and integration with existing NLP frameworks like NLTK.
The web interface organizes the Sejong dictionary data, allowing users to search for verbs, frames, arguments, and semantic roles, and view detailed information with annotated sentence examples. The Python library offers flexible loading options and querying capabilities, making it suitable for both corpus-based applications and linguistic research.
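The kind of frame-based lookup described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pySejongFrame API: the function names (`build_frame_index`, `verbs_sharing_frame`, `average_frames_per_verb`), the frame notation, and the toy lexicon are all hypothetical and are not taken from the Sejong dictionary or the paper.

```python
from collections import defaultdict

# Toy lexicon mapping each verb to its subcategorization frames.
# Entries and frame notation are illustrative assumptions only,
# not actual Sejong dictionary data.
TOY_LEXICON = {
    "먹다": ["N0-이 N1-을 V"],
    "읽다": ["N0-이 N1-을 V"],
    "가다": ["N0-이 N1-에 V", "N0-이 N1-으로 V"],
    "주다": ["N0-이 N1-에게 N2-을 V"],
}


def build_frame_index(lexicon):
    """Invert a verb -> frames mapping into frame -> sorted list of verbs."""
    index = defaultdict(set)
    for verb, frames in lexicon.items():
        for frame in frames:
            index[frame].add(verb)
    return {frame: sorted(verbs) for frame, verbs in index.items()}


def verbs_sharing_frame(index, frame):
    """Return all verbs listed under the given frame (empty list if none)."""
    return index.get(frame, [])


def average_frames_per_verb(lexicon):
    """Compute the mean number of frames per verb in the lexicon."""
    return sum(len(frames) for frames in lexicon.values()) / len(lexicon)


frame_index = build_frame_index(TOY_LEXICON)
print(verbs_sharing_frame(frame_index, "N0-이 N1-을 V"))
print(average_frames_per_verb(TOY_LEXICON))
```

Inverting the mapping once makes "which verbs share this frame" a constant-time dictionary lookup, which is one plausible design for corpus-scale querying. The same per-verb frame count underlies summary figures such as average frames per verb.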
The authors also discuss their efforts to map subcategorization frames to corresponding sentence examples, providing a valuable resource for understanding verb-argument structures in Korean. Additionally, they outline plans to integrate other Korean verb lexicons, such as the Korean PropBank and FrameNet, to develop a comprehensive Korean VerbNet.
This work aims to enhance the accessibility and usability of the Sejong dictionary, a crucial language resource, for a wide range of users, from linguists to developers working on Korean language processing tasks.
Statistics
The Sejong dictionary dataset contains 15,181 verbs with an average of 1.812 frames per verb.
The Korean PropBank has 2,749 verbs with an average of 1.408 frames per verb.
The NIKL Semantic Role Labeling (SRL) dataset refines 1,597 verbs from the Sejong dictionary and adds 2,063 new verbs, with an average of 1.593 frames per verb.
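Counts such as "verbs refined from the Sejong dictionary" versus "new verbs" amount to a set comparison between two verb inventories. The sketch below shows one way such a comparison could be computed. The function name `compare_lexicons` and the verb lists are hypothetical toy data, not the actual Sejong, PropBank, or NIKL inventories.

```python
def compare_lexicons(base_verbs, other_verbs):
    """Split two verb inventories into shared and resource-specific verbs."""
    base, other = set(base_verbs), set(other_verbs)
    return {
        "shared": sorted(base & other),         # verbs present in both resources
        "only_in_base": sorted(base - other),   # verbs unique to the base lexicon
        "only_in_other": sorted(other - base),  # verbs the other resource adds
    }


# Toy inventories (assumed for illustration; not real dictionary contents).
base_lexicon = ["가다", "먹다", "주다"]
other_lexicon = ["먹다", "주다", "읽다"]

overlap = compare_lexicons(base_lexicon, other_lexicon)
for label, verbs in overlap.items():
    print(label, len(verbs), verbs)
```

A real comparison would also need to normalize verb forms and distinguish homonymous entries before taking set differences, which this sketch does not attempt.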
Quotes
"The Sejong dictionary has produced extensive datasets that describe Korean lexicon data in great detail."
"This structured dataset will serve as a foundation for identifying relationships between words, such as organizing and searching for verbs that share the same subcategorization frames."
"Our ultimate goal is to develop a comprehensive Korean VerbNet by systematically comparing verbs and their subcategorization frames across the Sejong verb dictionary, PropBank, and other resources."