This thesis investigates the use of natural language processing (NLP) technology to extract relevant information from job vacancy data, with a focus on the task of skill extraction (SE). The key challenges addressed include:
Data Annotation: The thesis explores methods for de-identifying privacy-related entities in job postings and develops annotation guidelines and datasets for manually identifying skills in job descriptions. This work produces a de-identification dataset, JOBSTACK, and a skill extraction dataset, SKILLSPAN.
Modeling Occupational Skills: The thesis proposes several approaches to improve skill extraction and classification, including weak supervision using the ESCO taxonomy, taxonomy-driven pre-training of multilingual language models, and retrieval-augmented models that leverage multiple skill extraction datasets.
Linking Skills to Existing Resources: The thesis investigates methods for linking the extracted skills to the ESCO taxonomy, enabling standardization and further analysis of the labor market data.
Overall, the research aims to develop transparent language technology systems and data for the job market domain, yielding insights into labor market demands and the emergence of new skills, and facilitating job matching.
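To make the idea of weak supervision from a taxonomy more concrete, the following is a minimal sketch, not the thesis's actual method. It assumes that weak labels can be produced by matching taxonomy skill names against the text of a job posting. The label list `TOY_SKILL_LABELS` and the function `weak_label_skills` are hypothetical names introduced for illustration; the labels are a toy stand-in, not real ESCO data.

```python
import re

# Hypothetical stand-in for a few ESCO-style skill labels (NOT real ESCO data).
TOY_SKILL_LABELS = ["python", "project management", "data analysis"]


def weak_label_skills(text, skill_labels):
    """Return sorted (start, end, label) spans where a label occurs in the text.

    This is plain case-insensitive, whole-word string matching; it is an
    assumed simplification of taxonomy-based weak supervision.
    """
    spans = []
    for label in skill_labels:
        pattern = r"\b" + re.escape(label) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


posting = "We need experience in Python and project management."
for start, end, label in weak_label_skills(posting, TOY_SKILL_LABELS):
    # Show the matched surface form next to the taxonomy label it maps to.
    print(posting[start:end], "->", label)
```

Spans produced this way are noisy (exact matching misses paraphrases and implicit skills), which is one reason such labels would typically serve as a starting signal for a trained extractor rather than as final annotations.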
Key insights distilled from the work by Mike Zhang on arxiv.org, 05-01-2024
https://arxiv.org/pdf/2404.18977.pdf