This thesis investigates the use of natural language processing (NLP) technology to extract relevant information from job vacancy data, with a focus on the task of skill extraction (SE). The key challenges addressed include:
Data Annotation: The thesis explores methods for de-identifying privacy-related entities in job postings and develops annotation guidelines and datasets for manually identifying skills in job descriptions. This work yields a de-identification dataset called JOBSTACK and a skill extraction dataset called SKILLSPAN.
Modeling Occupational Skills: The thesis proposes several approaches to improve skill extraction and classification, including weak supervision using the ESCO taxonomy, taxonomy-driven pre-training of multilingual language models, and retrieval-augmented models that leverage multiple skill extraction datasets.
Linking Skills to Existing Resources: The thesis investigates methods for linking the extracted skills to the ESCO taxonomy, enabling standardization and further analysis of the labor market data.
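To make the linking step concrete, here is a minimal sketch of mapping an extracted skill span to the closest entry in a taxonomy. It is an illustration only, not the thesis's method: the token-overlap (Jaccard) similarity, the threshold value, the function names, and the three toy entries with IDs T1 to T3 are all assumptions made for this example and are not real ESCO entries.

```python
def tokenize(text):
    """Lowercase and split on whitespace; returns a set of tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Token-level Jaccard similarity between two strings."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical toy taxonomy; the IDs and labels are invented for illustration.
TOY_TAXONOMY = {
    "T1": "manage project budgets",
    "T2": "write python code",
    "T3": "communicate with customers",
}


def link_skill(span, taxonomy, threshold=0.3):
    """Return (entry_id, score) for the best-matching taxonomy label.

    Returns (None, score) when the best score is below the threshold,
    so that unrelated spans are left unlinked rather than forced
    onto the nearest entry.
    """
    best_id, best_score = None, 0.0
    for entry_id, label in taxonomy.items():
        score = jaccard(span, label)
        if score > best_score:
            best_id, best_score = entry_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


if __name__ == "__main__":
    print(link_skill("write clean python code", TOY_TAXONOMY))
    print(link_skill("juggling flaming torches", TOY_TAXONOMY))
```

Once spans are mapped to shared taxonomy identifiers, postings that phrase the same skill differently can be counted together, which is what makes the standardized analysis described above possible. A practical system would likely replace the token overlap with a learned similarity model.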
Overall, the research aims to develop transparent language technology systems and data for the job market domain, providing insights into labor market demands and the emergence of new skills, and facilitating job matching.
Key insights distilled from the work by Mike Zhang at arxiv.org, 05-01-2024:
https://arxiv.org/pdf/2404.18977.pdf