Core Concepts
Proposing a unified pre-trained language model that incorporates heterogeneous knowledge from all forms of text.
Abstract
The article introduces the heterogeneous knowledge language model (HKLM), which jointly models relationships across multi-format text: unstructured free text, semi-structured text, and well-structured knowledge such as knowledge-graph triples. It covers the pre-training methods, downstream tasks, and experimental results in the tourism domain.
Contents:
Introduction to Methods for Extending Pre-trained Language Models (PLMs)
Importance of Multi-Format Text in Pre-Training
Challenges and Solutions in Modeling Multi-Format Text
Training Mechanism for HKLM
Fine-Tuning TravelBERT for Tourism NLP Tasks
Experiments on Pre-training and Downstream Datasets
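The training mechanism listed above combines objectives over heterogeneous text formats. The sketch below is a minimal illustration of that idea, not the authors' code: the three objective names (a masked-language-model loss for free text, a title-matching loss for semi-structured text, and a triple-classification loss for knowledge-graph triples) and the equal default weighting are assumptions about how such objectives might be combined.

```python
# Hedged sketch: combining per-format pre-training objectives into one loss.
# The objective names and equal default weights are illustrative assumptions,
# not the paper's exact formulation.

def combined_pretraining_loss(mlm_loss, title_loss, triple_loss,
                              weights=(1.0, 1.0, 1.0)):
    """Weighted sum of per-format objectives:
    - mlm_loss: masked-language-model loss on unstructured text
    - title_loss: matching loss for semi-structured titles/sections
    - triple_loss: classification loss for knowledge-graph triples
    """
    w_mlm, w_title, w_triple = weights
    return w_mlm * mlm_loss + w_title * title_loss + w_triple * triple_loss

# Example: equal weighting of dummy per-batch loss values.
total = combined_pretraining_loss(2.1, 0.4, 0.7)
print(round(total, 2))  # → 3.2
```

In practice the weights would be hyperparameters tuned per objective; the point of the sketch is only that each text format contributes its own loss term to a single optimization target.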
Stats
The results show that the approach outperforms plain-text pre-training while using only 1/4 of the pre-training data.
The HKLM also achieves performance gains on the XNLI dataset.