Core Concepts
GPST is an unsupervised syntactic language model (SLM) trained at scale. It outperforms comparable models such as GPT-2 across understanding, generation, and grammar induction tasks, suggesting its potential as a foundational architecture for large language models.
Abstract
The paper introduces Generative Pretrained Structured Transformers (GPST), an unsupervised syntactic language model (SLM) trained at scale.
The model consists of two components: a standard SLM supervised by a uni-directional language modeling loss, and a composition model supervised by a bi-directional language modeling loss.
Pre-trained on the OpenWebText corpus (9 billion tokens), GPST outperforms GPT-2 on multiple tasks.
A representation surrogate enables all components to be trained jointly and in parallel, rather than sequentially.
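As a rough illustration of the two supervision signals, here is a minimal sketch of a combined objective, assuming both components emit next-token logits over a shared vocabulary; the names (`joint_loss`, `gen_logits`, `comp_logits`, `uni_weight`) are illustrative and not from the paper:

```python
import torch
import torch.nn.functional as F

def joint_loss(gen_logits: torch.Tensor,   # (batch, seq, vocab) from the generative SLM
               comp_logits: torch.Tensor,  # (batch, seq, vocab) from the composition model
               targets: torch.Tensor,      # (batch, seq) next-token ids
               uni_weight: float = 1.0) -> torch.Tensor:
    # Uni-directional language modeling loss for the SLM component.
    uni = F.cross_entropy(gen_logits.transpose(1, 2), targets)
    # Bi-directional language modeling loss for the composition model.
    bi = F.cross_entropy(comp_logits.transpose(1, 2), targets)
    return uni_weight * uni + bi
```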
In experiments, GPST trains substantially faster than prior SLMs and outperforms existing models on left-to-right grammar induction as well as language understanding and generation benchmarks.
Introduction:
The introduction motivates GPST as an unsupervised syntactic language model that can be trained at scale: earlier SLMs either rely on gold parse trees or must be trained sequentially, and GPST removes both constraints.
Methodology:
Generative Model: GPST generates sentences and parse trees jointly using GEN actions (emit the next token) and COMP actions (merge completed sub-trees), with a stack maintaining partial sub-trees during generation. A sketch of this transition loop follows.
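The sketch below shows the stack discipline of the GEN/COMP transition system in skeletal form; `policy` and `compose` are placeholders for what the actual model computes with a transformer:

```python
def generate(policy, compose, max_steps=128):
    """Stack-based GEN/COMP generation loop (illustrative sketch)."""
    stack, tokens = [], []
    for _ in range(max_steps):
        # `policy` returns ("GEN", token), ("COMP", None), or ("STOP", None).
        action, arg = policy(stack, tokens)
        if action == "GEN":
            # Emit a new token and push it onto the stack as a leaf sub-tree.
            tokens.append(arg)
            stack.append(arg)
        elif action == "COMP" and len(stack) >= 2:
            # Merge the top two sub-trees into one composed sub-tree.
            right, left = stack.pop(), stack.pop()
            stack.append(compose(left, right))
        else:
            break  # STOP (or an invalid action) ends generation
    return tokens, stack
```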
Unsupervised Training: Training follows an EM-style procedure: the E-step induces parse trees using the inside-outside algorithm, and the M-step updates the model parameters on the induced trees.
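To make the E-step concrete, here is a simplified hard-EM variant: a CKY-style Viterbi pass that extracts the single best binary tree from precomputed span scores. The paper's E-step instead uses the inside-outside algorithm to compute span marginals; `span_score` here is a stand-in for scores derived from the composition model:

```python
def induce_tree(span_score, n):
    """Return the best binary tree over n tokens as nested index tuples."""
    best = [[float("-inf")] * n for _ in range(n)]
    split = [[None] * n for _ in range(n)]
    for i in range(n):
        best[i][i] = 0.0  # single-token spans need no composition
    # Fill the chart bottom-up over increasing span lengths.
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length - 1
            for k in range(i, j):  # try every split point
                s = best[i][k] + best[k + 1][j] + span_score(i, j)
                if s > best[i][j]:
                    best[i][j], split[i][j] = s, k

    def build(i, j):
        if i == j:
            return i
        k = split[i][j]
        return (build(i, k), build(k + 1, j))

    return build(0, n - 1)
```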
Inference: An improved word-level search adapts top-k random sampling to GPST's action-level generation, so that sampling decisions are made over words rather than individual actions.
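The following sketch illustrates one plausible reading of word-level top-k sampling: randomness is applied only when choosing the next word, while intervening structural actions are resolved deterministically. The `model` interface (`init_state`, `resolve_structure`, `next_word_logits`, `gen`, `eos_id`) is hypothetical, not the paper's API:

```python
import torch

def sample_sentence(model, max_words=50, k=40):
    words = []
    state = model.init_state()
    for _ in range(max_words):
        # Deterministically resolve any pending COMP actions between words.
        state = model.resolve_structure(state)
        logits = model.next_word_logits(state)  # shape: (vocab,)
        # Top-k random sampling applied at the word level only.
        top = torch.topk(logits, k)
        probs = torch.softmax(top.values, dim=-1)
        word = top.indices[torch.multinomial(probs, 1)].item()
        if word == model.eos_id:
            break
        words.append(word)
        state = model.gen(state, word)  # GEN action for the sampled word
    return words
```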
Experiments:
Understanding Tasks: Evaluation on the GLUE benchmark shows GPST outperforming GPT-2 across a range of natural language understanding tasks.
Generation Tasks: Results on summarization and syntactic generalization tasks demonstrate GPST's promise in language generation.
Grammar Induction: In left-to-right grammar induction, GPST performs comparably to methods based on the bi-directional inside algorithm and surpasses previous generative parsing baselines.
Stats
GPST circumvents two limitations of previous SLMs: reliance on gold parse trees and slow sequential training.
Pre-trained on the OpenWebText corpus (9 billion tokens), it outperforms GPT-2 across a variety of tasks.