Near duplicate subwords in language model vocabularies can negatively impact training efficiency, but merging them may not yield the expected performance improvements.
LD-Align is a novel DPO-based approach that aligns a fine-tuned large language model with a high-quality supervised fine-tuning dataset, without requiring any additional human annotations or relying on a more powerful language model.
CodecLM is a framework that leverages large language models as codecs to generate high-quality synthetic data tailored for aligning target language models with diverse instruction distributions.
An attacker can manipulate the behavior of a language model trained with RLHF by injecting a small amount of poisoned preference data into the training process, causing the model to generate more text containing a target entity in a desired sentiment.
Recursive training on synthetic data generated from previous language models inevitably leads to model collapse, where the trained models lose diversity and converge to Dirac distributions. Incorporating a sufficient amount of real data can help mitigate this issue.
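The collapse dynamic can be seen even in a toy setting. The sketch below is not the paper's LLM experiment, only a minimal Gaussian analogue: each generation fits a Gaussian to the previous generation's samples and resamples from the fit; the function name, the 20% real-data fraction, and the generation counts are illustrative choices, not values from the paper.

```python
import random
import statistics

def next_generation(samples, rng, frac_real=0.0, real_mu=0.0, real_sigma=1.0):
    """Fit a Gaussian (MLE) to the samples, then draw the next 'training set'
    from the fitted model, optionally mixed with fresh real data."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)
    n = len(samples)
    n_real = int(frac_real * n)
    synthetic = [rng.gauss(mu, sigma) for _ in range(n - n_real)]
    real = [rng.gauss(real_mu, real_sigma) for _ in range(n_real)]
    return synthetic + real

rng = random.Random(0)
pure = [rng.gauss(0.0, 1.0) for _ in range(50)]
mixed = list(pure)
for _ in range(300):
    pure = next_generation(pure, rng)                    # synthetic data only
    mixed = next_generation(mixed, rng, frac_real=0.2)   # 20% real data per round

# With synthetic data only, the fitted spread drifts toward zero (collapse
# toward a Dirac distribution); mixing in real data keeps the spread alive.
```

The purely self-referential chain shrinks its variance geometrically (each MLE re-fit underestimates the spread on average), while the chain that re-injects real data stays anchored to the true distribution.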
The key idea of ROPO is to dynamically assign conservative gradient weights to response pairs with high label uncertainty, based on the log-likelihood margins between the responses. This weighting strategy effectively suppresses the gradients of noisy samples and ensures that the expected risk maintains the same gradient direction under both noisy and noise-free conditions.
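The weighting idea can be sketched as follows. This is a hypothetical functional form chosen for illustration (a sigmoid of the scaled margin), not ROPO's exact weight; `alpha` and the example log-likelihoods are made-up values.

```python
import math

def ropo_weight(logp_chosen, logp_rejected, alpha=2.0):
    """Illustrative conservative weight for one preference pair.

    The log-likelihood margin between the preferred and dispreferred
    responses proxies label certainty: a large positive margin means the
    model already agrees with the label, while a small or negative margin
    signals a potentially mislabeled pair whose gradient should be damped.
    """
    margin = logp_chosen - logp_rejected
    return 1.0 / (1.0 + math.exp(-alpha * margin))

w_clean = ropo_weight(-5.0, -12.0)   # large margin: weight close to 1
w_noisy = ropo_weight(-8.0, -7.5)    # negative margin: weight well below 1
```

Multiplying each pair's gradient by such a weight suppresses the contribution of likely-noisy samples while leaving confident pairs essentially untouched.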
QUOTE-TUNING is a method that aligns large language models to quote verbatim from high-quality pre-training data, enabling more verifiable and truthful generations.
Direct Nash Optimization (DNO) is a provable and scalable algorithm that optimizes large language models to align with general preferences, outperforming reward-based approaches and achieving state-of-the-art results.
CONSCENDI is a data generation pipeline that leverages scenario-guided conversations and contrastive examples to train smaller language models as effective guardrail models for virtual assistants. These guardrail models can identify rule violations in conversations with high accuracy, outperforming larger language models like GPT-4.
Training large language models (LLMs) directly over highly compressed neural text can confer advantages in training and serving efficiency, as well as easier handling of long text spans. However, strong compression tends to produce opaque outputs that are not well-suited for learning by standard LLMs. The authors propose a novel compression technique called Equal-Info Windows that enables effective learning over neurally compressed text, outperforming byte-level baselines on perplexity and inference speed benchmarks.
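The segmentation idea behind Equal-Info Windows, independent of the paper's LM-based arithmetic coder, is to cut text into windows that each carry the same number of compressed bits. The sketch below is a generic illustration only, using zlib as a stand-in compressor; `equal_info_windows` and `bit_budget` are hypothetical names, not the paper's API.

```python
import zlib

def equal_info_windows(text, bit_budget=128):
    """Greedily grow each window until its compressed size reaches the
    bit budget, so every emitted chunk carries roughly equal information."""
    windows, start = [], 0
    while start < len(text):
        end = start + 1
        while end <= len(text):
            compressed = zlib.compress(text[start:end].encode())
            if len(compressed) * 8 >= bit_budget:
                break
            end += 1
        windows.append(text[start:end])  # slice clamps at end of text
        start = end
    return windows

windows = equal_info_windows("hello world " * 40, bit_budget=128)
```

Highly compressible spans produce long windows and dense spans produce short ones, so the boundary positions themselves carry no information about content length, which is what keeps the compressed stream learnable in fixed-size chunks.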