Core Concepts
Enhancing the ITI method leads to significant improvements in the generalization capabilities of Large Language Models (LLMs).
Abstract:
Large Language Models (LLMs) struggle with returning false information (hallucination).
The Inference-Time Intervention (ITI) paradigm is explored as a way to improve the truthfulness of LLMs.
The proposed Non-Linear ITI (NL-ITI) shows promising results across several benchmarks.
Method:
ITI evaluates probing accuracy to select attention heads, then applies Mass Mean Shift vectors to their activations during inference.
NL-ITI increases the capacity of the probing model (a non-linear probe) and widens the token context used for intervention.
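The two Method points above can be sketched in a few lines. This is an illustrative toy (random activations, made-up dimensions and weights, function names my own), assuming ITI's linear probe and Mass Mean Shift direction versus an NL-ITI-style MLP probe with one hidden layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy activations of one attention head: 200 examples, 64 dims,
# with "truthful" (label 1) examples shifted along every dimension.
acts = rng.normal(size=(200, 64))
labels = rng.integers(0, 2, size=200)
acts[labels == 1] += 0.5

def mass_mean_shift(acts, labels):
    """ITI's intervention direction: truthful mean minus untruthful mean."""
    return acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)

def linear_probe(x, w, b):
    """ITI-style probe: a single logistic unit on head activations."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def mlp_probe(x, w1, b1, w2, b2):
    """NL-ITI-style probe: a hidden ReLU layer adds non-linear capacity."""
    h = np.maximum(0.0, x @ w1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))

# At inference time, the head activation is shifted along the direction,
# scaled by an intervention-strength hyperparameter alpha.
direction = mass_mean_shift(acts, labels)
alpha = 5.0
shifted = acts[0] + alpha * direction / np.linalg.norm(direction)
```

In the actual method, heads are ranked by probe accuracy on held-out activations; per the "expands token context" point above, NL-ITI additionally feeds the probe activations from several tokens rather than a single one.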
Experiments:
Evaluation metrics include MC1, MC2, Cross Entropy (CE), and Kullback-Leibler divergence (KL).
NL-ITI outperforms ITI on multiple benchmarks and shows better generalization capabilities.
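As a concrete reading of two of these metrics, a minimal sketch (toy numbers; function names are my own, not from the paper): MC1 checks whether the correct answer choice receives the highest likelihood, while KL divergence between the intervened and unmodified next-token distributions quantifies how invasive an intervention is.

```python
import numpy as np

def mc1(choice_logprobs, correct_idx):
    """MC1: 1 if the highest log-likelihood goes to the correct choice."""
    return int(int(np.argmax(choice_logprobs)) == correct_idx)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between next-token distributions; 0 = no behavior change."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Correct choice (index 1) scored highest -> MC1 of 1 for this question.
score = mc1([-2.3, -0.7, -1.9, -4.1], correct_idx=1)

# Identical base and intervened distributions -> zero invasiveness.
base = [0.7, 0.2, 0.1]
invasiveness = kl_divergence(base, base)
```

MC2 generalizes MC1 to the total normalized probability assigned to all correct choices, and CE/KL are measured against the unmodified model to verify the intervention does not degrade general language modeling.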
Conclusions:
NL-ITI significantly improves on the baseline ITI method across multiple benchmarks.
Future research directions include exploring NL-ITI in different scenarios and in combination with other methods.
Stats
NL-ITI reports roughly a 14% improvement on the MC1 metric relative to baseline ITI results.
NL-ITI achieves around an 18% MC1 improvement over baseline LLaMA2-7B on the Business Ethics subdomain.
NL-ITI shows better MC accuracy than ITI at a given level of intervention invasiveness.
Quotes
"NL-ITI outperforms ITI on 4 major benchmarks, including TruthfulQA."
"NL-ITI notably increases capabilities of LLM in elementary mathematics subdomain."