Core Concepts
The proposed Min-K%++ method normalizes the token probability with statistics of the categorical distribution over the whole vocabulary, so the score reflects how likely the target token is relative to the other candidate tokens. This gives a more informative signal for detecting pre-training data than the raw token probability used by the existing state-of-the-art Min-K% method.
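The normalization described above can be written out as a short formula. This is a sketch under assumptions: the summary only says "statistics of the categorical distribution", and the mean and standard deviation of the next-token log probability are assumed here to be those statistics. The symbols $x_t$, $x_{<t}$, $\mu_{x_{<t}}$, and $\sigma_{x_{<t}}$ are notation introduced for illustration.

```latex
% Assumed per-token score: z-score of the target token's log probability
\[
  s(x_{<t}, x_t) \;=\;
  \frac{\log p(x_t \mid x_{<t}) - \mu_{x_{<t}}}{\sigma_{x_{<t}}}
\]
% Assumed statistics, taken over the whole vocabulary under the model's
% own next-token distribution p(. | x_{<t})
\[
  \mu_{x_{<t}} = \mathbb{E}_{z \sim p(\cdot \mid x_{<t})}\!\left[\log p(z \mid x_{<t})\right],
  \qquad
  \sigma_{x_{<t}} = \sqrt{\mathbb{E}_{z \sim p(\cdot \mid x_{<t})}\!\left[\left(\log p(z \mid x_{<t}) - \mu_{x_{<t}}\right)^{2}\right]}
\]
```

Under this reading, a token scores high when its log probability sits well above what the model expects on average at that position, not merely when its raw probability is large.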
Summary
The paper proposes Min-K%++, an enhanced version of the Min-K% method for detecting whether a given text was part of the pre-training data of a large language model (LLM).
Key highlights:
- Min-K% measures the raw token probability, which may not be the most informative signal. Instead, Min-K%++ normalizes the token probability with statistics of the categorical distribution over the whole vocabulary.
- Theoretically, Min-K%++ is shown to be related to the negative Hessian trace of the token log likelihood, which is implicitly optimized during LLM training.
- Empirically, Min-K%++ outperforms the state-of-the-art Min-K% by 6.2% to 10.5% in detection AUROC on the WikiMIA benchmark, and performs on par with reference-based methods on the more challenging MIMIR benchmark.
- Ablation studies show that Min-K%++ is robust to hyperparameter selection and quantify the contribution of each normalization factor.
- An online detection setting is introduced, where Min-K%++ again achieves the best performance.
Statistics
No additional key metrics or figures were extracted beyond the AUROC gains listed in the summary.
Quotes
No notable quotes were extracted from the paper.