Core Concepts
Each textual sequence to be forgotten should be unlearned differently, according to its degree of memorization within the large language model.
Abstract
The paper presents a fresh perspective on unlearning memorized data in large language models (LLMs). It argues that each textual sequence to be forgotten should be treated differently based on its degree of memorization within the LLM. The authors make the following key contributions:
They introduce a new metric to quantify how successfully individual textual sequences have been forgotten, focusing on example-level memorization rather than memorization aggregated across the forget-set examples (a minimal sketch of one such per-example score follows this list).
They devise a new Membership Inference Attack (MIA) tailored to unlearned LLMs, showing that existing state-of-the-art unlearning algorithms remain prone to privacy violations because subpopulations of under- and over-memorized data points persist after unlearning (see the loss-based sketch after this list).
They introduce two new unlearning algorithms, Selective Gradient Ascent (SGA) and Task Arithmetic for Unlearning (TAU), which provide fine-grained, per-example unlearning control (a speculative sketch of both is given at the end of this section).
They conduct a comprehensive performance evaluation across an extensive suite of NLP tasks, identifying the best unlearning solutions across different model capacities and forget-set sizes, and quantifying the gains of the new approaches.
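One way to make the example-level view concrete is to score each forget-set sequence by how much of its suffix the model still reproduces verbatim when prompted with its own prefix. The sketch below assumes a Hugging Face causal LM; the function name, prefix/suffix split, and model are illustrative placeholders, not the paper's exact metric.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def memorization_score(model, tokenizer, text, prefix_len=32, suffix_len=32):
    """Fraction of the true suffix reproduced verbatim by greedy decoding when
    the model is prompted with the sequence's own prefix (illustrative metric)."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    if ids.shape[0] < prefix_len + suffix_len:
        return 0.0  # sequence too short for this prefix/suffix split
    prefix = ids[:prefix_len].unsqueeze(0)
    suffix = ids[prefix_len:prefix_len + suffix_len]
    with torch.no_grad():
        out = model.generate(prefix, max_new_tokens=suffix_len, do_sample=False)
    generated = out[0, prefix_len:prefix_len + suffix_len]
    n = min(generated.shape[0], suffix.shape[0])
    return (generated[:n] == suffix[:n]).float().mean().item() if n else 0.0

# Usage (model/tokenizer names are placeholders, not the paper's setup):
# tokenizer = AutoTokenizer.from_pretrained("gpt2")
# model = AutoModelForCausalLM.from_pretrained("gpt2")
# per_example = {s: memorization_score(model, tokenizer, s) for s in forget_set}
```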
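The MIA described above can be approximated, in spirit, by a simple per-example loss-threshold attack on the unlearned model: forget-set examples whose loss remains unusually low (under-unlearned) or has been pushed unusually high (over-unlearned) are both distinguishable from never-seen data. This is a generic sketch under that assumption, not the paper's specific attack; all names and thresholds are placeholders.

```python
import torch

@torch.no_grad()
def sequence_loss(model, input_ids):
    """Mean per-token cross-entropy of one sequence (shape [1, L]) under a causal LM."""
    return model(input_ids=input_ids, labels=input_ids).loss.item()

def is_flagged_as_member(model, candidate_ids, reference_losses, z_threshold=2.0):
    """Loss-threshold membership signal: compare the candidate's loss to the loss
    distribution of sequences the model has never seen. Unusually low loss
    (residual memorization) and unusually high loss (over-unlearning) both make
    the example distinguishable, so either direction is flagged."""
    mu = sum(reference_losses) / len(reference_losses)
    var = sum((l - mu) ** 2 for l in reference_losses) / len(reference_losses)
    sigma = max(var ** 0.5, 1e-8)
    z = (sequence_loss(model, candidate_ids) - mu) / sigma
    return abs(z) > z_threshold
```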
The paper demonstrates that the new algorithms, SGA and TAU, outperform existing state-of-the-art unlearning solutions in terms of both model utility and privacy metrics.
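Taking the algorithm names at face value, one plausible reading is that SGA applies gradient ascent only to forget-set examples that are still memorized above a per-example threshold, while TAU fine-tunes a copy of the model on the forget set to obtain a "memorization" task vector and subtracts a scaled version of it from the original weights. The sketch below follows that reading; it is not the authors' reference implementation, and the memorization-score callback and all hyperparameters are hypothetical.

```python
import copy
import torch

def selective_gradient_ascent(model, forget_loader, mem_score, threshold=0.1,
                              lr=1e-5, steps=1):
    """Gradient *ascent* on forget-set examples, applied per example only while
    that example's memorization score stays above `threshold` (illustrative)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        for batch in forget_loader:              # one example per batch, for per-example control
            if mem_score(model, batch) <= threshold:
                continue                         # already forgotten: leave this example alone
            loss = model(input_ids=batch, labels=batch).loss
            opt.zero_grad()
            (-loss).backward()                   # ascend on the language-modeling loss
            opt.step()
    return model

def task_arithmetic_unlearning(base_model, forget_finetuned_model, alpha=1.0):
    """Subtract a scaled 'memorization task vector' (finetuned minus base) from
    the base weights, in the spirit of task arithmetic (illustrative)."""
    unlearned = copy.deepcopy(base_model)
    base = dict(base_model.named_parameters())
    tuned = dict(forget_finetuned_model.named_parameters())
    with torch.no_grad():
        for name, p in unlearned.named_parameters():
            p.copy_(base[name] - alpha * (tuned[name] - base[name]))
    return unlearned
```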
Stats
"LLMs have been shown to (i) memorize and (ii) emit memorized training data at generation time, which causes privacy and copyright problems."
"GPT-3 has been shown to generate PII information verbatim, raising significant concerns given its wide commercial usage."
"Memorization is proportional to the number of model parameters."
Quotes
"LLMs have been found to memorize training textual sequences and regurgitate verbatim said sequences during text generation time."
"Massive training data and a (lengthy) training process allow LLMs to establish factual associations and memorize language semantics and grammar."
"Memorized training data points can also decrease the quality of the model by linking tokens which do not have a general meaning or a justifiable factual association, but only belong to an entity or an individual."