Core Concepts
The core message of this paper is to introduce a general statistical framework for analyzing the efficiency of watermarks in large language models and to derive, for the Gumbel-max watermark, the optimal detection rule — the rule that maximizes the class-dependent efficiency rate.
Abstract
The paper introduces a statistical framework for analyzing the efficiency of watermarks in large
language models (LLMs). The key points are:
The problem of detecting watermarked text is formulated as a hypothesis testing problem: the null hypothesis is that the text is human-written, and the alternative is that it was generated by a watermarked LLM.
The framework leverages the concept of a pivotal statistic, which has the same distribution under the null hypothesis regardless of the unknown next-token prediction (NTP) distributions of the LLM. This allows for controlling the Type I error rate.
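As a concrete illustration (notation here is simplified, not the paper's exact formulation), for the Gumbel-max watermark the pivotal statistic can be taken as Y_t = U_{t,w_t}, where U_t is the pseudorandom uniform vector shared with the verifier and w_t is the observed token. Under the null hypothesis the token is chosen independently of U_t, so Y_t is Uniform(0,1) no matter what the NTP distribution is; under the watermark it is stochastically larger. A minimal simulation sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
V = 50          # illustrative vocabulary size
n = 2000        # number of simulated tokens

def gumbel_max_decode(p, u):
    """Gumbel-max watermarked decoding: emit argmax_w u_w^(1/p_w).

    Marginally over u ~ Uniform[0,1]^V this samples exactly from p,
    which is why the decoder is unbiased (distortion-free)."""
    return int(np.argmax(np.log(u) / p))  # log is monotone: argmax u^(1/p) = argmax log(u)/p

null_Y, wm_Y = [], []
for _ in range(n):
    p = rng.dirichlet(np.ones(V))   # an arbitrary, unknown NTP distribution
    u = rng.uniform(size=V)         # pseudorandom vector known to the verifier
    # H1: token produced by the watermarked decoder
    wm_Y.append(u[gumbel_max_decode(p, u)])
    # H0: human-written token, sampled independently of u
    null_Y.append(u[rng.choice(V, p=p)])

# Under H0 the pivotal statistic is Uniform(0,1) regardless of p;
# under H1 it concentrates near 1.
print(np.mean(null_Y), np.mean(wm_Y))
```

The key point the simulation makes is that the null distribution of Y_t does not depend on the unknown p, which is exactly what allows exact Type I error control.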
The framework then evaluates the Type II error rate (false negative rate) asymptotically using large deviation theory, and introduces the notion of class-dependent efficiency to handle the challenge of unknown and varying NTP distributions.
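Schematically (the notation below illustrates the large-deviation viewpoint and is not the paper's exact definition), a sum-based rule that rejects when the scores of the pivotal statistics exceed a threshold has Type II error decaying exponentially in the text length n, and the class-dependent efficiency takes the worst case over NTP distributions in a class Δ:

```latex
% Type II error of a sum-based detection rule, schematically:
\Pr\bigl(\text{Type II error}\bigr) \;=\; \exp\bigl(-n\,R_{\mathbf{P}}(h) + o(n)\bigr),
\qquad
R_{\Delta}(h) \;=\; \inf_{\mathbf{P} \in \Delta} R_{\mathbf{P}}(h),
```

so maximizing the class-dependent efficiency rate means choosing the score function h with the largest worst-case exponent R_Δ(h).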
For the Gumbel-max watermark, the paper derives the optimal detection rule that maximizes the class-dependent efficiency rate. This optimal rule has a closed-form expression and is shown to outperform existing detection approaches.
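A minimal sketch of such a sum-based detector, using the familiar exponential score h(y) = -log(1 - y) purely as an illustration (the paper's optimal score function is different; see the paper for its closed form). Under the null each term h(Y_t) is Exp(1), so the sum over n tokens is Gamma(n, 1) and the rejection threshold controlling the Type I error can be read off that distribution:

```python
import numpy as np
from scipy import stats

def detect(Y, alpha=0.01, h=lambda y: -np.log1p(-y)):
    """Sum-based watermark detector on pivotal statistics Y_1..Y_n.

    With the illustrative score h(y) = -log(1 - y), each h(Y_t) is Exp(1)
    under the null, so the sum is Gamma(n, 1); rejecting when the sum
    exceeds its (1 - alpha) quantile gives Type I error <= alpha."""
    Y = np.asarray(Y, dtype=float)
    T = h(Y).sum()
    threshold = stats.gamma.ppf(1 - alpha, a=len(Y))  # shape n, scale 1
    return bool(T > threshold)

rng = np.random.default_rng(1)
human = rng.uniform(size=300)                 # H0: pivotal stats are Uniform(0,1)
watermarked = rng.uniform(size=300) ** 0.25   # H1 proxy: stochastically larger Y
print(detect(human), detect(watermarked))
```

The `watermarked` sample here is only a stand-in for the stochastically larger pivotal statistics a real watermarked text would produce; the detector itself needs nothing but the Y values and the target Type I error level.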
The paper also analyzes the uniqueness of the Gumbel-max decoder and shows that it is essentially the only unbiased decoder satisfying certain natural properties.
Numerical experiments corroborate the theoretical findings and demonstrate the effectiveness of the derived optimal detection rule.