Core Concepts
SemaMark is a semantics-based watermarking framework that makes detection of LLM-generated text robust to paraphrasing by seeding the watermark with the semantics of the text rather than with token hashes.
Abstract
The paper proposes SemaMark, a semantics-based watermarking framework that strengthens the robustness of LLM-generated text detection against paraphrasing.
Key highlights:
- Existing watermarking methods seed the vocabulary partition (e.g., green/red token lists) with a hash of the preceding tokens, so paraphrasing changes the seed and disrupts the matching between tokens and the partitioned vocabulary (a minimal sketch of this hash seeding follows the list).
- SemaMark leverages the semantic meaning of token sequences as the seed for the partition function, as semantics are more likely to be preserved under paraphrasing.
- SemaMark uses a two-step approach to obtain stable semantic values: 1) weighted embedding pooling to aggregate the semantics of the preceding tokens, and 2) discretization of the pooled embedding onto a Normalized Embedding Ring (NE-Ring); both steps are sketched after the list.
- Contrastive learning is used to train the MLP that maps embeddings onto the NE-Ring, encouraging a uniform distribution of semantic values to improve concealment (a possible training loss is sketched after the list).
- An offset detection method is proposed to enhance robustness at the boundaries of the discrete semantic sections, where small semantic drift could otherwise flip the seed (sketched after the list).
- Comprehensive experiments demonstrate the effectiveness and robustness of SemaMark under different paraphrasing techniques, outperforming baseline watermarking methods.
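To make the vulnerability in the first highlight concrete, here is a minimal sketch of hash-seeded green/red-list watermarking in the style of existing token-hash methods; the `gamma` split ratio and the use of Python's built-in `hash` are illustrative assumptions, not the exact scheme of any particular method.

```python
import torch

def green_list_from_hash(prev_token_ids, vocab_size, gamma=0.5):
    """Partition the vocabulary using a hash of the preceding tokens.

    If paraphrasing changes even one of prev_token_ids, the seed (and
    hence the green list) changes, which is why hash-based watermarks
    are fragile under paraphrasing.
    """
    seed = hash(tuple(prev_token_ids)) % (2**31)  # illustrative hash choice
    gen = torch.Generator().manual_seed(seed)
    perm = torch.randperm(vocab_size, generator=gen)
    return perm[: int(gamma * vocab_size)]        # the "green" tokens
```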
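The two-step construction of stable semantic values can be sketched as follows; the position-weighted softmax pooling, the two-layer MLP, and K = 64 sections are assumptions for illustration, since this summary does not specify SemaMark's exact pooling weights, architecture, or section count.

```python
import torch
import torch.nn as nn

K = 64  # number of discrete sections on the NE-Ring (assumed value)

class NERing(nn.Module):
    """Pool context embeddings, map them to the unit circle, and
    discretize the angle into one of K semantic sections."""

    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 2))

    def forward(self, ctx_embeds):  # ctx_embeds: (T, dim) previous-token embeddings
        # Step 1: weighted pooling; later tokens get larger weights (assumed scheme).
        T = ctx_embeds.size(0)
        w = torch.softmax(torch.arange(T, dtype=torch.float), dim=0)
        pooled = (w.unsqueeze(1) * ctx_embeds).sum(dim=0)
        # Step 2: project onto the NE-Ring (unit circle) and discretize the angle.
        z = self.mlp(pooled)
        z = z / z.norm().clamp_min(1e-8)
        angle = torch.atan2(z[1], z[0])                       # in (-pi, pi]
        section = int((angle + torch.pi) / (2 * torch.pi) * K) % K
        return section, angle
```

The section index then plays the role the token hash played above: it seeds the vocabulary partition, so a paraphrase that preserves the meaning should land in the same section and reproduce the same green list.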
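This summary does not spell out the contrastive objective, so the loss below is a hypothetical combination of an alignment term that pulls paraphrase pairs to the same ring position and the uniformity term of Wang and Isola (2020), which directly encourages the uniform distribution of semantic values that concealment requires.

```python
import torch
import torch.nn.functional as F

def contrastive_uniform_loss(z_orig, z_para, t=2.0):
    """z_orig, z_para: (B, 2) ring outputs for B texts and their paraphrases.

    Alignment keeps paraphrase pairs at the same semantic value;
    uniformity spreads all values evenly around the ring.
    """
    z_orig = F.normalize(z_orig, dim=1)
    z_para = F.normalize(z_para, dim=1)
    # Alignment: paraphrases should map to the same point on the ring.
    align = (z_orig - z_para).pow(2).sum(dim=1).mean()
    # Uniformity: log of the mean Gaussian potential over all pairs.
    z = torch.cat([z_orig, z_para], dim=0)
    uniform = torch.pdist(z).pow(2).mul(-t).exp().mean().log()
    return align + uniform
```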
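Offset detection can be pictured as testing neighboring sections whenever the recovered semantic value falls near a section boundary; the `margin` below is an assumed hyperparameter, not a value from the paper.

```python
import math

def candidate_sections(angle, K=64, margin=0.1):
    """Return the section containing `angle` plus any adjacent section
    whose boundary lies within `margin` radians, so that a small
    semantic drift from paraphrasing cannot silently flip the seed."""
    width = 2 * math.pi / K
    pos = (angle + math.pi) / width       # fractional section index in [0, K]
    sec = int(pos) % K
    frac = pos - int(pos)
    cands = {sec}
    if frac * width < margin:             # close to the lower boundary
        cands.add((sec - 1) % K)
    if (1 - frac) * width < margin:       # close to the upper boundary
        cands.add((sec + 1) % K)
    return sorted(cands)
```

At detection time, the detector can score the text under each candidate seed and keep the best match, recovering watermarked tokens whose semantic value drifted just across a boundary.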
Stats
This summary reports no specific numerical results; it focuses on describing the proposed SemaMark framework and how its performance compares with baseline watermarking methods.