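To address provenance-tracking and security concerns for text generated by large language models (LLMs), this paper proposes Bileve, a novel bi-level signature scheme designed to resist forgery attacks and ensure that the source of a text can be reliably identified.
To reliably identify the source of text generated by large language models (LLMs) and prevent malicious manipulation, we propose a new watermarking technique called Bileve. Bileve uses a bi-level signature scheme that combines fine-grained signatures with coarse-grained signals to verify text integrity and improve robustness against watermark-removal attacks.
To guarantee the authenticity of text generated by large language models (LLMs) and protect it against malicious spoofing attacks, we propose Bileve, a new watermarking technique based on bi-level signatures.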
Bileve is a novel watermarking scheme for large language models (LLMs) that uses a bi-level signature to robustly identify the source of machine-generated text while defending against various spoofing attacks, including a newly identified semantic manipulation attack.
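The toy sketch below illustrates the bi-level idea. A fine-grained check fails on any edit. A coarse-grained statistical signal can still point to the source after that fine check has failed. This is not Bileve's construction, and it differs in three ways. First, Bileve embeds its signature bits in the generated tokens themselves, whereas here the tag travels alongside the text. Second, an HMAC with a shared secret key stands in for a publicly verifiable digital signature. Third, a keyed "green-word" fraction stands in for the coarse signal. `KEY`, `fine_tag`, `is_green`, `coarse_score`, `classify`, and the 0.8 threshold are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key. A real signature scheme would use an asymmetric
# key pair so that anyone can verify but only the model owner can sign.
KEY = b"hypothetical-secret-key"


def fine_tag(text: str) -> bytes:
    """Fine-grained level: a keyed tag over the exact text (HMAC stand-in for a signature)."""
    return hmac.new(KEY, text.encode("utf-8"), hashlib.sha256).digest()


def is_green(word: str) -> bool:
    """Coarse-grained level: a keyed pseudo-random split of the vocabulary into two halves."""
    digest = hmac.new(KEY, word.encode("utf-8"), hashlib.sha256).digest()
    return digest[0] < 128


def coarse_score(text: str) -> float:
    """Fraction of words in the keyed 'green' half (about 0.5 for unrelated text)."""
    words = text.split()
    if not words:
        return 0.0
    return sum(is_green(w) for w in words) / len(words)


def classify(text: str, tag: bytes, threshold: float = 0.8) -> str:
    # Level 1: an intact tag means the text is the unmodified output of the key holder.
    if hmac.compare_digest(fine_tag(text), tag):
        return "authentic"
    # Level 2: the tag failed, but a strong coarse signal still points to the source.
    if coarse_score(text) >= threshold:
        return "modified"
    return "unattributed"


if __name__ == "__main__":
    vocab = [f"w{i}" for i in range(200)]  # hypothetical toy vocabulary
    green = [w for w in vocab if is_green(w)]
    red = [w for w in vocab if not is_green(w)]
    original = " ".join(green[:10])  # stands in for watermarked model output
    tag = fine_tag(original)
    print(classify(original, tag))               # authentic
    print(classify(" ".join(green[1:11]), tag))  # modified
    print(classify(" ".join(red[:10]), tag))     # unattributed
```

The three outcomes follow the intended decision order. An intact signature establishes unmodified provenance. A failed signature combined with a strong coarse signal suggests edited source text. Anything else is left unattributed.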
Watermarking presents a promising solution for addressing intellectual property rights and misinformation concerns associated with large language models (LLMs) by embedding traceable signatures in both the models themselves and their generated content.
While watermarking large language models (LLMs) is a promising approach to addressing concerns about misuse, it comes with significant trade-offs in downstream performance across various NLP tasks, which limits the practical utility of watermarked models.
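A single decoding step of a green-list watermark shows where such trade-offs can come from. The example follows the style of Kirchenbauer et al. (2023) and is an assumed illustration, not one of the specific schemes evaluated. Adding a bias `delta` to the logits of a keyed "green" subset of tokens raises the green probability mass, which makes the watermark easier to detect. It also pulls the output distribution away from the model's original one. KL divergence serves here only as a rough proxy for distortion; it does not measure downstream task performance directly. The logits and green token ids are hypothetical.

```python
import math
from typing import List, Set, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_green_bias(logits: List[float], green: Set[int], delta: float) -> List[float]:
    """Add delta to the logits of 'green' token ids (soft green-list watermark)."""
    return [x + delta if i in green else x for i, x in enumerate(logits)]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) in nats, skipping zero-probability entries of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def tradeoff(logits: List[float], green: Set[int], delta: float) -> Tuple[float, float]:
    """Return (green probability mass, KL(original || watermarked)) for one step."""
    p = softmax(logits)
    q = softmax(apply_green_bias(logits, green, delta))
    return sum(q[i] for i in green), kl_divergence(p, q)


if __name__ == "__main__":
    logits = [2.0, 1.5, 0.3, -0.5, -1.0, 0.8]  # hypothetical next-token logits
    green = {1, 3, 5}  # hypothetical green-list token ids
    for delta in (0.0, 1.0, 2.0, 4.0):
        mass, kl = tradeoff(logits, green, delta)
        print(f"delta={delta}: green mass={mass:.3f}, KL={kl:.4f}")
```

Both quantities grow with `delta`. Choosing the bias strength therefore trades detectability against fidelity to the unwatermarked model.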
MARKLLM is an open-source toolkit designed to simplify the implementation, visualization, and evaluation of various large language model (LLM) watermarking algorithms, aiming to promote responsible LLM use and mitigate potential misuse.
POSTMARK is a new technique for watermarking text generated by large language models (LLMs) that is robust against paraphrasing attacks and can be applied even without access to the model's internal workings.
WAPITI is a novel method for watermarking fine-tuned open-source large language models (LLMs). It overcomes the limitations of previous techniques by transferring watermark-related parameters directly from a base model into its fine-tuned counterparts, preserving both watermark detectability and the models' fine-tuned capabilities.
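The following is a task-arithmetic-style sketch of parameter integration. It assumes the watermark has already been distilled into a copy of the base model, so that the watermark can be captured as a parameter difference. The paper's exact procedure may differ, for example in how the difference is scaled. The names `watermark_vector`, `integrate`, and `scale`, and the flat dict-of-lists parameter format, are hypothetical simplifications of real weight tensors.

```python
from typing import Dict, List

# Hypothetical flat parameter store: parameter name -> list of weights.
Params = Dict[str, List[float]]


def watermark_vector(base: Params, base_watermarked: Params) -> Params:
    """Difference between a watermarked copy of the base model and the original base."""
    return {
        name: [w - b for w, b in zip(base_watermarked[name], base[name])]
        for name in base
    }


def integrate(finetuned: Params, delta: Params, scale: float = 1.0) -> Params:
    """Add the (scaled) watermark vector to a fine-tuned model derived from the same base."""
    if finetuned.keys() != delta.keys():
        raise ValueError("fine-tuned model and watermark vector must share parameter names")
    return {
        name: [f + scale * d for f, d in zip(finetuned[name], delta[name])]
        for name in finetuned
    }


if __name__ == "__main__":
    base = {"layer.w": [1.0, 2.0]}         # hypothetical base weights
    base_wm = {"layer.w": [1.5, 1.0]}      # hypothetical watermarked base weights
    finetuned = {"layer.w": [1.2, 2.4]}    # hypothetical fine-tuned weights
    delta = watermark_vector(base, base_wm)
    print(integrate(finetuned, delta))
```

In this sketch the fine-tuned model never has to be retrained with the watermark. The same difference can be added to every fine-tuned descendant of the base model, provided their parameter names and shapes still match.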
This paper proposes a novel theoretical framework for watermarking large language models (LLMs) that jointly optimizes the watermarking scheme and the detector to maximize detection performance while controlling distortion and accounting for robustness against adversarial attacks.
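One generic way to write such a joint problem is shown below; the notation is chosen here for illustration and is not taken from the paper. Let $X$ be unwatermarked text and $\zeta$ a secret key. Let $\tilde X$ be the watermarked text produced by the scheme $P_{\tilde X \mid X, \zeta}$, and let $\gamma$ be a detector that outputs 1 for "watermarked". Let $X'$ be text produced independently of $\zeta$, $d$ a distortion measure, and $\mathcal{A}$ a class of allowed attacks. The scheme and the detector are chosen together to maximize worst-case detection, subject to a false-alarm level $\alpha$ and a distortion budget $\epsilon$:

$$
\max_{P_{\tilde X \mid X,\zeta},\,\gamma}\ \min_{A \in \mathcal{A}} \Pr\!\big[\gamma(A(\tilde X), \zeta) = 1\big]
\quad \text{s.t.} \quad
\Pr\!\big[\gamma(X', \zeta) = 1\big] \le \alpha,
\qquad
\mathbb{E}\big[d(X, \tilde X)\big] \le \epsilon.
$$

Dropping the inner minimum over $\mathcal{A}$ recovers the attack-free version of the problem.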