The paper investigates the impact of prompts on the accuracy of zero-shot detectors for identifying AI-generated text. It proposes two detection methods: white-box detection, which leverages the prompts used to generate the text, and black-box detection, which operates without prompt information.
The key findings are:
Extensive experiments show that detection accuracy consistently drops by 0.1 or more for existing zero-shot detectors under black-box detection (no access to the generation prompt) compared to white-box detection (with the prompt available).
The Fast series detectors (FastDetectGPT, FastNPR) and Binoculars are more robust to the influence of prompts than the other methods evaluated.
Increasing the replacement ratio and the number of samples in the Fast series detectors mitigates the accuracy drop, but the improvement plateaus at around 10 samples with a maximum AUC of roughly 0.8, which may be insufficient for practical deployment.
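Since the findings are reported as AUC, it may help to recall what that metric measures. The sketch below is a minimal, hypothetical implementation of ROC AUC via its Mann-Whitney formulation: the probability that a randomly chosen AI-generated text receives a higher detector score than a randomly chosen human-written text (the score lists here are invented for illustration, not the paper's data).

```python
def auc(human_scores, ai_scores):
    """ROC AUC via the Mann-Whitney formulation: the probability that a
    random AI-text score exceeds a random human-text score, counting
    ties as half a win. An AUC of 0.5 means the detector is guessing;
    1.0 means perfect separation."""
    wins = 0.0
    for a in ai_scores:
        for h in human_scores:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(ai_scores) * len(human_scores))


# Illustrative scores only: AI texts mostly score higher than human texts,
# but the overlap keeps AUC below 1.0 (analogous to the ~0.8 plateau).
human = [0.1, 0.3, 0.4, 0.6]
ai = [0.5, 0.7, 0.8, 0.9]
print(auc(human, ai))
```

An AUC around 0.8, as observed at the plateau, means roughly one in five random human/AI pairs would be ranked in the wrong order, which motivates the paper's concern about practical use.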
The paper hypothesizes that any manipulation that prevents a detector from reproducing the likelihood used during language generation, such as withholding the prompt, can undermine zero-shot detectors that rely on likelihoods from next-word prediction.
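To make the likelihood-based mechanism concrete, here is a minimal sketch of a sampling-discrepancy score in the spirit of FastDetectGPT (not the authors' implementation; the inputs are assumed to be precomputed log-likelihoods from some scoring model). The intuition: AI-generated text tends to be more likely under the scoring model than resampled alternatives, while human text does not, so a hidden prompt that shifts those likelihoods distorts the score.

```python
import numpy as np

def sampling_discrepancy(log_p_observed, log_p_samples):
    """Standardized gap between the observed text's log-likelihood and
    the log-likelihoods of resampled alternative texts.

    log_p_observed: float, log-likelihood of the text under the scorer.
    log_p_samples:  sequence of log-likelihoods for resampled variants
                    (more samples -> a more stable estimate, matching the
                    sample-size effect discussed above).

    A large positive score suggests machine generation; near zero or
    negative suggests human text.
    """
    samples = np.asarray(log_p_samples, dtype=float)
    mu = samples.mean()
    sigma = samples.std()
    if sigma == 0.0:  # degenerate case: no variation among samples
        return 0.0
    return float((log_p_observed - mu) / sigma)


# Illustrative numbers only: a "machine-like" text sits well above the
# sample mean; a "human-like" text sits below it.
print(sampling_discrepancy(-40.0, [-55.0, -54.0, -56.0]))  # positive
print(sampling_discrepancy(-60.0, [-55.0, -54.0, -56.0]))  # negative
```

Under the paper's hypothesis, scoring without the original prompt changes `log_p_observed` relative to the generation-time likelihood, shifting this statistic and degrading detection accuracy.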
The findings have implications for the development of more robust zero-shot detectors, potentially by combining likelihood-based approaches with other methods, such as those based on Intrinsic Dimension.