Su, Z., Wu, X., Zhou, W., Ma, G., & Hu, S. (2024). HC3 Plus: A Semantic-Invariant Human ChatGPT Comparison Corpus. arXiv preprint arXiv:2309.02731v4.
This research paper addresses the challenge of detecting AI-generated text in semantic-invariant tasks, where the output must preserve the meaning of the input and current detectors struggle. The authors aim to demonstrate the difficulty of this setting and introduce a new dataset, HC3 Plus, to support the development of more effective detection models.
The researchers first highlight the limitations of existing AI text detection datasets, which primarily focus on question-answering tasks. They then construct HC3 Plus, a comprehensive dataset encompassing translation, summarization, and paraphrasing tasks. They evaluate the performance of existing detectors on this dataset and propose a novel detection method based on instruction fine-tuning using the Tk-instruct model.
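To make the instruction fine-tuning idea concrete, the following is a minimal sketch of how a labeled detection example might be recast as an (input, target) text pair for a text-to-text model such as Tk-instruct. The instruction wording, the `Example` class, the `to_seq2seq_pair` helper, and the "human"/"model" label strings are all hypothetical and chosen for illustration; the paper's actual prompt and preprocessing are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical instruction template (an assumption, not the paper's prompt).
INSTRUCTION = (
    "Definition: Decide whether the following {task} output was written "
    "by a human or generated by ChatGPT. Answer 'human' or 'model'."
)


@dataclass
class Example:
    task: str   # e.g. "translation", "summarization", "paraphrasing"
    text: str   # the candidate output to classify
    label: str  # "human" or "model"


def to_seq2seq_pair(ex: Example) -> Tuple[str, str]:
    """Turn a labeled example into an (input, target) pair for a text-to-text model."""
    if ex.label not in ("human", "model"):
        raise ValueError(f"unexpected label: {ex.label!r}")
    prompt = INSTRUCTION.format(task=ex.task) + "\n\nInput: " + ex.text + "\nOutput:"
    return prompt, ex.label


# Toy data, invented for illustration only.
examples: List[Example] = [
    Example("translation", "The weather is nice today.", "human"),
    Example("summarization", "The report outlines three key findings.", "model"),
]

pairs = [to_seq2seq_pair(ex) for ex in examples]
for prompt, target in pairs:
    print(prompt)
    print(target)
    print("-" * 20)
```

Pairs of this shape could then be passed to any sequence-to-sequence fine-tuning loop, with the model trained to emit the label string as its output. Framing detection as text generation rather than as a classification head is what distinguishes this setup from a RoBERTa-style classifier.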
The study finds that current AI text detectors struggle to identify AI-generated text in semantic-invariant tasks. HC3 Plus proves more challenging for these detectors than earlier question-answering data, particularly in translation, where generated text closely resembles human-written text. The authors show that an instruction fine-tuned model, InstructDGGC, achieves better detection performance than traditional RoBERTa-based methods.
The research concludes that detecting AI-generated text in semantic-invariant tasks presents a significant challenge due to the semantic similarity between human and AI-generated content. The authors emphasize the need for specialized datasets like HC3 Plus and the exploration of advanced detection techniques like instruction fine-tuning to address this issue.
This research contributes to the field of AI text detection by exposing a limitation of existing methods on semantic-invariant tasks and by providing HC3 Plus as a resource for future research. The proposed instruction fine-tuning approach offers a promising direction for developing more robust and accurate AI text detectors.
The study acknowledges the limitations of using a specific version of ChatGPT (GPT-3.5-Turbo-0301) for dataset creation and suggests updating the dataset as ChatGPT evolves. Future research could explore the impact of ChatGPT iterations on detection performance and investigate alternative detection methods beyond instruction fine-tuning.