This study examines how hard it is to detect AI-generated text within hybrid texts written collaboratively by humans and AI. It uses realistic hybrid texts from the CoAuthor dataset and shows that human-AI interaction makes authorship-consistent segments difficult to identify. It also offers practical guidance for improving detection accuracy by first assessing segment length.
The study motivates AI-generated text detection as a safeguard against misuse of generative AI, especially in education. It raises two concerns: that advanced language models can be used to create deceptive content, and that such use may harm students' writing skills and academic integrity.
The study evaluates several approaches to segment detection and classification, including TriBERT, SegFormer, Transformer2, DeBERTa-v3, BERT, SeqXGPT, RoBERTa, DistilBERT, GPT-3.5 (Fine-tuned), GPT-2, BERT (Token), DistilBERT (Token), and RoBERTa (Token). The results suggest that a two-step pipeline, which detects segments first and then classifies each one, may outperform joint learning strategies.
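The two-step idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: `split_at_boundaries`, `two_step_pipeline`, `toy_detector`, and `toy_classifier` are hypothetical names, and the toy detector and classifier rely on an artificial `[AI]` marker in place of trained models such as those listed above.

```python
from typing import Callable, List, Tuple

Sentences = List[str]


def split_at_boundaries(sentences: Sentences, boundaries: List[int]) -> List[Sentences]:
    """Split sentences into segments; each boundary is the index where a new segment starts."""
    segments, start = [], 0
    for b in sorted(set(boundaries)):
        if start < b < len(sentences):  # ignore out-of-range or duplicate boundaries
            segments.append(sentences[start:b])
            start = b
    segments.append(sentences[start:])
    return segments


def two_step_pipeline(
    sentences: Sentences,
    detect_boundaries: Callable[[Sentences], List[int]],
    classify_segment: Callable[[Sentences], str],
) -> List[Tuple[Sentences, str]]:
    """Step 1: detect segment boundaries. Step 2: classify each resulting segment."""
    segments = split_at_boundaries(sentences, detect_boundaries(sentences))
    return [(seg, classify_segment(seg)) for seg in segments]


# Toy stand-ins (assumptions for illustration only, not real detectors).
def toy_detector(sentences: Sentences) -> List[int]:
    """Place a boundary wherever the artificial "[AI]" marker switches on or off."""
    return [
        i for i in range(1, len(sentences))
        if sentences[i].startswith("[AI]") != sentences[i - 1].startswith("[AI]")
    ]


def toy_classifier(segment: Sentences) -> str:
    """Label a segment by the marker on its first sentence."""
    return "ai" if segment[0].startswith("[AI]") else "human"


text = [
    "I went home.",
    "[AI] The house was quiet.",
    "[AI] Nothing moved.",
    "Then I slept.",
]
result = two_step_pipeline(text, toy_detector, toy_classifier)
labels = [label for _, label in result]
print(labels)
```

Keeping the two steps behind separate callables means a boundary detector and a segment classifier can be swapped independently, which is the property that distinguishes a pipeline from a joint token-level model.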
The study compares segment classifiers across groups of texts with different average segment lengths. It discusses how missed boundaries undermine authorship consistency within a segment and highlights the difficulty of classifying short texts. It closes with practical recommendations for choosing a detection strategy based on an assessment of segment length.
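To make the effect of a missed boundary concrete, the sketch below computes average segment length and checks whether each predicted segment is authorship-consistent. The per-sentence labels, the boundary positions, and the function names are invented for illustration; they are not data or code from the study.

```python
from typing import List


def segments_from_boundaries(labels: List[str], boundaries: List[int]) -> List[List[str]]:
    """Cut a per-sentence label sequence at the given segment start indices."""
    cuts = [0] + sorted(b for b in set(boundaries) if 0 < b < len(labels)) + [len(labels)]
    return [labels[i:j] for i, j in zip(cuts, cuts[1:])]


def average_segment_length(segments: List[List[str]]) -> float:
    """Mean number of sentences per segment."""
    return sum(len(s) for s in segments) / len(segments)


def is_authorship_consistent(segment: List[str]) -> bool:
    """A segment is consistent when every sentence in it has the same author."""
    return len(set(segment)) == 1


# Hypothetical ground-truth authorship of six sentences.
true_labels = ["human", "human", "ai", "ai", "ai", "human"]

true_segments = segments_from_boundaries(true_labels, [2, 5])
# A detector that finds the boundary at 2 but misses the one at 5.
predicted_segments = segments_from_boundaries(true_labels, [2])

true_avg = average_segment_length(true_segments)
predicted_avg = average_segment_length(predicted_segments)
consistency = [is_authorship_consistent(s) for s in predicted_segments]

print(true_avg, predicted_avg, consistency)
```

In this toy case the missed boundary merges an AI-written span with a human-written sentence, so the second predicted segment mixes authors and no single label can be correct for it.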
Source: arxiv.org