AI-Generated Text Boundary Detection: Identifying the Transition from Human-Written to Machine-Generated Content
This work addresses the task of detecting the point at which a text switches from human-written to machine-generated content. The authors evaluate several approaches, including perplexity-based methods, topological data analysis, and fine-tuned language models, on the RoFT and RoFT-chatgpt datasets. They find that perplexity-based classifiers outperform fine-tuned language models in cross-domain and cross-model settings, and they analyze which properties of the data influence the performance of the different detection methods.
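
To make the idea of a perplexity-based boundary detector concrete, the following is a minimal sketch and not the authors' classifier. It assumes that a per-sentence perplexity-like score (for example, mean negative log-likelihood under some language model) has already been computed; the function name `find_boundary` and the input scores are hypothetical. The sketch fits a single change point by choosing the split that minimises the within-segment squared error, which is one simple way to turn per-sentence scores into a boundary prediction.

```python
from typing import Sequence


def find_boundary(sentence_scores: Sequence[float]) -> int:
    """Return the index of the first sentence after the estimated boundary.

    sentence_scores[i] is a hypothetical per-sentence perplexity-like score.
    The split k is chosen so that modelling sentences [0, k) and [k, n)
    with two separate constant means gives the lowest total squared error.
    """
    n = len(sentence_scores)
    if n < 2:
        raise ValueError("need at least two sentences to place a boundary")

    def sse(segment: Sequence[float]) -> float:
        # Sum of squared deviations from the segment mean.
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    best_k, best_cost = 1, float("inf")
    for k in range(1, n):  # both segments must be non-empty
        cost = sse(sentence_scores[:k]) + sse(sentence_scores[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Illustrative scores only (not data from the paper): the first three
# sentences score high, the remaining four score low.
scores = [5.1, 4.8, 5.3, 2.0, 2.2, 1.9, 2.1]
print(find_boundary(scores))  # 3
```

A trained classifier over such features, as evaluated in the work, would replace the squared-error rule used here; the sketch only illustrates why a shift in perplexity can localise the transition.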