Key Concepts
AIpom is a method for human-machine mixed text detection that uses a pipeline of decoder and encoder models to identify the boundary between human-written and machine-generated text.
Summary
The paper presents AIpom, a system designed to detect the boundary between human-written and machine-generated text as part of SemEval-2024 Task 8, Subtask C. The proposed approach utilizes a two-stage pipeline that combines predictions from an instruction-tuned decoder-only model and encoder-only sequence taggers.
Key highlights:
- The decoder model is fine-tuned to output only the machine-generated part of the input text, while the encoder models are trained to label each token as human-written or machine-generated.
- The pipeline involves fine-tuning the decoder, using its predictions to fine-tune the first encoder, and then fine-tuning a second encoder on a mixture of the original training data and the decoder's predictions.
- The final prediction is obtained by aggregating the outputs of the two encoder models.
- Ablation studies confirm the benefit of pipelining the decoder and encoder models: the pipeline performs better than either model used on its own.
- The AIpom system achieves the second-best performance on the official leaderboard, with a Mean Absolute Error (MAE) of 15.94 on the evaluation set.
- The authors also develop a better-performing solution with an MAE of 15.21 after the official evaluation phase.
- The study highlights the importance of addressing domain shift issues, as there is a significant score disparity between the development and official evaluation sets.
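To make the aggregation step and the MAE metric concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that each encoder outputs a per-token probability of the "machine-generated" class, that the two encoders are aggregated by averaging those probabilities, and that the boundary is the first token whose averaged probability reaches 0.5. The summary does not state the aggregation rule, so the function names, the averaging, and the threshold are all illustrative assumptions. MAE is computed as the mean absolute difference between predicted and gold boundary indices.

```python
from typing import List, Sequence


def average_probs(p1: Sequence[float], p2: Sequence[float]) -> List[float]:
    """Average per-token machine-generated probabilities from two taggers.

    Averaging is an assumption made for illustration; the source only says
    that the outputs of the two encoders are aggregated.
    """
    if len(p1) != len(p2):
        raise ValueError("both taggers must score the same number of tokens")
    return [(a + b) / 2 for a, b in zip(p1, p2)]


def boundary_index(probs: Sequence[float], threshold: float = 0.5) -> int:
    """Return the index of the first token judged machine-generated.

    If no token reaches the threshold, the text is treated as fully
    human-written and len(probs) is returned (hypothetical convention).
    """
    for i, p in enumerate(probs):
        if p >= threshold:
            return i
    return len(probs)


def mean_absolute_error(pred: Sequence[int], gold: Sequence[int]) -> float:
    """Mean absolute difference between predicted and gold boundary indices."""
    if len(pred) != len(gold) or not pred:
        raise ValueError("pred and gold must be non-empty and the same length")
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(pred)


if __name__ == "__main__":
    # Toy scores for a four-token text from two hypothetical encoders.
    encoder_a = [0.1, 0.2, 0.6, 0.9]
    encoder_b = [0.0, 0.4, 0.2, 0.8]
    boundary = boundary_index(average_probs(encoder_a, encoder_b))
    print("predicted boundary:", boundary)
    print("MAE:", mean_absolute_error([boundary, 10], [5, 10]))
```

In this toy case the two encoders disagree on the third token, and averaging defers the boundary until both give a high score. An MAE of 15.94 under this definition means the predicted boundary is off by about 16 positions on average.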
Statistics
The M4 corpus consists of human-written and machine-generated texts in six languages (English, Chinese, Russian, Urdu, Indonesian, and Arabic) across various domains, ranging from Wikipedia to academic peer reviews.
The training, development, and official evaluation sets contain 3649, 505, and 11123 instances, respectively.
Quotes
"The boundary detection setup aligns with common user scenarios for applying generative LMs in practice, e.g., text continuation, creative writing, and story generation."
"Employing the pipeline of decoder and encoder models proves to be an effective solution."
"Future efforts should focus on enhancing the AIpom robustness with respect to the text domain and text generator."