M4: Multi-Generator, Multi-Domain, and Multi-Lingual Black-Box Machine-Generated Text Detection Study
The study introduces M4, a large-scale benchmark dataset for detecting machine-generated text across multiple generators, domains, and languages. The goal is to address the challenge of making detectors generalize to unseen instances, whether they come from a different domain or from a different language model.
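One way to picture the generalization setting is a leave-one-group-out split: a detector is trained on all but one domain (or generator) and tested on the held-out one. The following is a minimal sketch under stated assumptions, not the study's actual evaluation code. The record fields (`text`, `domain`, `generator`, `label`), the toy values, and the helper name `leave_one_group_out` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def leave_one_group_out(records, key):
    """Yield (held_out_value, train, test) for each distinct value of rec[key].

    `records` is a list of dicts; `key` names the field to hold out on,
    e.g. "domain" or "generator" (hypothetical field names).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    for held_out in sorted(groups):
        test = groups[held_out]
        # Training data is everything that does not belong to the held-out group.
        train = [r for g, rs in groups.items() if g != held_out for r in rs]
        yield held_out, train, test


# Toy placeholder records (assumed schema; label 1 = machine-generated).
records = [
    {"text": "t1", "domain": "news", "generator": "human", "label": 0},
    {"text": "t2", "domain": "news", "generator": "model_a", "label": 1},
    {"text": "t3", "domain": "forum", "generator": "human", "label": 0},
    {"text": "t4", "domain": "forum", "generator": "model_b", "label": 1},
    {"text": "t5", "domain": "science", "generator": "model_a", "label": 1},
]

# Size of the train/test partitions for each held-out domain.
split_sizes = {
    name: (len(train), len(test))
    for name, train, test in leave_one_group_out(records, "domain")
}
```

Passing `"generator"` instead of `"domain"` as the key gives the analogous unseen-generator split, so the same helper covers both kinds of held-out evaluation.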