Core Concepts
TOPFORMER, a hybrid model that combines a Transformer-based backbone (RoBERTa) with Topological Data Analysis (TDA), can accurately attribute authorship of deepfake texts, even in the presence of imbalanced, multi-style datasets.
Abstract
The paper proposes TOPFORMER, a novel solution for accurately attributing authorship among deepfake texts and human-written texts. TOPFORMER combines a Transformer-based backbone (RoBERTa) with Topological Data Analysis (TDA) to capture both contextual representations (semantic and syntactic features) and the shape/structure of the data (linguistic structures).
The key highlights are:
Recent advances in Large Language Models (LLMs) have enabled the generation of high-quality deepfake texts that are non-trivial to distinguish from human-written texts.
The Authorship Attribution (AA) problem, which aims not only to detect whether a text is a deepfake but also to identify the specific LLM author, is more challenging than the Turing Test (TT) problem.
TOPFORMER outperforms state-of-the-art deepfake text attribution models on three realistic datasets (OpenLLMText, SynSciPass, Mixset) that reflect the current landscape of diverse writing styles and label imbalance.
TDA features extracted from the reshaped pooled output of the RoBERTa backbone complement the contextual representations, enabling TOPFORMER to capture both semantic/syntactic and structural linguistic patterns.
TOPFORMER performs well on datasets with multi-style labels, suggesting its robustness to heterogeneous data. It also performs comparably to the backbone model on more homogeneous datasets.
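The combination described above can be sketched concretely. A minimal, hypothetical version of the TDA step: reshape a pooled RoBERTa vector (assumed 768-dimensional) into a small point cloud, then compute 0-dimensional persistence features. This uses the standard fact that 0-dimensional persistence death times equal the edge lengths of a minimum spanning tree over the point cloud; the reshaping scheme, point count, and summary statistics here are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def tda_features(pooled, n_points=24):
    """Hypothetical TDA feature extractor for a pooled Transformer output.

    pooled: 1-D vector (assumed length divisible by n_points, e.g. 768).
    Returns a fixed-length vector of topological summary statistics.
    """
    # Reshape the pooled vector into a point cloud of n_points points.
    cloud = np.asarray(pooled).reshape(n_points, -1)

    # Pairwise Euclidean distance matrix.
    d = np.linalg.norm(cloud[:, None, :] - cloud[None, :, :], axis=-1)

    # 0-dimensional persistence: component death times are exactly the
    # edge lengths of a minimum spanning tree (Prim's algorithm below).
    n = len(cloud)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = d[0].copy()          # best[j] = min distance from j to the tree
    deaths = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        deaths.append(best[j])  # merge height = death of a component
        in_tree[j] = True
        best = np.minimum(best, d[j])
    deaths = np.sort(np.array(deaths))

    # Summarize the barcode as fixed-length features (illustrative choice);
    # in a hybrid model these would be concatenated with the contextual
    # RoBERTa representation before classification.
    return np.array([deaths.sum(), deaths.mean(), deaths.max(), deaths.std()])
```

In a full model, this feature vector would be concatenated with the backbone's contextual embedding and fed to the classification head, which is how TDA features can complement semantic/syntactic representations.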
Stats
There are currently over 72K text-generation models in the Hugging Face model repository.
The OpenLLMText dataset has 53K training, 10K validation, and 7.7K test samples across 5 labels (human, LLaMA, ChatGPT, PALM, GPT-2).
The SynSciPass dataset has 87K training, 10K validation, and 10K test samples across 12 labels (1 human, 11 deepfake text generators).
The Mixset dataset has 2.4K training, 340 validation, and 678 test samples across 8 labels (human, GPT-4, LLaMA, Dolly, ChatGLM, StableLM, ChatGPT-turbo, ChatGPT).
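The label counts above imply heavy class imbalance (e.g., one human label versus many generator labels). One common remedy, shown here as a hypothetical sketch rather than the paper's method, is to weight the loss by inverse class frequency:

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Per-class loss weights from integer label ids.

    Rare classes get larger weights, so an imbalanced dataset does not
    let frequent classes dominate training (illustrative remedy only).
    """
    counts = np.bincount(np.asarray(labels))
    # Normalized so a perfectly balanced dataset yields weight 1.0 per class.
    return counts.sum() / (len(counts) * counts)
```

These weights would typically be passed to a weighted cross-entropy loss during fine-tuning.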