Basic Concepts
Language models often change their predictions when given a simplified version of the same text, with prediction change rates of up to 50% across multiple languages and tasks.
Summary
This study investigates the coherence of pre-trained language models when processing simplified text inputs. The authors compiled a set of human-created or human-aligned text simplification datasets across English, German, and Italian, and tested the prediction consistency of various pre-trained classifiers on the original and simplified versions.
The key findings are:
- Across all languages and models tested, the authors observed high prediction change rates, with up to 50% of samples eliciting different predictions between the original and simplified versions.
- The prediction change rates tend to increase with the strength of simplification, indicating that more extensive text alterations make the models more susceptible to inconsistent behavior.
- The authors explored factors that may influence the models' coherence, such as edit distances, task complexity, and simplification operations. While these factors play a role, the models still exhibit concerning levels of incoherence.
- Even state-of-the-art language models like GPT-3.5 are not robust to text simplification, showing prediction change rates similar to those of smaller, task-specific models.
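The prediction change rate reported in the findings above can be read as the share of original/simplified pairs for which a classifier outputs different labels. The following is a minimal sketch under that reading, not the authors' evaluation code; the function name `prediction_change_rate` and the example labels are hypothetical.

```python
from typing import Sequence


def prediction_change_rate(
    original_preds: Sequence[str], simplified_preds: Sequence[str]
) -> float:
    """Fraction of aligned sample pairs whose predicted label differs."""
    if len(original_preds) != len(simplified_preds):
        raise ValueError("prediction lists must be aligned pair by pair")
    if not original_preds:
        raise ValueError("need at least one sample pair")
    # Count pairs where the label for the simplified text differs
    # from the label for the original text.
    changed = sum(o != s for o, s in zip(original_preds, simplified_preds))
    return changed / len(original_preds)


# Hypothetical labels for four original/simplified sentence pairs.
original = ["science", "sports", "politics", "science"]
simplified = ["science", "politics", "politics", "health"]

rate = prediction_change_rate(original, simplified)
print(f"{rate:.0%} of predictions changed")
```

A rate of 0 means the classifier treats every simplified sentence like its original; the metric says nothing about which of the two predictions is correct, only that they disagree.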
The authors conclude that the lack of simplified language data in pre-training corpora is a key factor behind the models' inconsistent behavior. They encourage further research to improve model coherence on simplified inputs, as this can have significant implications for accessibility and the robustness of language applications.
Statistics
"Researchers presented their evidence at a conference." (original)
"Researchers presented their evidence at a science meeting." (simplified)
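The summary names edit distance as one factor examined alongside prediction changes, but does not say which variant the authors used. As an illustration only, the sketch below assumes a word-level Levenshtein distance and applies it to the sentence pair above; the helper `word_edit_distance` and the normalization by the longer sentence are hypothetical choices.

```python
def word_edit_distance(a: str, b: str) -> int:
    """Word-level Levenshtein distance between two sentences."""
    src, tgt = a.split(), b.split()
    # prev[j] is the distance between the processed prefix of src and tgt[:j].
    prev = list(range(len(tgt) + 1))
    for i, s_word in enumerate(src, start=1):
        curr = [i]
        for j, t_word in enumerate(tgt, start=1):
            cost = 0 if s_word == t_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete s_word
                    curr[j - 1] + 1,     # insert t_word
                    prev[j - 1] + cost,  # keep or substitute
                )
            )
        prev = curr
    return prev[-1]


original = "Researchers presented their evidence at a conference."
simplified = "Researchers presented their evidence at a science meeting."

distance = word_edit_distance(original, simplified)
# Normalize by the longer sentence so scores are comparable across pairs.
normalized = distance / max(len(original.split()), len(simplified.split()))
print(distance, normalized)
```

For this pair the only edits are replacing "conference." with "science" and inserting "meeting.", so even a very small change of this kind is the type of input on which the study tests prediction consistency.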
Quotes
"If not promptly addressed, simplified inputs can be easily exploited to craft zero-iteration model-agnostic adversarial attacks with success rates of up to 50%."