Key Concepts
The authors argue that social class significantly affects the performance of NLP systems. They document performance disparities across socioeconomic groups and advocate for more inclusive language technologies.
Summary
The paper explores how social class influences language production and perception, and argues that NLP needs to account for socioeconomic status. It presents empirical evidence from a dataset of 95K utterances, revealing performance disparities associated with socioeconomic class, ethnicity, and geographical language variety. The study covers lexical analysis, speech recognition, language modeling, and grammar correction to demonstrate the impact of social class on NLP tools.
Key points include:
- Historical background on Labov's work on social stratification in language.
- Analysis of linguistic markers of social class.
- Empirical study using a dataset of TV shows and movies annotated for demographics.
- Findings showing correlations between social class and NLP performance metrics.
- Discussion of ethical considerations and limitations of the study.
Statistics
We annotate a corpus of 95K utterances from movies with social class, ethnicity, and geographical language variety.
Our dataset contains 95K utterances from 19 TV shows and movies.
Mean perplexity values per model: Mistral-7B (205.585), Zephyr-7B (302.057), Llama 2 (189.804).
Percentage of sentences corrected by models: T5 Grammar Correction (19.76%), CoEdit-large (35.94%), Flan-T5 (66.42%).
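For context on how perplexity figures like those above are commonly computed, here is a minimal Python sketch. It assumes per-token natural-log probabilities have already been obtained from a language model. The function names and the `(group, logprobs)` record layout are hypothetical and are not taken from the study, which may aggregate differently.

```python
import math
from collections import defaultdict


def perplexity(token_logprobs):
    """Perplexity of one utterance: exp of the negative mean token log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def mean_perplexity_by_group(records):
    """Average utterance-level perplexity per group.

    `records` is an iterable of (group_label, token_logprobs) pairs;
    this layout is an assumption made for illustration only.
    """
    grouped = defaultdict(list)
    for group, logprobs in records:
        grouped[group].append(perplexity(logprobs))
    return {group: sum(values) / len(values) for group, values in grouped.items()}
```

Higher mean perplexity for a group indicates that the model finds that group's utterances less predictable on average.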
Quotes
"We show empirically that NLP disadvantages less-privileged socioeconomic groups."
"Our findings highlight an important lack of flexibility of NLP tools."
"Social class should be carefully considered as a variable in NLP."