Key concepts
The use of religious texts in Natural Language Processing (NLP) raises important ethical considerations that go beyond model biases, including data provenance, cultural contexts, and potential use in proselytism.
Summary
This position paper examines the use of religious texts, such as the Bible and Quran, in Natural Language Processing (NLP) research. It finds that thousands of NLP papers have used these texts, often because they are readily available, convenient, and multilingual. However, the paper argues that the ethical implications of this practice have not been sufficiently addressed.
The paper first provides relevant background on religion, the relationship between academic linguistics and missionary linguistics, and an empirical study of the use of sacred texts in the ACL Anthology. It then discusses various ethical considerations from multiple perspectives, including ethical theories, AI principles, cultural standpoints, and the concerns of marginalized communities.
The key ethical considerations include:
- Consequentialist concerns about the potential harms and benefits of using religious texts in NLP applications.
- Deontological questions about the appropriate processes and relationships involved in the creation and use of datasets containing sacred texts.
- Risks related to AI principles such as safety, privacy, bias, fairness, accountability, and transparency.
- The importance of acknowledging researcher positionality and the cultural contexts of religious texts, especially for marginalized linguistic and religious communities.
- Concerns about how the use of religious texts in NLP may be complicit with colonial and proselytizing projects that violate the rights of Indigenous peoples to maintain their cultures.
The paper concludes by making several recommendations for the NLP community, including:
- Discussing ethical considerations more extensively in NLP papers using religious texts.
- Considering a broader range of ethical theories, not only utilitarianism.
- Delving into the domain-specific risks and biases when using religious texts.
- Situating NLP work within cultural contexts and acknowledging researcher positionality.
- Attending more closely to the concerns of marginalized linguistic and religious communities.
Statistics
"the Bible is one of the most familiar documents"
"the Quran is frequently used in NLP"
Quotes
"a particular string of speech may be viewed as data by a researcher but as sacred incantation by language users"
Holton et al. (2022)