Kavathekar, I., Rani, A., Chamoli, A., Kumaraguru, P., Sheth, A., & Das, A. (2024). Counter Turing Test (CT2): Investigating AI-Generated Text Detection for Hindi -- Ranking LLMs based on Hindi AI Detectability Index (ADIhi). arXiv preprint arXiv:2407.15694v2.
This paper investigates how well existing AI-generated text detection (AGTD) techniques work for Hindi and proposes a new metric, ADIhi, to rank LLMs by the detectability of their Hindi text outputs.
The authors curated a dataset of human-written and AI-generated Hindi news articles (AGhi) using headlines from BBC Hindi and NDTV as prompts for 26 different LLMs. They then evaluated five recently proposed AGTD techniques: ConDA, J-Guard, RADAR, RAIDAR, and Intrinsic Dimension Estimation. Based on the performance of these techniques, they proposed the ADIhi as a metric to assess the detectability of LLM-generated Hindi text.
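The idea of ranking models by detector performance can be illustrated with a minimal sketch. This is not the paper's ADIhi formula, which this summary does not give; it assumes a simplified stand-in score, the unweighted mean of per-detector accuracies. The function name `rank_by_detectability`, the model and detector names, and all numbers are hypothetical placeholders.

```python
from statistics import mean


def rank_by_detectability(scores):
    """Rank models from most to least detectable.

    scores: {model_name: {detector_name: accuracy in [0, 1]}}
    Assumption: the unweighted mean accuracy across detectors is used as a
    stand-in score. This is NOT the ADIhi definition from the paper.
    """
    averaged = {
        model: mean(per_detector.values())
        for model, per_detector in scores.items()
    }
    # Higher mean accuracy means the model's text is easier to detect.
    return sorted(averaged, key=averaged.get, reverse=True)


# Placeholder values invented for illustration only; not results from the paper.
toy_scores = {
    "model_a": {"detector_1": 0.9, "detector_2": 0.7},
    "model_b": {"detector_1": 0.6, "detector_2": 0.5},
    "model_c": {"detector_1": 0.8, "detector_2": 0.9},
}

ranking = rank_by_detectability(toy_scores)
print(ranking)
```

A model near the end of such a ranking would be one whose Hindi output the detectors struggle to flag, which is the kind of comparison a detectability index is meant to support.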
The study highlights the limitations of current AGTD techniques for Hindi and emphasizes the need for more robust and language-specific detection methods. The proposed ADIhi provides a valuable benchmark for evaluating and comparing the detectability of different LLMs.
This research contributes to the growing field of AI-generated text detection by focusing on a less-studied language, Hindi. The proposed ADIhi offers a practical tool for researchers and practitioners to assess the evolving capabilities of LLMs and the challenges in detecting their outputs.
The study acknowledges limitations regarding the exploration of temperature hyperparameters, text consistency in experiments, temporal limitations of the dataset, generalization to other languages, and the dynamic nature of AGTD techniques. Future research could address these limitations and explore new approaches for detecting AI-generated text in Hindi and other languages.
Key insights from the paper by Ishan Kavath... at arxiv.org, 10-08-2024
https://arxiv.org/pdf/2407.15694.pdf