Large language models can detect arguments that would be particularly persuasive to individuals with specific demographics or beliefs, indicating their potential to generate targeted misinformation and propaganda.
Introducing a blueprint and resources for developing LLMs for Indic languages.
RomanSetu proposes using romanized text to enhance the efficiency and performance of Large Language Models for non-English languages.
MaCmS, a Magahi-Hindi-English code-mixed dataset, is introduced for sentiment analysis, highlighting language preferences and the challenges of sentiment analysis for low-resource languages.
Emojinize introduces a method to translate text into emoji, enhancing communication and comprehension.
Creating a diverse and representative speech dataset for Indian languages to support speech technology development.
NusaBERT enhances multilingual understanding in Indonesia by incorporating regional languages and dialects, paving the way for future natural language research.
Increasing high-quality Yorùbá speech data for Text-to-Speech and Automatic Speech Recognition tasks.
Gender bias persists in commercial machine translation systems, reducing the accuracy of gendered translations.
Peacock introduces Arabic MLLMs that handle visual reasoning tasks and show dialectal potential.