
Swa Bhasha: A Rule-Based System for Transliterating Singlish Words into Sinhala


Core Concepts
Swa Bhasha is a novel rule-based system that transliterates Singlish words, typed with or without vowels, into native Sinhala words.
Abstract
The paper presents "Swa Bhasha", a system that transliterates Singlish words (Sinhala typed with English letters) into native Sinhala words. It combines a rule-based approach with NLP techniques and fuzzy logic to handle varied Singlish typing patterns. Key highlights:
- It transliterates Singlish words typed without vowels, a common situation when typing in Singlish, by means of a novel rule-based mapping system.
- It handles words typed with all their vowels as well as words typed with fewer vowels than the actual Sinhala word.
- It uses a codified dictionary that assigns a unique numeric value to each Sinhala word to facilitate the mapping.
- It uses a fuzzy logic-based implementation to match Singlish word patterns with native Sinhala words.
- It achieved 84% accuracy in word-level transliteration and 92% accuracy in suggestion-level predictions, outperforming existing solutions such as the "Helakuru" keyboard.
The authors conclude that Swa Bhasha can significantly improve the experience of Sinhala users who type in Singlish by providing accurate transliteration into native Sinhala.
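To make the vowel-less matching idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the tiny `LEXICON`, the `skeleton` and `suggest` helpers, and the use of `difflib.SequenceMatcher` as a stand-in for the paper's fuzzy logic are all assumptions made for illustration, and the paper's numeric codification of the dictionary is not reproduced.

```python
from difflib import SequenceMatcher

VOWELS = set("aeiou")

# Hypothetical mini-lexicon: romanized (Singlish) spelling -> Sinhala word.
# A real system would use a far larger dictionary.
LEXICON = {
    "mama": "මම",
    "gedara": "ගෙදර",
    "yanawa": "යනවා",
    "kohomada": "කොහොමද",
}


def skeleton(word):
    """Drop vowels so 'gedara', 'gedra' and 'gdr' share the key 'gdr'."""
    return "".join(ch for ch in word.lower() if ch not in VOWELS)


def suggest(singlish, top_n=3):
    """Rank lexicon entries by similarity to the typed word.

    The consonant skeleton is compared first; similarity of the full
    spelling is used only to break ties.
    """
    target = skeleton(singlish)
    scored = []
    for roman, sinhala in LEXICON.items():
        skel_score = SequenceMatcher(None, target, skeleton(roman)).ratio()
        full_score = SequenceMatcher(None, singlish.lower(), roman).ratio()
        scored.append((skel_score, full_score, sinhala))
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [sinhala for _, _, sinhala in scored[:top_n]]


print(suggest("gdr"))    # vowel-less input
print(suggest("yanwa"))  # reduced vowel count
```

Returning a ranked list rather than a single word mirrors the distinction the summary draws between word-level transliteration and suggestion-level predictions.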
Stats
The system transliterates Singlish words, including words typed without vowels, with 84% word-level accuracy.
The system provides relevant Sinhala word suggestions with 92% accuracy.
Quotes
"These results revealed that the 'Swa Bhasha' transliteration system has the ability to enhance the Sinhala users' experience while conducting the texting in Singlish to Sinhala."

Key Insights Distilled From

by Maneesha U. ... at arxiv.org 04-23-2024

https://arxiv.org/pdf/2404.13350.pdf
Swa Bhasha: Message-Based Singlish to Sinhala Transliteration

Deeper Inquiries

How can the system be extended to handle Singlish sentences or paragraphs, rather than just individual words?

Handling Singlish sentences or paragraphs would require additional natural language processing (NLP) steps that analyze the context and structure of the input. The text would first be tokenized into individual words; the system could then identify grammatical structure and the relationships between words in order to choose among candidate Sinhala words. Sequence-to-sequence models of the kind used in machine translation are another option for transliterating longer segments as a whole. With these additions, the system could transliterate Singlish to Sinhala at the sentence or paragraph level rather than one word at a time.
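The tokenization step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `WORD_TABLE` and both functions are hypothetical, the per-word lookup stands in for a real word-level transliterator, and no context or grammar analysis is attempted.

```python
import re

# Hypothetical word table standing in for a word-level transliterator.
WORD_TABLE = {
    "mama": "මම",
    "gedara": "ගෙදර",
    "yanawa": "යනවා",
}


def transliterate_word(word):
    """Return the Sinhala form if known, otherwise the word unchanged."""
    return WORD_TABLE.get(word.lower(), word)


def transliterate_text(text):
    """Transliterate each run of Latin letters and leave spaces,
    digits and punctuation exactly where they were."""
    return re.sub(
        r"[A-Za-z]+",
        lambda match: transliterate_word(match.group(0)),
        text,
    )


print(transliterate_text("mama gedara yanawa."))
```

Unknown words pass through untouched, which keeps English words embedded in a message readable instead of forcing a wrong Sinhala match.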

What are the potential challenges in incorporating Sinhala grammar rules into the transliteration system?

Incorporating Sinhala grammar rules into the transliteration system poses several challenges. Sinhala grammar is complex, with rules for verb conjugation, noun declension, and sentence structure. Potential challenges include:
- Morphological complexity: Sinhala has a rich morphology with intricate rules for word formation and inflection. Encoding these rules in the system would require a deep understanding of Sinhala grammar.
- Word order: Sinhala follows subject-object-verb (SOV) order, unlike English. A system that applies grammar rules to Singlish sentences must account for this order.
- Suffixation: Sinhala words often carry several suffixes attached to a root to convey meaning. The system must handle these combined forms accurately to produce correct Sinhala words.
- Idiomatic expressions: Sinhala has many idiomatic expressions and colloquialisms whose Singlish spellings may not map cleanly onto standard Sinhala forms. Transliterating them while preserving the intended meaning can be difficult.
Addressing these challenges would let the system produce more accurate and contextually relevant transliterations.

How can the system be adapted to handle regional variations in Singlish usage across different parts of Sri Lanka?

Adapting the system to regional variations in Singlish usage across Sri Lanka would require a comprehensive dataset that captures the linguistic variation of each region. Possible strategies include:
- Dialect-specific data: Collecting data for different regions and dialects helps the system recognize and transliterate regional variants accurately.
- Machine learning models: Models trained on regionally varied data can learn these variants and adapt the transliteration process accordingly.
- User feedback mechanism: Letting users report regional variants or suggest corrections can improve the system's accuracy over time.
- Collaboration with linguists: Linguists and language experts from different regions can provide insight into regional variation and help fine-tune the system.
Together, these strategies would help the system serve the diverse Singlish usage found across Sri Lanka.
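One way the user feedback mechanism mentioned above could work is to count which suggestion users actually pick for a given typed form and promote it next time. The Python sketch below is an assumption-laden illustration: the `FeedbackRanker` class, its methods, and the sample words are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict


class FeedbackRanker:
    """Re-rank suggestions using counts of past user selections."""

    def __init__(self):
        # typed Singlish form -> Counter of Sinhala words users chose
        self._counts = defaultdict(Counter)

    def record(self, typed, chosen):
        """Store that a user picked `chosen` after typing `typed`."""
        self._counts[typed.lower()][chosen] += 1

    def rerank(self, typed, candidates):
        """Order candidates by how often they were chosen for `typed`.

        sorted() is stable, so candidates with equal counts keep the
        order given by the underlying transliterator.
        """
        counts = self._counts[typed.lower()]
        return sorted(candidates, key=lambda cand: -counts[cand])


ranker = FeedbackRanker()
ranker.record("mn", "මං")
ranker.record("mn", "මං")
ranker.record("mn", "මම")
print(ranker.rerank("mn", ["මම", "මං", "මන"]))
```

Keeping separate counters per region or per user would be a natural extension for capturing regional habits, though that is a design choice beyond what the summary describes.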