
Efficient Kathakali Hand Gesture Recognition Using Minimal Training Data


Core Concepts
A vector-similarity-based approach using pose estimation can efficiently recognize Kathakali hand gestures (mudras) with minimal training data.
Summary
The paper presents a novel approach for recognizing Kathakali hand gestures (mudras) that can achieve good performance even with a very small training dataset. The key highlights of the methodology are:
It leverages existing pose estimation technology (MediaPipe) to obtain 3D coordinates of hand landmarks, which are then normalized to create a vector representation of each mudra class.
These normalized vectors are stored in a vector database, and new test samples are classified by finding the closest match through Euclidean distance comparison.
The approach can work with as little as 1 or 5 training samples per class, with a slight reduction in accuracy compared to using larger datasets. This makes it highly adaptable to domains with data scarcity, such as traditional art forms.
Experiments were conducted on publicly available datasets for Kathakali and Bharatanatyam mudras, as well as a new 24-class Kathakali mudra dataset developed as part of this work.
The proposed method achieved similar or better performance compared to deep learning-based approaches, while being more data-efficient.
The system can work with full-body images or videos, not just cropped hand images, and is designed to be easily deployable in real-time applications.
The authors also discuss future directions, such as extending the approach to Kathakali word recognition and adapting it to other sign language recognition tasks.
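The pipeline summarized above (normalize the landmarks, store one vector per training sample, classify by the smallest Euclidean distance) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the normalization used here (translate so that landmark 0 is the origin, then divide by the largest landmark distance from it) is an assumed scheme, the names `normalize_landmarks` and `MudraIndex` are hypothetical, a plain Python list stands in for the vector database, and the landmark values are synthetic rather than real MediaPipe output.

```python
import math
from typing import List, Sequence, Tuple

Landmarks = Sequence[Sequence[float]]  # 21 points, each (x, y, z)


def normalize_landmarks(landmarks: Landmarks) -> List[float]:
    """Flatten 21 (x, y, z) landmarks into a 63-dimensional vector.

    Assumed normalization (not taken from the paper): shift landmark 0 to
    the origin, then divide by the largest distance from that point, so
    the vector does not change with hand position or image scale.
    """
    if len(landmarks) != 21 or any(len(p) != 3 for p in landmarks):
        raise ValueError("expected 21 landmarks with 3 coordinates each")
    origin = landmarks[0]
    centered = [[c - o for c, o in zip(p, origin)] for p in landmarks]
    scale = max(math.hypot(*p) for p in centered)
    if scale == 0:
        raise ValueError("all landmarks coincide; cannot normalize")
    return [c / scale for p in centered for c in p]


class MudraIndex:
    """In-memory stand-in for the vector database (hypothetical helper)."""

    def __init__(self) -> None:
        self._entries: List[Tuple[List[float], str]] = []

    def add(self, landmarks: Landmarks, label: str) -> None:
        self._entries.append((normalize_landmarks(landmarks), label))

    def classify(self, landmarks: Landmarks) -> Tuple[str, float]:
        """Return the label of the closest stored vector and its distance."""
        if not self._entries:
            raise ValueError("index is empty")
        query = normalize_landmarks(landmarks)
        vector, label = min(self._entries, key=lambda e: math.dist(e[0], query))
        return label, math.dist(vector, query)


# Synthetic landmark sets, one training sample per class (labels are illustrative).
hand_a = [[i * 0.01, (i % 5) * 0.02, 0.0] for i in range(21)]
hand_b = [[(i % 4) * 0.03, i * 0.01, (i % 3) * 0.01] for i in range(21)]

index = MudraIndex()
index.add(hand_a, "pataka")
index.add(hand_b, "mudrakhya")

# The same shape as hand_a, but scaled and shifted, as if seen at another zoom.
query = [[2.5 * x + 0.3, 2.5 * y - 0.1, 2.5 * z + 0.2] for x, y, z in hand_a]
label, distance = index.classify(query)
print(label, distance)
```

Because the query is only a scaled and shifted copy of `hand_a`, the assumed normalization maps it to essentially the same vector, so the nearest stored entry is the one built from `hand_a`. With real data, the choice of normalization and the number of stored samples per class would determine how well this nearest-match lookup separates similar mudras.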
Statistics
Pose estimation can provide 3D coordinates of 21 hand landmarks, resulting in a 63-dimensional feature vector. The Kathakali mudra dataset developed as part of this work contains 24 classes with 8 participants, different angles, and varying zoom and lighting conditions. Experiments were conducted with training set sizes ranging from 1 sample per class to 80% of the dataset.
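The training-set sizes mentioned above (from 1 sample per class upward) imply a split drawn per class. A minimal sketch of how a k-samples-per-class split might be drawn is shown here. The function name `k_shot_split`, the fixed seed, and the placeholder data (24 classes with 10 samples each) are assumptions for illustration; they do not reflect the real dataset's size or the authors' sampling procedure.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (63-dimensional feature vector, class label)


def k_shot_split(samples: List[Sample], k: int, seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Pick k training samples per class at random; the rest become test data."""
    by_class: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train: List[Sample] = []
    test: List[Sample] = []
    for label in sorted(by_class):
        items = list(by_class[label])
        if len(items) <= k:
            raise ValueError(f"class {label!r} needs more than {k} samples")
        rng.shuffle(items)
        train.extend(items[:k])
        test.extend(items[k:])
    return train, test


# Placeholder data: 21 landmarks x 3 coordinates = 63 features per sample.
samples = [([0.0] * (21 * 3), f"mudra_{c:02d}") for c in range(24) for _ in range(10)]

train, test = k_shot_split(samples, k=1)
print(len(train), len(test))
```

Calling the same function with `k=5` gives the 5-samples-per-class setting mentioned in the summary; a percentage-based split such as 80% would need a per-class fraction instead of a fixed `k`.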
Quotes
"Our approach aims for the most cost and resource-effective development, by building on top of existing technologies, when the general trend in AI is the opposite, requiring large amounts of data and computational capabilities."
"The focus area we have chosen is the Indian classical art form Kathakali, but the approaches we put forth can be easily adapted in other similar use cases like different dance forms or even be extended to processing sign languages."

Key insights extracted from

by Kavitha Raju... at arxiv.org 04-18-2024

https://arxiv.org/pdf/2404.11205.pdf
Kathakali Hand Gesture Recognition With Minimal Data

Deeper Inquiries

How could the proposed approach be extended to recognize Kathakali words or phrases, which involve sequences of mudras?

Recognizing Kathakali words or phrases, which are sequences of mudras, would require a sequential modeling technique. Rather than treating each mudra in isolation, the system would learn the patterns and transitions between mudras and predict the sequence of gestures that forms a word or phrase. A recurrent neural network (RNN) or a transformer model could capture these temporal dependencies. Training would require annotated mudra sequences representing words or phrases from Kathakali performances. By taking the context and order of mudras into account, the model could then predict the most likely gesture sequence for a given word or phrase.

What are the potential challenges and considerations in adapting this method to sign language recognition tasks in different cultural contexts?

Adapting this method to sign language recognition in different cultural contexts raises several challenges. The first is the diversity of sign languages, each with its own vocabulary, grammar, and syntax. The system would need annotated data specific to the target sign language, covering the gestures and expressions used in that cultural context. Variations in hand shapes, movements, and facial expressions across sign languages also call for a robust model that generalizes well to unseen data. Cultural nuances and context-specific meanings of gestures must be reflected in the training data to ensure accurate recognition. Finally, the availability of annotated data varies between sign languages, so strategies for handling data scarcity are needed for the system to remain effective across cultural contexts.

What other types of traditional art forms or cultural heritage domains could benefit from this data-efficient approach to gesture recognition, and how might the methodology need to be adjusted for those applications?

This data-efficient approach to gesture recognition can benefit traditional art forms and cultural heritage domains beyond Kathakali. Other Indian classical dance forms such as Bharatanatyam, Kuchipudi, or Odissi, which also involve intricate hand gestures, could use the methodology for mudra recognition. Traditional martial arts, ceremonial rituals, and folk dances that incorporate symbolic gestures could benefit as well. To adapt the methodology, the training data would need to be tailored to the gestures and movements characteristic of each art form. The pose estimation and normalization techniques may need adjusting to accommodate the hand shapes, body postures, and movements involved. The database of known gestures would also need to be expanded to cover the gestures relevant to the targeted domain. With training data and parameters customized to each art form or cultural context, the methodology can be applied to a wide range of traditional practices for gesture recognition and interpretation.