Core Concepts
This study presents a comprehensive system that integrates facial emotion recognition, personalized music recommendation, and explainable AI techniques to enhance the user experience.
Abstract
The paper proposes a methodology that combines facial emotion detection, region of interest (ROI) analysis focusing on the eyes, and music recommendation based on the detected emotions. The key highlights are:
Facial Emotion Detection:
The system uses a ResNet50 deep learning model to classify facial expressions into seven emotions: anger, disgust, fear, happiness, sadness, surprise, and neutral.
The model achieves an overall accuracy of 86% in emotion classification.
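As a concrete illustration, a ResNet50-based seven-class emotion classifier could be set up as in the minimal Keras/TensorFlow sketch below. The framework, input size, and head architecture are assumptions; the paper does not specify them.

```python
# Hypothetical sketch of a ResNet50 emotion classifier
# (Keras/TensorFlow, input size, and head layers are assumptions).
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, models

NUM_EMOTIONS = 7  # anger, disgust, fear, happiness, sadness, surprise, neutral

# ImageNet-pretrained backbone with the classification head removed.
base = ResNet50(weights="imagenet", include_top=False,
                input_shape=(224, 224, 3))

# New classification head for the seven emotion classes.
x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dense(256, activation="relu")(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(NUM_EMOTIONS, activation="softmax")(x)

model = models.Model(base.input, outputs)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# Training would follow, e.g.:
# model.fit(train_ds, validation_data=val_ds, epochs=...)
```

Building the model functionally from `base.input` keeps the backbone's layers directly addressable, which is convenient for the Grad-CAM step later.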
ROI (Eyes) Analysis:
The study focuses on the eyes as a crucial region for emotion recognition, using a Haar cascade classifier to locate and extract the eye region.
Training the model on a specialized dataset of eye images further improves its ability to capture subtle emotional cues.
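A minimal sketch of eye-ROI extraction with OpenCV's bundled Haar cascade follows; the cascade file, scale factor, and neighbor threshold are assumed defaults, not values from the paper.

```python
# Hypothetical eye-ROI extraction using OpenCV's bundled Haar cascade.
import cv2

eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def extract_eye_rois(image_path):
    """Return cropped grayscale eye regions detected in an image."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # scaleFactor and minNeighbors are common defaults, not paper values.
    eyes = eye_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [gray[y:y + h, x:x + w] for (x, y, w, h) in eyes]
```

The cropped regions would then be resized and fed to the eye-specialized model for training or inference.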
Music Recommendation:
The system maps the detected emotions to a curated music dataset, generating personalized playlists that align with the user's emotional state.
This approach enhances the user experience by providing music that resonates with the user's current mood.
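At its simplest, the emotion-to-music mapping can be a lookup from predicted labels to curated track lists, as in the sketch below. The playlist names are illustrative placeholders, not the paper's actual curation.

```python
# Illustrative emotion-to-playlist mapping; track IDs are placeholders.
import random

EMOTION_PLAYLISTS = {
    "anger":     ["calming_ambient_01", "lofi_chill_02"],
    "disgust":   ["neutral_instrumental_01"],
    "fear":      ["soothing_strings_01"],
    "happiness": ["upbeat_pop_01", "dance_mix_02"],
    "sadness":   ["mellow_acoustic_01", "slow_piano_02"],
    "surprise":  ["eclectic_mix_01"],
    "neutral":   ["easy_listening_01"],
}

def recommend(emotion, k=5):
    """Return up to k tracks curated for the detected emotion."""
    tracks = EMOTION_PLAYLISTS.get(emotion, [])
    return random.sample(tracks, min(k, len(tracks)))
```

A static lookup like this could later be replaced by a learned ranking model that also accounts for listening history.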
Explainable AI:
The study incorporates the Grad-CAM technique to provide visual explanations for the model's predictions, enabling users to understand the reasoning behind the recommended content.
The heatmaps generated by Grad-CAM highlight the facial regions that contribute most significantly to the emotion classification.
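A minimal Grad-CAM sketch against the ResNet50 classifier outlined earlier is shown below; `conv5_block3_out` is the final convolutional block in Keras's ResNet50 and would need adjusting for a different backbone layout.

```python
# Minimal Grad-CAM sketch; assumes the ResNet50 model built above,
# where backbone layers are directly addressable by name.
import tensorflow as tf

def grad_cam(model, image, layer_name="conv5_block3_out"):
    """Return a normalized heatmap for the model's top predicted class."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])  # add batch dim
        top_class = tf.argmax(preds[0])
        score = preds[:, top_class]
    grads = tape.gradient(score, conv_out)
    # Weight each feature map by its average gradient, then combine.
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    cam = tf.reduce_sum(conv_out[0] * weights, axis=-1)
    cam = tf.nn.relu(cam)  # keep only positive contributions
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```

Upsampling the returned heatmap to the input resolution and overlaying it on the face image yields the kind of visual explanation described above.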
The proposed methodology demonstrates the effectiveness of integrating facial emotion recognition, ROI analysis, music recommendation, and explainable AI techniques to create a comprehensive and user-centric system. The results highlight the potential of this approach in various applications, such as personalized music streaming, emotion-aware user interfaces, and affective computing.
Stats
The dataset used for training the facial emotion recognition model consists of two components:
The FER dataset, which contains facial-expression images labeled by emotion category.
Real images of different individuals expressing diverse emotions.
The music dataset contains a diverse collection of music tracks from various genres and styles, which are mapped to the detected emotions.
Quotes
"By focusing solely on the eyes, the model gained a deeper understanding of the specific eye-related cues and expressions associated with each emotion."
"The incorporation of GRAD-CAM for explainable AI provided insights into the model's decision-making process."