Leveraging the commonsense reasoning capabilities of Vision-and-Large-Language Models (VLLMs), this work proposes a novel two-stage approach that enhances emotion classification in visual contexts without introducing complex training pipelines.
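As a rough illustration of how such a two-stage, training-free pipeline could be wired together, the sketch below first asks a VLLM for a free-form description of the visual context, then constrains the same model to a fixed label set; `query_vllm`, the prompts, and the label list are hypothetical stand-ins, not the paper's actual implementation.

```python
# Hypothetical two-stage, training-free pipeline: stage 1 elicits a
# commonsense description of the scene, stage 2 maps it to a label.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]

def query_vllm(image, prompt: str) -> str:
    """Placeholder for a call to a vision-and-large-language model (not a real API)."""
    raise NotImplementedError("wire up your VLLM backend here")

def classify_emotion(image) -> str:
    # Stage 1: free-form commonsense reasoning about the visual context.
    context = query_vllm(
        image,
        "Describe the scene, the people, and any cues about how they feel.",
    )
    # Stage 2: constrain the model to the label set, conditioned on the description.
    answer = query_vllm(
        image,
        f"Context: {context}\n"
        f"Which single label from {EMOTIONS} best matches the apparent emotion? "
        "Answer with the label only.",
    )
    label = answer.strip().lower()
    # Fall back to 'neutral' if the reply is not a valid label.
    return label if label in EMOTIONS else "neutral"
```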
Proposing a novel method that exploits the synergy between audio and visual data to improve the accuracy of action unit (AU) detection.
A prompt-sensitivity analysis of ChatGPT on affective computing tasks reveals how different prompts and generation parameters affect model performance.
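A minimal version of such a sensitivity sweep might grid over prompt templates and sampling temperatures and score each combination on a small labeled set; the templates, temperatures, label conventions, and model name below are illustrative assumptions, with the OpenAI chat API used as one possible backend rather than the paper's exact protocol.

```python
# Sketch of a prompt/parameter sensitivity sweep for an affect task.
from openai import OpenAI

client = OpenAI()

TEMPLATES = [
    "Classify the sentiment of this text as positive, negative, or neutral: {text}",
    "Text: {text}\nSentiment (answer with one word: positive, negative, or neutral):",
]

def predict(template: str, text: str, temperature: float) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model name
        messages=[{"role": "user", "content": template.format(text=text)}],
        temperature=temperature,
    )
    return resp.choices[0].message.content.strip().lower()

def sweep(dataset: list[tuple[str, str]]) -> None:
    # dataset: (text, gold_label) pairs, gold_label in {positive, negative, neutral}.
    for template in TEMPLATES:
        for temperature in (0.0, 0.7, 1.0):
            hits = sum(gold in predict(template, text, temperature) for text, gold in dataset)
            print(f"temp={temperature:.1f}  acc={hits / len(dataset):.2f}  prompt={template[:35]!r}")
```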
Proposing a novel audio-visual method for compound expression recognition based on emotion probability fusion and rule-based decision-making.
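One plausible reading of probability fusion followed by a rule-based decision is sketched below: each modality model outputs a distribution over basic emotions, the distributions are combined by a convex weighting, and a margin rule maps a close top-two pair to a compound label; the fusion weight, margin, and rule table are assumptions, not the paper's actual rules.

```python
# Sketch: late fusion of per-modality emotion probabilities plus a
# rule-based mapping from basic-emotion pairs to compound expressions.
import numpy as np

BASIC = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]
# Example subset of compound-expression rules (illustrative only).
COMPOUND_RULES = {
    ("happiness", "surprise"): "happily surprised",
    ("sadness", "anger"): "sadly angry",
    ("fear", "surprise"): "fearfully surprised",
}

def fuse(p_audio: np.ndarray, p_visual: np.ndarray, w_audio: float = 0.4) -> np.ndarray:
    # Probability-level fusion: convex combination of the two modality distributions.
    p = w_audio * p_audio + (1.0 - w_audio) * p_visual
    return p / p.sum()

def decide(p: np.ndarray, margin: float = 0.15) -> str:
    # Rule: if the top two emotions are within the margin, emit the matching
    # compound expression; otherwise fall back to the dominant basic emotion.
    top2 = np.argsort(p)[::-1][:2]
    a, b = BASIC[top2[0]], BASIC[top2[1]]
    if p[top2[0]] - p[top2[1]] < margin:
        return COMPOUND_RULES.get((a, b), COMPOUND_RULES.get((b, a), a))
    return a
```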
Introducing ALFRED, a novel multimodal neural framework for meme emotion detection that outperforms existing baselines by 4.94% in F1 score.