Comprehensive Vietnamese Multimodal Aspect-Category Sentiment Analysis: A New Benchmark Dataset and Fine-Grained Cross-Modal Fusion Framework
This work introduces ViMACSA, a new Vietnamese multimodal dataset with fine-grained annotations for both text and images. It also proposes Fine-Grained Cross-Modal Fusion (FCMF), a framework that learns both intra-modality and inter-modality interactions to improve multimodal aspect-category sentiment analysis.