Core Concepts
Large multimodal models (LMMs) are evaluated on the GOAT-Bench dataset for their ability to detect social abuse in memes, revealing shortcomings in safety awareness and the need for further advances in aligning these models with human values.
Abstract
The exponential growth of social media has led to a rise in online abuse spread through memes. Large multimodal models are tested on the GOAT-Bench dataset to assess their ability to identify hatefulness, misogyny, offensiveness, sarcasm, and harmfulness. Results show a deficiency in safety awareness among current models, underscoring the importance of aligning AI with human values.
The study introduces GOAT-Bench, a comprehensive meme benchmark of over 6K varied memes covering themes such as implicit hate speech, sexism, and cyberbullying. The evaluation reveals that existing large multimodal models struggle to discern the nuanced forms of abuse present in memes. The research aims to advance safety awareness in LMMs and thereby help curb the escalation of online social abuse.
Several large multimodal models, including GPT-4V, CogVLM, and LLaVA-1.5, are evaluated on the GOAT-Bench tasks of hatefulness, misogyny, offensiveness, sarcasm, and harmfulness. Findings indicate clear disparities among models: GPT-4V shows the best overall performance but still exhibits deficiencies in safety awareness.
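To make the evaluation setup concrete, here is a minimal sketch of a zero-shot yes/no classification loop over a single GOAT-Bench task. The `query_lmm` wrapper, the prompt wording, and the data format are all assumptions made for illustration; this is not the paper's actual evaluation harness.

```python
from pathlib import Path

def query_lmm(image_path: Path, prompt: str) -> str:
    """Placeholder for a call to the model under test (e.g. GPT-4V,
    CogVLM, or LLaVA-1.5); replace with a real API or local inference."""
    raise NotImplementedError

# Hypothetical prompt template; GOAT-Bench's actual prompts may differ.
TASK_PROMPT = "Here is a meme. Answer 'yes' if it is {task}, otherwise 'no'."

def evaluate(memes: list[tuple[Path, bool]], task: str) -> float:
    """Return accuracy of yes/no predictions on one task
    (hatefulness, misogyny, offensiveness, sarcasm, or harmfulness)."""
    correct = 0
    for image_path, label in memes:
        answer = query_lmm(image_path, TASK_PROMPT.format(task=task))
        prediction = answer.strip().lower().startswith("yes")
        correct += prediction == label
    return correct / len(memes)
```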
Stats
GPT-4V achieves an overall macro-averaged F1 score of 70.29% (see the macro-averaging sketch after this list).
LLaVA-1.5 exhibits strong capabilities for detecting hatefulness.
Current models exhibit deficiencies in safety awareness.
The top-performing GPT-4V shows insensitivity to various forms of implicit abuse.
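For reference, macro averaging computes the F1 score for each task independently and then takes the unweighted mean, so every task counts equally regardless of how many memes it contains. A minimal sketch with made-up per-task precision/recall values, which are not the paper's reported numbers:

```python
from statistics import mean

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative per-task (precision, recall) pairs; NOT results from the paper.
per_task = {
    "hatefulness":   (0.72, 0.68),
    "misogyny":      (0.75, 0.70),
    "offensiveness": (0.66, 0.71),
    "sarcasm":       (0.69, 0.64),
    "harmfulness":   (0.74, 0.73),
}

# Macro average: compute per-task F1 first, then an unweighted mean over tasks.
macro_f1 = mean(f1(p, r) for p, r in per_task.values())
print(f"Macro-averaged F1: {macro_f1:.2%}")
```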
Quotes
"No model achieved a perfect score on all tasks."
"Results highlight the need for continued advancements in LMM safety."