Key Concept
Large multimodal models can struggle to detect and interpret unusual in-game events, presenting a new challenge for the AI community.
Abstract
The paper introduces GlitchBench, a novel benchmark for evaluating the capabilities of large multimodal models (LMMs) in detecting and interpreting video game glitches. Glitches are unexpected frames that occur within a game due to software bugs, player actions, or unanticipated interactions between game elements.
The key highlights and insights are:
GlitchBench contains 593 glitch and 330 glitch-free screens from 205 games, covering a wide range of unusual scenarios.
The benchmark evaluates 11 state-of-the-art LMMs, including GPT-4V and LLaVA, on their ability to detect and describe glitches.
LMMs perform better at detecting glitches that violate simple physical laws (e.g., a car flying in the air) than more subtle glitches (e.g., human limbs in an implausible pose).
The best-performing model, GPT-4V, achieves 43.4% accuracy on the benchmark, leaving 30-35 percentage points of headroom for future improvement.
The authors find that the performance of models on GlitchBench does not correlate well with their performance on existing multimodal benchmarks, highlighting the need for real-world, task-specific evaluations.
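To make the reported numbers concrete, the sketch below shows how overall detection accuracy over a mix of glitch and glitch-free screens can be computed. This is only an illustration using the benchmark's summary counts (593 glitch, 330 glitch-free); the per-screen predictions are made-up placeholders, not results from any model in the paper.

```python
# Illustrative sketch: scoring binary glitch detection over a benchmark split.
# The counts (593 glitch / 330 glitch-free) come from GlitchBench's summary
# statistics; the hit/false-positive numbers below are hypothetical.

def accuracy(predictions, labels):
    """Fraction of screens where the model's glitch/no-glitch call matches."""
    assert len(predictions) == len(labels)
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical model behavior: flags 280 of the 593 glitch screens,
# and raises 120 false alarms on the 330 glitch-free screens.
labels = ["glitch"] * 593 + ["no_glitch"] * 330
predictions = (["glitch"] * 280 + ["no_glitch"] * 313      # on glitch screens
               + ["glitch"] * 120 + ["no_glitch"] * 210)   # on glitch-free screens

print(f"accuracy = {accuracy(predictions, labels):.3f}")
```

Because glitch and glitch-free screens are imbalanced (593 vs. 330), a single accuracy number can mask a model that over- or under-reports glitches, which is one reason task-specific benchmarks like this report headroom rather than accuracy alone.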
Statistics
The video game industry boasts an estimated annual revenue of USD 217 billion with a total of 3.2 billion gamers worldwide in 2022.
GlitchBench contains 593 glitch and 330 glitch-free screens from 205 games.
Quotes
"A holy grail of game quality assurance is to build a general glitch detector that works for any game of any genre and mechanics."
"Testing LMMs on GlitchBench may yield important findings not only to the game industry but also to the Artificial Intelligence (AI) community because glitch detection requires a combination of knowledge and understanding of image aesthetics, computer graphics, physics and commonsense reasoning."