The DesignProbe benchmark evaluates how well Multimodal Large Language Models (MLLMs) understand graphic design. It comprises eight tasks spanning two levels: the design element level and the overall design level.
At the design element level, the tasks evaluate a model's ability both to recognize visual design components (color, font, layout) and to understand their semantic meanings. The color recognition task asks the model to identify the primary colors in a design, while the color meaning task probes its understanding of the symbolic associations of different palettes. Similarly, the font extraction and font style tasks test whether the model can recognize typefaces and comprehend their stylistic attributes. The layout tasks examine spatial awareness: detecting negative space and identifying the visual center of a design.
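To make the task format concrete, here is a minimal sketch of how such an element-level probe might be issued to an MLLM. The prompt wording and the file name are illustrative assumptions, not DesignProbe's actual protocol; an OpenAI-style vision-chat client is used purely as an example, and any vision-capable chat API would serve.

```python
import base64
from openai import OpenAI  # any vision-capable chat client would do

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def probe(prompt: str, image_path: str, model: str = "gpt-4o") -> str:
    """Send one design image plus a task prompt to an MLLM and
    return its textual reply."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

# Element-level probes in the spirit of the benchmark's tasks
# (hypothetical wording and file name):
print(probe("List the dominant colors in this design, from most to "
            "least prominent.", "poster_001.png"))
print(probe("Which region of this design forms its visual center, "
            "and why?", "poster_001.png"))
```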
At the overall design level, the benchmark includes tasks that assess the model's holistic understanding of design. The style classification task requires the model to identify the overall visual style of a given design, while the visual metaphor task challenges the model to comprehend the abstract and creative use of design elements to convey deeper meanings.
To support the evaluation, the authors curated and re-annotated several existing datasets and introduced a new dataset for the layout tasks. The benchmark uses GPT-4 as an automatic evaluator, whose judgments are shown to be comparable to those of human annotators.
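The GPT-4 evaluation follows the familiar LLM-as-judge pattern: the judge model receives the question, a reference answer, and the candidate's answer, and returns a score. The sketch below illustrates that pattern; the rubric wording and the 1-to-5 scale are assumptions for illustration, not the authors' exact grading protocol.

```python
from openai import OpenAI

client = OpenAI()

# Rubric text and scoring scale are illustrative assumptions.
JUDGE_RUBRIC = (
    "You are grading an answer to a graphic-design question.\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Candidate answer: {candidate}\n"
    "Score the candidate from 1 (wrong) to 5 (matches the reference "
    "in substance). Reply with the score only."
)

def judge(question: str, reference: str, candidate: str) -> int:
    """Ask GPT-4 to grade a candidate answer against a reference."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": JUDGE_RUBRIC.format(
            question=question, reference=reference, candidate=candidate)}],
        temperature=0,  # deterministic grading
    )
    # Relies on the "score only" instruction in the rubric above.
    return int(resp.choices[0].message.content.strip())
```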
The authors also conduct extensive experiments on the impact of prompt refinement and of supplementary design knowledge. The results show that better-performing models tend to be more robust to prompt variations, and that adding visual examples to a prompt yields substantially larger gains than textual descriptions alone.
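One way to read the visual-example finding: a few-shot multimodal prompt interleaves exemplar images with their labels before the query image, rather than describing the exemplars in words. The sketch below contrasts the two prompt variants using the OpenAI-style message format; the file names and prompt text are hypothetical.

```python
import base64

def image_part(path: str) -> dict:
    """Encode a local image as a data-URL content part for a
    vision-chat API (OpenAI-style message format)."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {"type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"}}

def text_part(text: str) -> dict:
    return {"type": "text", "text": text}

QUESTION = "What is the overall visual style of this design?"

# Variant A: design knowledge delivered as a textual description only.
text_only = [
    text_part("Minimalist style uses ample negative space, a restrained "
              "palette, and few decorative elements.\n" + QUESTION),
    image_part("query.png"),  # hypothetical query image
]

# Variant B: the same knowledge delivered as a labeled visual example.
with_visual_example = [
    text_part("Here is an example of a minimalist design:"),
    image_part("example_minimalist.png"),  # hypothetical exemplar
    text_part(QUESTION),
    image_part("query.png"),
]

# Either variant would be sent as the `content` of a single user message.
```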
The DesignProbe benchmark sets a new standard for evaluating MLLMs' graphic design understanding, paving the way for future research in this area.