Core Concepts
The authors propose SVGEditBench, a benchmark dataset for quantitatively assessing the ability of Large Language Models (LLMs) to edit Scalable Vector Graphics (SVG) content.
Abstract
The authors present SVGEditBench, a benchmark dataset for evaluating the SVG editing capabilities of Large Language Models (LLMs). The key points are:
SVG is a popular vector graphics format that LLMs can process directly, since it is represented as XML text.
The authors selected six SVG editing tasks: changing color, adding contours, compression, flipping upside-down, adjusting transparency, and cropping.
They created prompts and ground-truth answers for each task, using 1,366 SVG images from the Twemoji dataset.
The authors evaluated GPT-4 and GPT-3.5 on the benchmark and found that GPT-4 outperformed GPT-3.5 on all tasks, both quantitatively (using metrics such as MSE and compression ratio) and qualitatively.
The benchmark provides a standardized way to assess and compare the SVG editing capabilities of different LLMs, which is an important emerging application area for these models.
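To illustrate why SVG suits text-based editing, here is a minimal sketch of one of the six task types (a color change) performed with Python's standard XML tooling. The SVG snippet, function name, and colors are illustrative assumptions, not taken from the benchmark itself:

```python
import xml.etree.ElementTree as ET

# A minimal SVG document: vector graphics are plain XML text,
# so an LLM (or any text tool) can read and rewrite it directly.
svg_source = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="36" height="36">'
    '<circle cx="18" cy="18" r="16" fill="#dd2e44"/>'
    '</svg>'
)

def change_fill(svg_text: str, old: str, new: str) -> str:
    """Recolor every element whose fill matches `old` -- a
    programmatic version of the benchmark's color-change task."""
    ET.register_namespace("", "http://www.w3.org/2000/svg")
    root = ET.fromstring(svg_text)
    for el in root.iter():
        if el.get("fill") == old:
            el.set("fill", new)
    return ET.tostring(root, encoding="unicode")

edited = change_fill(svg_source, "#dd2e44", "#3b88c3")
print(edited)  # the circle's fill is now #3b88c3
```

In the benchmark itself the edit is performed by the LLM from a natural-language prompt rather than by deterministic code like this; the sketch only shows that the underlying representation is ordinary text.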
Stats
In the Compression task, SVG code length was reduced by 94.5% by GPT-4 and by 96.1% by GPT-3.5.
The Mean Squared Error (MSE) between the edited and ground-truth images is lower for GPT-4 than for GPT-3.5 in all tasks except Compression.
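Both metrics are straightforward to reproduce. A minimal sketch, assuming the SVGs have already been rasterized to equal-sized pixel grids (the rendering step itself is omitted, and the helper names are illustrative):

```python
def mse(img_a, img_b):
    """Mean squared error between two equal-sized images,
    given as nested lists (rows) of pixel intensities."""
    flat_a = [p for row in img_a for p in row]
    flat_b = [p for row in img_b for p in row]
    assert len(flat_a) == len(flat_b), "images must have the same size"
    return sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)

def length_reduction(original_svg: str, compressed_svg: str) -> float:
    """Code-length reduction for the Compression task: the fraction
    of characters removed relative to the original SVG source."""
    return 1.0 - len(compressed_svg) / len(original_svg)

# Identical images give zero error.
print(mse([[0, 0], [0, 0]], [[0, 0], [0, 0]]))  # → 0.0
# A 200-char SVG compressed to 11 chars is 94.5% shorter.
print(round(length_reduction("a" * 200, "a" * 11), 3))  # → 0.945
```

Lower MSE indicates the edited image is closer to the ground truth; for the Compression task a higher length reduction is better, which is why it is the one task where GPT-3.5's numbers are not worse.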
Quotes
"GPT-4 outperformed GPT-3.5 in all six editing tasks. Both quantitative and qualitative experiments confirmed this tendency."
"GPT-4 could reflect the instructions to the output more appropriately. GPT-3.5 often redrew the paths unnecessarily, resulting in significant image corruption."