Manipulating Energy Consumption and Latency of Multi-modal Large Language Models through Verbose Samples
Multi-modal large language models (MLLMs) can be driven to high energy consumption and latency by crafted verbose samples, including verbose images and verbose videos, which maximize the length of the generated sequences.
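The link between sequence length and cost can be made concrete with a small sketch. In autoregressive decoding each generated token requires one more forward pass, so anything that lowers the probability of the end-of-sequence (EOS) token at each step raises the expected number of tokens and, with it, latency and energy. The function names, the EOS-probability objective, and the per-token cost below are illustrative assumptions, not details taken from the text above.

```python
# Illustrative sketch only: the objective and cost model are assumptions,
# not the method described in the source.

def eos_delay_loss(eos_probs):
    """Mean probability assigned to the EOS token across decoding steps.

    An attacker who minimizes this value with respect to the input
    (image or video) pushes the model to keep generating.
    """
    return sum(eos_probs) / len(eos_probs)


def expected_length(eos_probs, max_len):
    """Expected number of decoding steps, capped at max_len.

    Step t is executed only if no EOS was emitted at steps 0..t-1.
    If eos_probs is shorter than max_len, its last value is reused.
    """
    survive = 1.0  # probability that generation is still running
    total = 0.0
    for t in range(max_len):
        p = eos_probs[t] if t < len(eos_probs) else eos_probs[-1]
        total += survive
        survive *= 1.0 - p
    return total


def estimated_latency_ms(eos_probs, max_len, ms_per_token):
    """Rough latency estimate: one forward pass per generated token.

    ms_per_token is a hypothetical, hardware-dependent constant.
    """
    return expected_length(eos_probs, max_len) * ms_per_token
```

Under this toy model, lowering the per-step EOS probability from 0.5 to 0.0 with a cap of 10 tokens raises the expected length from just under 2 steps to the full 10, and the estimated latency grows in proportion.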