Verbose images can induce high energy-latency cost in vision-language models (VLMs) by increasing the length of the generated sequences.
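
The link between sequence length and cost can be illustrated with a minimal sketch. It assumes a simple linear cost model in which autoregressive decoding performs one forward pass per generated token, so both latency and energy grow in proportion to the output length. The function name `estimate_decode_cost`, the per-token time, and the average power draw are hypothetical placeholders chosen for illustration, not measurements from any model or hardware.

```python
from dataclasses import dataclass


@dataclass
class DecodeCost:
    tokens: int        # number of generated tokens (one forward pass each)
    latency_s: float   # estimated decoding time in seconds
    energy_j: float    # estimated energy in joules


def estimate_decode_cost(num_generated_tokens: int,
                         seconds_per_token: float = 0.03,
                         avg_power_watts: float = 250.0) -> DecodeCost:
    """Toy linear cost model (an assumption, not a measurement).

    latency = tokens * seconds_per_token
    energy  = latency * avg_power_watts
    """
    if num_generated_tokens < 0:
        raise ValueError("num_generated_tokens must be non-negative")
    latency = num_generated_tokens * seconds_per_token
    energy = latency * avg_power_watts
    return DecodeCost(num_generated_tokens, latency, energy)


if __name__ == "__main__":
    # Hypothetical output lengths: a short caption for an ordinary image
    # versus a much longer sequence elicited by a verbose image.
    for label, length in [("ordinary image", 50), ("verbose image", 500)]:
        cost = estimate_decode_cost(length)
        print(f"{label}: {cost.tokens} tokens, "
              f"{cost.latency_s:.2f} s, {cost.energy_j:.1f} J")
```

Under this model, a tenfold increase in generated length yields a tenfold increase in both estimated latency and energy. Real decoders deviate from strict linearity (for example, attention cost per step grows with context length), so the sketch should be read as a lower-order approximation only.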