LLM-Pilot: A System for Characterizing and Optimizing the Performance of Large Language Model Inference Services
LLM-Pilot is a novel system that characterizes and predicts the performance of LLM inference services across various GPUs, enabling users to deploy a service cost-effectively while still meeting its performance requirements.
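To make the "cost-effective deployment under a performance requirement" idea concrete, here is a minimal, hypothetical sketch: given predicted per-GPU performance for a model (which in LLM-Pilot would come from its learned performance characterization), pick the cheapest GPU that still satisfies a latency target. The class, function, GPU names, and numbers below are illustrative assumptions, not LLM-Pilot's actual API or data.

```python
# Hypothetical sketch of cost-aware GPU selection under a latency requirement.
# All names and numbers are illustrative; LLM-Pilot's real interface may differ.

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class GpuOption:
    name: str
    hourly_cost_usd: float
    predicted_p95_latency_ms: float  # in LLM-Pilot, this would come from a learned performance model


def cheapest_gpu_meeting_slo(options: Iterable[GpuOption],
                             max_p95_latency_ms: float) -> Optional[GpuOption]:
    """Return the lowest-cost GPU whose predicted latency meets the SLO, or None if none does."""
    feasible = [g for g in options if g.predicted_p95_latency_ms <= max_p95_latency_ms]
    return min(feasible, key=lambda g: g.hourly_cost_usd, default=None)


if __name__ == "__main__":
    candidates = [
        GpuOption("gpu-a", hourly_cost_usd=1.10, predicted_p95_latency_ms=180.0),
        GpuOption("gpu-b", hourly_cost_usd=2.50, predicted_p95_latency_ms=95.0),
        GpuOption("gpu-c", hourly_cost_usd=4.00, predicted_p95_latency_ms=60.0),
    ]
    choice = cheapest_gpu_meeting_slo(candidates, max_p95_latency_ms=100.0)
    print(choice)  # gpu-b: the cheapest option whose predicted latency meets the 100 ms target
```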