FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning
FlexLLM is a system that co-serves large language model (LLM) inference and parameter-efficient finetuning (PEFT) requests on shared GPUs, improving GPU resource utilization and reducing memory overhead compared with running the two workloads on separate resources.
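One way to picture co-serving is a token-level batch scheduler: latency-sensitive inference tokens are admitted into each batch first, and leftover capacity is backfilled with finetuning tokens so the GPU stays saturated. The sketch below is a minimal illustration of that idea only; the `Request` type, the `schedule_batch` function, and the fixed token-budget policy are hypothetical simplifications and not FlexLLM's actual scheduler.

```python
from dataclasses import dataclass
from collections import deque
from typing import Deque, List, Tuple

@dataclass
class Request:
    rid: int            # request id
    kind: str           # "inference" (latency-sensitive) or "finetune" (throughput-oriented)
    tokens_left: int    # tokens this request still needs processed

def schedule_batch(inference_q: Deque[Request],
                   finetune_q: Deque[Request],
                   token_budget: int) -> List[Tuple[int, str]]:
    """Assemble one co-served batch of (request id, kind) token slots."""
    batch: List[Tuple[int, str]] = []
    # Inference requests decode autoregressively: admit one token per
    # active request so their latency is not blocked by finetuning work.
    for req in list(inference_q):
        if len(batch) >= token_budget:
            break
        batch.append((req.rid, req.kind))
        req.tokens_left -= 1
        if req.tokens_left == 0:
            inference_q.remove(req)
    # Backfill remaining slots with finetuning tokens, which tolerate
    # delay, to keep the GPU busy instead of running a partial batch.
    while len(batch) < token_budget and finetune_q:
        req = finetune_q[0]
        take = min(token_budget - len(batch), req.tokens_left)
        batch.extend((req.rid, req.kind) for _ in range(take))
        req.tokens_left -= take
        if req.tokens_left == 0:
            finetune_q.popleft()
    return batch

if __name__ == "__main__":
    inf = deque([Request(0, "inference", 3), Request(1, "inference", 1)])
    ft = deque([Request(2, "finetune", 16)])
    while inf or ft:
        print(schedule_batch(inf, ft, token_budget=8))
```

Under this toy policy, every batch is filled to the token budget whenever finetuning work is queued, which is the intuition behind co-serving: spare inference capacity becomes finetuning throughput rather than idle GPU time.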