Sponge: Efficient Inference Serving with Dynamic SLO Guarantees Using In-Place Vertical Scaling
Sponge maximizes resource efficiency while guaranteeing dynamic SLOs for deep learning inference serving by applying in-place vertical scaling, dynamic batching, and request reordering.
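To make the interplay of the three mechanisms concrete, the following is a minimal sketch and not Sponge's actual implementation. It assumes requests are reordered by earliest remaining deadline, and it uses a made-up linear latency model (`batch_latency_ms`) in which latency grows with batch size and shrinks with allocated CPU cores. The names `Request`, `reorder_by_deadline`, `batch_latency_ms`, and `plan`, and all constants, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Request:
    req_id: str
    # SLO budget left after time already spent elsewhere (e.g. in the network).
    # This is what makes the SLO "dynamic" in this sketch.
    remaining_budget_ms: float


def reorder_by_deadline(queue: List[Request]) -> List[Request]:
    """Request reordering (assumed policy): tightest remaining budget first."""
    return sorted(queue, key=lambda r: r.remaining_budget_ms)


def batch_latency_ms(batch_size: int, cores: int) -> float:
    """Hypothetical latency model: fixed overhead plus work divided across cores."""
    return 5.0 + 40.0 * batch_size / cores


def plan(queue: List[Request], max_cores: int = 8,
         max_batch: int = 8) -> Optional[Tuple[int, int]]:
    """Pick the fewest cores, then the largest batch size, meeting all budgets.

    Returns (cores, batch_size), or None if no configuration is feasible.
    """
    ordered = reorder_by_deadline(queue)
    for cores in range(1, max_cores + 1):          # vertical scaling knob
        for batch in range(max_batch, 0, -1):      # dynamic batching knob
            elapsed = 0.0
            feasible = True
            # Serve the ordered queue batch after batch and track finish times.
            for start in range(0, len(ordered), batch):
                chunk = ordered[start:start + batch]
                elapsed += batch_latency_ms(len(chunk), cores)
                if any(r.remaining_budget_ms < elapsed for r in chunk):
                    feasible = False
                    break
            if feasible:
                return cores, batch
    return None


if __name__ == "__main__":
    queue = [Request("a", 100.0), Request("b", 30.0), Request("c", 60.0)]
    print([r.req_id for r in reorder_by_deadline(queue)])
    print(plan(queue))
```

In this toy setting, searching cores in ascending order reflects the resource-efficiency goal: the allocation grows only when no batch size can meet the tightest remaining budget at the current size.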