Core Concepts
Compass optimizes job latency and resource utilization in ML workflows through decentralized scheduling and GPU memory management.
Abstract
ML query processing in distributed systems runs inference tasks on GPU-enabled workers.
Compass reduces job latency through efficient GPU memory management and task placement.
Compared with other schedulers, Compass achieves significantly lower job latency and better resource efficiency.
Features include decentralized scheduling, GPU cache management, and job/task placement.
Experiment results demonstrate Compass's superior performance in reducing job latency.
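The placement idea above can be sketched in a minimal form. This is an illustrative toy, not Compass's actual algorithm: it assumes a scheduler that prefers workers whose GPU cache already holds a task's model, and falls back to the least-loaded worker on a miss (the `Worker` fields and `place_task` helper are hypothetical names).

```python
from dataclasses import dataclass, field

@dataclass
class Worker:
    name: str
    cached_models: set = field(default_factory=set)  # model IDs resident in GPU memory
    queue_len: int = 0  # pending tasks, used as a simple load signal

def place_task(model_id: str, workers: list) -> Worker:
    """Pick a worker for a task that needs `model_id`."""
    # Prefer cache hits: avoids reloading a model that can be hundreds of MB.
    hits = [w for w in workers if model_id in w.cached_models]
    candidates = hits if hits else workers
    chosen = min(candidates, key=lambda w: w.queue_len)
    chosen.queue_len += 1
    chosen.cached_models.add(model_id)  # model is now (or was already) resident
    return chosen
```

Preferring cache affinity over raw load is one way to trade a slightly longer queue for skipping a multi-hundred-megabyte model load.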
Stats
ML models can be hundreds of megabytes in size [71].
Compass reduces job latency by 2x to 6x.
Compass sustains a GPU cache hit rate of 99%.
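A high cache hit rate depends on how models are kept in and evicted from bounded GPU memory. As a hedged sketch (the source does not specify Compass's eviction policy; LRU, the class name, and the sizes here are assumptions for illustration):

```python
from collections import OrderedDict

class GpuModelCache:
    """Toy per-worker model cache bounded by GPU memory, with LRU eviction."""

    def __init__(self, capacity_mb: int):
        self.capacity_mb = capacity_mb
        self.models = OrderedDict()  # model_id -> size_mb, oldest first
        self.used_mb = 0
        self.hits = 0
        self.requests = 0

    def access(self, model_id: str, size_mb: int) -> bool:
        """Return True on a hit; on a miss, load the model, evicting LRU entries."""
        self.requests += 1
        if model_id in self.models:
            self.models.move_to_end(model_id)  # mark as most recently used
            self.hits += 1
            return True
        # Evict least-recently-used models until the new one fits.
        while self.used_mb + size_mb > self.capacity_mb and self.models:
            _, evicted_mb = self.models.popitem(last=False)
            self.used_mb -= evicted_mb
        self.models[model_id] = size_mb
        self.used_mb += size_mb
        return False

    def hit_rate(self) -> float:
        return self.hits / self.requests if self.requests else 0.0
```

With models weighing hundreds of megabytes, each miss pays a large load cost, which is why the hit rate is a headline metric.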
Quotes
"Compass plays two roles: platform-level GPU cache management and job/task placement."
"Our decentralized solution has low overheads and outperforms centralized alternatives."