The paper proposes LLM-PQ, a system that combines adaptive model quantization with phase-aware partition to improve LLM serving efficiency on heterogeneous GPU clusters.