Compression algorithms for clustering big data face an inherent tradeoff between speed and accuracy. Fast, sublinear-time sampling methods are sufficient for many practical datasets, but optimal strong coresets are needed to guarantee robust compression across a wide range of data distributions.
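To make the fast end of this tradeoff concrete, here is a minimal sketch of compression by uniform sampling, evaluated with a weighted k-means cost. The function names (`kmeans_cost`, `uniform_sample`) and the choice of k-means as the clustering objective are assumptions made for illustration; they are not taken from the work summarized above.

```python
import random


def kmeans_cost(points, weights, centers):
    """Weighted k-means cost: each point pays its weight times the
    squared Euclidean distance to its nearest center."""
    total = 0.0
    for p, w in zip(points, weights):
        nearest = min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        total += w * nearest
    return total


def uniform_sample(points, m, rng):
    """Hypothetical helper: draw m points uniformly with replacement.
    Each sampled point gets weight n / m, so the total weight is n."""
    n = len(points)
    sample = [points[rng.randrange(n)] for _ in range(m)]
    weights = [n / m] * m
    return sample, weights


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10_000)]
    centers = [(0.0, 0.0)]
    full = kmeans_cost(data, [1.0] * len(data), centers)
    sample, weights = uniform_sample(data, 200, rng)
    approx = kmeans_cost(sample, weights, centers)
    print(f"full cost {full:.1f}, sampled estimate {approx:.1f}")
```

Sampling never reads most of the input, which is where the speed comes from. The estimate is reliable when no small group of points dominates the cost. If a single far-away outlier carries most of the cost, a uniform sample will usually miss it, which is the kind of distribution where a strong coreset guarantee matters.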
The authors present a combinatorial algorithm that simultaneously approximates all ℓp-norms in correlation clustering, giving the first proof that only a minimal sacrifice is needed to optimize different norms of the disagreement vector.
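For readers unfamiliar with the objective, the sketch below computes the disagreement vector of a given clustering and a few of its ℓp-norms. It assumes the standard setting of a complete graph whose edges are each labeled positive or negative; the function names and the edge representation are illustrative choices. This only evaluates a clustering and is not the authors' algorithm.

```python
from itertools import combinations


def disagreement_vector(n, positive_edges, labels):
    """Per-vertex disagreement counts on a complete signed graph.

    positive_edges: set of frozenset({u, v}); every other pair is negative.
    labels: labels[u] is the cluster id of vertex u.
    An edge disagrees if it is positive but cut, or negative but uncut.
    """
    y = [0] * n
    for u, v in combinations(range(n), 2):
        is_positive = frozenset((u, v)) in positive_edges
        same_cluster = labels[u] == labels[v]
        if is_positive != same_cluster:
            y[u] += 1
            y[v] += 1
    return y


def lp_norm(y, p):
    """The l_p norm of a non-negative vector; p = float('inf') gives the max."""
    if p == float("inf"):
        return max(y, default=0)
    return sum(v ** p for v in y) ** (1.0 / p)


if __name__ == "__main__":
    # Path 0-1-2-3 of positive edges; all other pairs are negative.
    positive = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}
    y = disagreement_vector(4, positive, labels=[0, 0, 1, 1])
    for p in (1, 2, float("inf")):
        print(p, lp_norm(y, p))
```

The ℓ1-norm equals twice the number of disagreeing edges, since each such edge is counted at both endpoints, while the ℓ∞-norm measures the worst-off vertex. A clustering that is good for one of these norms need not be good for the other, which is why a single solution that approximates all of them at once is notable.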