Core Concepts
Sharp, nonasymptotic bounds are obtained for the relative entropy between the distributions of sampling with and without replacement from an urn with balls of c ≥ 2 colors. The bounds depend on the number of balls of each color in the urn.
Abstract
This paper studies the relationship between sampling with and without replacement from an urn containing n balls of c ≥ 2 different colors. The authors derive sharp, nonasymptotic bounds on the relative entropy (Kullback-Leibler divergence) between the two sampling distributions.
Key highlights:
The relative entropy D(n, k, ℓ) is bounded above by an expression that depends on the number of balls of each color ℓ = (ℓ1, ℓ2, ..., ℓc) in the urn.
The first term of the bound matches the asymptotic expression obtained in prior work, showing that the new bound is a nonasymptotic version of the earlier result.
The dependence on ℓ is via two quantities Σ1(n, c, ℓ) and Σ2(n, c, ℓ), which are bounded in 'balanced' cases but can grow large in 'unbalanced' cases.
An alternative bound is provided for the c = 2 case, which outperforms the earlier uniform bounds across a wide range of parameter values.
The connection between finite de Finetti theorems and sampling bounds is explored, leading to a sharp finite de Finetti bound.
Stats
The paper's two main quantitative bounds are:
The relative entropy D(n, k, ℓ) between the without-replacement distribution H and the with-replacement distribution B of k draws satisfies:
D(n, k, ℓ) ≤ (c-1)/2 * log(n/(n-k)) - k/(n-1) + [k(2n+1) / (12n(n-1)(n-k))] * Σ1(n, c, ℓ) + (1/360) * [1/(n-k)^3 - 1/n^3] * Σ2(n, c, ℓ).
For c = 2 and ℓ ≤ n/2: D(n, k, (ℓ, n-ℓ)) ≤ ℓ * [1 - (k/n) * log(1 - k/n) + k/n - k/(2n(n-1))] + kℓ / ((n-1)(n-ℓ)(n-k)) - [k(k-1) / (2n(n-1)ℓ(ℓ-1))] * log(ℓ/(ℓ-1)) - [k(k-1)(k-2) / (6n(n-1)(n-2)ℓ(ℓ-1)(ℓ-2))] * log((ℓ-1)^2 / (ℓ(ℓ-2))).
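The c = 2 quantities above can be sanity-checked numerically: for two colors, sampling without replacement is the hypergeometric distribution and sampling with replacement is the binomial, so the exact relative entropy is a short direct computation. The sketch below is illustrative and not from the paper; the function name and test parameters are our own, and it compares the exact divergence against the leading term (c-1)/2 * log(n/(n-k)) of the general bound.

```python
from math import comb, log

def kl_hyper_vs_binom(n, k, l):
    """Exact relative entropy D(H || B) for c = 2 colors:
    H = hypergeometric (k draws without replacement from an urn with
    l balls of one color out of n total), B = binomial(k, l/n)
    (the same number of draws, with replacement)."""
    p = l / n
    d = 0.0
    # j ranges over the support of the hypergeometric distribution.
    for j in range(max(0, k - (n - l)), min(k, l) + 1):
        h = comb(l, j) * comb(n - l, k - j) / comb(n, k)
        b = comb(k, j) * p**j * (1 - p)**(k - j)
        if h > 0:
            d += h * log(h / b)
    return d

# A balanced urn (l = n/2): compare the exact divergence with the
# leading term (c - 1)/2 * log(n/(n - k)) of the general bound.
n, k, l = 100, 20, 50
print(kl_hyper_vs_binom(n, k, l))  # exact D(n, k, (l, n - l))
print(0.5 * log(n / (n - k)))      # leading term of the bound
```

In this balanced case the exact divergence is an order of magnitude below the leading term alone, consistent with the correction terms in the bound being negative or small here.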