Key Concept
The proposed IB-MHT methodology leverages multiple hypothesis testing to provide statistical guarantees on the information bottleneck constraint while approximately minimizing the information loss.
Abstract
The paper introduces the information bottleneck (IB) problem, which aims to extract a low-dimensional statistic T from an observation X that retains sufficient information about a correlated variable Y. Conventional approaches to solving the IB problem rely on heuristic tuning of hyperparameters, offering no guarantees that the learned features satisfy the information-theoretic constraints.
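The quantities I(X;T) and I(T;Y) throughout are mutual informations. As a minimal illustration (not from the paper), the sketch below estimates mutual information from a discrete joint distribution; the example joint pmf `p_xy` is made up for demonstration:

```python
import numpy as np

def mutual_information(p_joint):
    """Mutual information (in nats) from a joint pmf matrix p_joint[a, b]."""
    p_a = p_joint.sum(axis=1, keepdims=True)   # marginal of the row variable
    p_b = p_joint.sum(axis=0, keepdims=True)   # marginal of the column variable
    mask = p_joint > 0                          # skip zero-probability cells
    return float((p_joint[mask] * np.log(p_joint[mask] / (p_a @ p_b)[mask])).sum())

# Illustrative joint distribution over a correlated binary pair (X, Y).
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
print(mutual_information(p_xy))  # positive: X is informative about Y
```

With an independent pair (e.g. a uniform product distribution) the same function returns zero, matching the intuition that T should keep I(T;Y) high while compressing away the rest of X.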
The paper proposes a new methodology called IB-MHT (Information Bottleneck via Multiple Hypothesis Testing) that wraps around existing IB solvers to provide statistical guarantees on the IB constraints. The key steps are:
- Estimating an approximate Pareto frontier on the plane (I(T;Y), I(X;T)) using a portion of the available data.
- Sequentially testing candidate hyperparameters in order of decreasing estimated I(T;Y) using a family-wise error rate (FWER) controlling algorithm on a separate portion of the data.
- Selecting the hyperparameter λ* that minimizes the estimated I(X;T) among the hyperparameters that are likely to satisfy the IB constraint.
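The selection loop above can be sketched in code. This is a hedged, illustrative implementation, not the authors' code: the candidate dictionaries, field names, and p-values are assumptions, and the constraint level α is taken to be already baked into each p-value for H0: I(T;Y) < α.

```python
def ib_mht_select(candidates, delta):
    """Illustrative sketch of the IB-MHT selection step.

    Fixed-sequence testing: walk candidates in order of decreasing
    estimated I(T;Y) and stop at the first non-rejection; this
    controls the family-wise error rate (FWER) at level delta.
    """
    ordered = sorted(candidates, key=lambda c: c['i_ty'], reverse=True)
    passing = []
    for c in ordered:
        if c['p_value'] <= delta:  # reject H0: constraint likely satisfied
            passing.append(c)
        else:
            break                   # stop at the first failure
    if not passing:
        return None                 # no hyperparameter certified at level delta
    # Among certified candidates, minimize the compression term I(X;T).
    return min(passing, key=lambda c: c['i_xt'])

# Toy candidates with made-up held-out estimates and p-values.
candidates = [
    {'lambda': 0.1, 'i_ty': 0.9, 'i_xt': 2.0, 'p_value': 0.01},
    {'lambda': 0.2, 'i_ty': 0.8, 'i_xt': 1.5, 'p_value': 0.02},
    {'lambda': 0.3, 'i_ty': 0.6, 'i_xt': 1.0, 'p_value': 0.30},
]
print(ib_mht_select(candidates, delta=0.05)['lambda'])  # → 0.2
```

In the toy run, the third candidate fails its test, so testing stops; among the two certified candidates, λ = 0.2 has the smaller estimated I(X;T) and is returned.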
The proposed IB-MHT approach is demonstrated on both the classical IB problem formulation and the deterministic IB problem. The results show that IB-MHT can satisfy the IB constraint with high probability, while achieving comparable or better performance on the objective I(X;T) compared to conventional IB solvers.
Statistics
The mutual information I(T;Y) is required to be greater than or equal to a threshold α with high probability.
The mutual information I(X;T) is to be minimized.
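In symbols, the two statements above combine into the probabilistically constrained problem below (a sketch; δ denoting the tolerated outage probability is a notational assumption here):

```latex
\min_{\lambda} \; I(X; T_\lambda)
\quad \text{s.t.} \quad
\Pr\big[ I(T_\lambda; Y) \ge \alpha \big] \ge 1 - \delta
```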
Quotes
"The information bottleneck (IB) problem is a widely studied framework in machine learning for extracting compressed features that are informative for downstream tasks."
"However, current approaches to solving the IB problem rely on a heuristic tuning of hyperparameters, offering no guarantees that the learned features satisfy information-theoretic constraints."