# Insight: Machine Learning Theory

### Efficient Learning through Manifold Untangling and Tangling: A Geometric Perspective

Efficient learning can be achieved by a tangling-untangling cycle that maps context-independent representations to context-dependent representations in high-dimensional space, and then collapses the context variables back to the original low-dimensional space for generalization.

### On the Theoretical Limits of Out-of-Distribution Detection

This paper investigates the theoretical limits of learnability for out-of-distribution (OOD) detection under risk and AUC metrics. The authors establish necessary and sufficient conditions for the learnability of OOD detection in several representative domain spaces, revealing both the challenges and the possibilities of successful OOD detection in practice.
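The object under analysis can be made concrete with a toy score-based detector. Below is a minimal sketch (assuming a maximum-softmax-probability score and an arbitrary threshold, neither of which is taken from the paper) of how such a detector turns classifier logits into in-distribution/OOD decisions:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def msp_ood_detector(logits, threshold=0.5):
    """Flag an input as OOD (True) when its maximum softmax
    probability falls below `threshold` (an illustrative choice)."""
    scores = softmax(logits).max(axis=-1)
    return scores < threshold, scores

# One confident prediction vs. one near-uniform (ambiguous) prediction.
logits = np.array([[8.0, 0.0, 0.0],   # confident -> in-distribution
                   [0.1, 0.0, 0.1]])  # near-uniform -> flagged OOD
is_ood, scores = msp_ood_detector(logits, threshold=0.5)
```

The learnability results in the paper concern whether any detector of this general kind can succeed over a whole domain space, not this particular score.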

### Computational Limits and Efficient Variants of Modern Hopfield Models

The computational limits of modern Hopfield models are characterized by a norm-based phase transition: efficient sub-quadratic variants exist only when the norms of the input query and memory patterns are below a certain threshold. As a concrete example, the authors provide an efficient, nearly linear-time modern Hopfield model that maintains exponential memory capacity.
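The quadratic cost in question comes from the dense softmax over all stored memory patterns. A minimal numpy sketch of the standard (non-efficient) retrieval update, the operation that the sub-quadratic variants approximate, might look like this; `beta` and the toy patterns are illustrative assumptions, not values from the paper:

```python
import numpy as np

def hopfield_retrieve(memory, query, beta=8.0, steps=3):
    """Retrieval in a modern (softmax) Hopfield model: iterate
    q <- M^T softmax(beta * M q), where the rows of `memory` are the
    stored patterns and `beta` is the inverse temperature."""
    q = query.astype(float)
    for _ in range(steps):
        attn = np.exp(beta * memory @ q)   # similarity to each pattern
        attn /= attn.sum()                 # softmax over stored patterns
        q = memory.T @ attn                # convex combination of patterns
    return q

# Three stored patterns; the query is a noisy copy of the first one.
M = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
noisy = np.array([0.9, 0.2, 0.0, 0.0])
retrieved = hopfield_retrieve(M, noisy)
```

Each update touches every stored pattern, which is the source of the quadratic cost that the norm condition allows the efficient variants to avoid.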

### Deriving Tighter PAC-Bayes Bounds with a Novel Divergence Measure

This paper presents a novel high-probability PAC-Bayes bound that achieves a strictly tighter complexity measure than the standard Kullback-Leibler (KL) divergence. The new bound is based on a divergence measure called the Zhang-Cutkosky-Paschalidis (ZCP) divergence, which is shown to be orderwise better than the KL divergence in certain cases.
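For reference, the standard KL-based high-probability PAC-Bayes bound that the ZCP result tightens has the familiar McAllester/Maurer form (notation here is assumed, not taken from the paper): with probability at least $1-\delta$ over an i.i.d. sample of size $n$, simultaneously for all posteriors $\rho$ given a prior $\pi$,

$$
\mathbb{E}_{h\sim\rho}\big[L(h)\big] \;\le\; \mathbb{E}_{h\sim\rho}\big[\hat{L}(h)\big] \;+\; \sqrt{\frac{\mathrm{KL}(\rho\,\|\,\pi) + \ln\frac{2\sqrt{n}}{\delta}}{2n}},
$$

where $L$ and $\hat{L}$ denote the population and empirical risks. The paper's contribution is to replace the $\mathrm{KL}(\rho\,\|\,\pi)$ complexity term with the ZCP divergence, which can be order-wise smaller.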

### Stronger Computational Separations Between Multimodal and Unimodal Machine Learning

There exist average-case computational separations between multimodal and unimodal machine learning tasks, where multimodal learning is feasible in polynomial time but the corresponding unimodal task is computationally hard. However, any such separation implies the existence of cryptographic key agreement protocols, suggesting that very strong computational advantages of multimodal learning may arise infrequently in practice.

### Dichotomy of Early and Late Phase Implicit Biases Provably Induces Grokking in Neural Network Training

The dichotomy of early and late phase implicit biases induced by large initialization and small weight decay can provably lead to a sharp transition from memorization to generalization, a phenomenon known as "grokking", in the training of homogeneous neural networks.

### Multi-Class Classification with Abstention: Theoretical Analysis and Algorithms

The authors present new theoretical and algorithmic results for multi-class classification with abstention in the predictor-rejector framework, including new surrogate losses with strong consistency guarantees.
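The target loss being surrogated is simple to state. Below is a minimal sketch of the empirical predictor-rejector loss, using one common convention in which the rejector abstains when $r(x) \le 0$ and abstention incurs a fixed cost $c$; the cost value and data here are illustrative assumptions:

```python
import numpy as np

def abstention_loss(preds, rejects, labels, c=0.3):
    """Empirical predictor-rejector loss: pay the 0-1 misclassification
    cost when we predict (reject score > 0 means 'predict' here) and a
    fixed abstention cost c whenever we abstain (reject score <= 0)."""
    abstain = rejects <= 0
    errors = (preds != labels) & ~abstain
    return (errors + c * abstain).mean()

preds   = np.array([0, 1, 2, 1])
rejects = np.array([1.0, 1.0, -0.5, 1.0])  # abstain on the 3rd point
labels  = np.array([0, 2, 0, 1])
loss = abstention_loss(preds, rejects, labels, c=0.3)
```

Because this loss is discontinuous in both the predictor and the rejector, consistent surrogate losses of the kind the paper introduces are needed for tractable learning.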

### Theoretical Analysis of Attention Mechanism via Exchangeability and Latent Variable Models

The attention mechanism can be derived from a latent variable model induced by the exchangeability of input tokens, which enables a rigorous characterization of the representation, inference, and learning aspects of attention.
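For reference, the mechanism being characterized is standard scaled dot-product attention; the paper's contribution is to interpret its softmax weights as posterior inference over a latent variable induced by token exchangeability. A minimal numpy sketch of the mechanism itself (shapes and data are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard attention: softmax(Q K^T / sqrt(d)) V. Under the cited
    latent-variable view, each row of the softmax weights acts as a
    posterior distribution over which token is relevant."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))   # 3 query tokens, dimension 4
K = rng.standard_normal((5, 4))   # 5 key tokens
V = rng.standard_normal((5, 2))   # values attached to the keys
out, w = scaled_dot_product_attention(Q, K, V)
```

Each row of `w` is non-negative and sums to one, which is what makes the posterior-over-tokens reading possible.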

### Generalization Bounds for Learning from Graph-Dependent Data

This survey explores generalization bounds for learning from graph-dependent data, where the dependencies among examples are described by a dependency graph. It presents concentration inequalities and uses them to derive Rademacher complexity and algorithmic stability generalization bounds for learning from such interdependent data.
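A representative inequality of this type is Janson's Hoeffding-style bound, in which independence is replaced by the fractional chromatic number $\chi^*(\Gamma)$ of the dependency graph $\Gamma$ (notation here is assumed, not taken from the survey): for variables $X_i \in [a_i, b_i]$ with dependency graph $\Gamma$,

$$
\Pr\!\left[\sum_{i=1}^{n} X_i - \mathbb{E}\sum_{i=1}^{n} X_i \ge t\right] \;\le\; \exp\!\left(\frac{-2t^2}{\chi^*(\Gamma)\sum_{i=1}^{n}(b_i - a_i)^2}\right).
$$

When $\Gamma$ has no edges (fully independent data), $\chi^*(\Gamma) = 1$ and the classical Hoeffding bound is recovered; denser dependency graphs inflate the bound, which is what drives the graph-dependent Rademacher and stability results.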