Core Concepts
The paper develops a general framework for regression on extremes, where the goal is to build predictive functions that perform well in regions of the input space where the input takes unusually large values. Under appropriate regular variation assumptions on the joint distribution of the input and output variables, the authors show that an asymptotic notion of risk can be tailored to summarize predictive performance in extreme regions, and that minimizing an empirical version of this 'extreme risk' generalizes well.
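As an illustration, the 'extreme risk' could be formalized as follows. This is only a sketch under assumptions not stated in this summary: a squared-error loss, a norm \|\cdot\| on the input space, and the symbols R_t, R_\infty, \theta and \widehat{R}_k, which are introduced here for exposition. The paper's exact definitions may differ.

```latex
% Conditional risk of a predictor f beyond a threshold t > 0
% (squared-error loss is an assumption made for this sketch)
R_t(f) = \mathbb{E}\left[ (Y - f(X))^2 \,\middle|\, \|X\| \ge t \right],
\qquad
R_\infty(f) = \limsup_{t \to \infty} R_t(f).

% Empirical counterpart built from the k training inputs of largest norm,
% with X_{(1)}, \dots, X_{(n)} sorted by decreasing norm and
% \theta(x) = x / \|x\| denoting the angular component of x
\widehat{R}_k(f) = \frac{1}{k} \sum_{i=1}^{k}
  \left( Y_{(i)} - f\bigl(\theta(X_{(i)})\bigr) \right)^2 .
```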
Abstract
The paper proposes a framework for regression on extremes, where the goal is to build predictive functions that perform well in regions of the input space with unusually large values. The key ideas are:
Assumptions: The authors introduce a regular variation framework, where the joint distribution of the input and output variables satisfies certain tail behavior assumptions. This allows them to define an asymptotic notion of risk that captures predictive performance in extreme regions.
Algorithm: The authors propose an algorithm that minimizes an empirical version of the 'extreme risk', based on a fraction of the largest observations in the training data. This empirical risk minimization approach is shown to have good generalization guarantees.
Theoretical Analysis: The authors establish several key theoretical results:
They show that the asymptotic extreme risk can be minimized by prediction functions that depend only on the angular component of the input (i.e., its direction but not its magnitude).
They provide non-asymptotic bounds on the excess asymptotic risk of the empirical risk minimizer, demonstrating its near-optimality.
Numerical Experiments: The authors provide numerical results on both simulated and real data, supporting the relevance of their approach for regression problems with heavy-tailed inputs.
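The algorithm described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `fit_extreme_regressor` and `predict_from_angle`, the Euclidean norm, the default fraction of 10%, and the choice of a linear least-squares model on the angles are all assumptions made for the example.

```python
import numpy as np


def fit_extreme_regressor(X, y, fraction=0.1):
    """Least-squares fit on the angles of the largest-norm inputs.

    Hypothetical helper: the linear model and the Euclidean norm are
    illustrative choices, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    norms = np.linalg.norm(X, axis=1)
    # Number of extreme observations kept for training (at least one).
    k = max(1, int(np.floor(fraction * len(X))))
    # Indices of the k observations with the largest input norm.
    extreme_idx = np.argsort(norms)[-k:]
    # Angular component: direction of the input, magnitude discarded.
    angles = X[extreme_idx] / norms[extreme_idx, None]
    # Intercept column plus angles; minimize the empirical squared error.
    design = np.column_stack([np.ones(k), angles])
    coef, *_ = np.linalg.lstsq(design, y[extreme_idx], rcond=None)
    return coef, extreme_idx


def predict_from_angle(coef, X):
    """Predict using only the direction of each (nonzero) input."""
    X = np.asarray(X, dtype=float)
    angles = X / np.linalg.norm(X, axis=1, keepdims=True)
    design = np.column_stack([np.ones(len(X)), angles])
    return design @ coef


# Simulated heavy-tailed inputs (Pareto margins) with an angle-driven output.
rng = np.random.default_rng(0)
X_train = rng.pareto(2.0, size=(2000, 2)) + 1.0
theta = X_train / np.linalg.norm(X_train, axis=1, keepdims=True)
y_train = theta[:, 0] + 0.1 * rng.standard_normal(2000)

coef, extreme_idx = fit_extreme_regressor(X_train, y_train, fraction=0.1)
predictions = predict_from_angle(coef, X_train[extreme_idx])
```

Because the fitted predictor depends on the input only through its direction, rescaling an input leaves its prediction unchanged, which mirrors the angular structure highlighted in the theoretical analysis.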
Overall, the paper offers a framework for regression problems where accurate prediction in extreme regions of the input space is crucial, supported by non-asymptotic theoretical guarantees and by experiments on simulated and real data.