This survey explores generalization bounds for learning from graph-dependent data, where the dependencies among examples are described by a dependency graph. It begins by introducing various graph-theoretic concepts and concentration inequalities for functions of graph-dependent random variables.
The key highlights and insights are:
Dependency graphs provide a natural way to model the dependencies among data points, and are often more practical than quantifying dependence through mixing coefficients or vanishing-moment conditions, which can be hard to estimate from data.
Concentration inequalities, such as Janson's inequality and McDiarmid-type bounds, are derived for functions of graph-dependent variables by decomposing the variables into independent sets based on fractional colorings of the dependency graph.
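The decomposition step can be illustrated with a toy greedy proper coloring: each color class is an independent set of the dependency graph, so the variables in a class are mutually independent. This is only a sketch (a proper coloring upper-bounds the fractional one used in the survey); the function name and the 5-cycle example are illustrative assumptions, not from the source.

```python
from itertools import count

def greedy_color(n, edges):
    """Greedily assign colors so that adjacent vertices differ.
    Each resulting color class is an independent set, i.e. a block of
    mutually independent random variables under the dependency graph."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    color = {}
    for v in range(n):
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in count() if c not in used)
    classes = {}
    for v, c in color.items():
        classes.setdefault(c, []).append(v)
    return list(classes.values())

# 5-cycle dependency graph: chromatic number 3 (fractional chromatic number 5/2)
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
classes = greedy_color(5, cycle_edges)
```

The 5-cycle already shows why the fractional version is sharper: no proper coloring does better than 3 classes, while a fractional coloring achieves weight 5/2.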
These concentration results are then used to establish generalization bounds for learning from graph-dependent data using two approaches:
a. Fractional Rademacher complexity: The fractional Rademacher complexity is defined by decomposing the empirical Rademacher complexity into sums of independent variables. This leads to generalization bounds that depend on the fractional chromatic number of the dependency graph.
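The fractional Rademacher complexity can be sketched numerically: restrict the Rademacher supremum to each independent class and combine the pieces with the weights of a fractional cover. The Monte Carlo estimator below, the function names, and the use of a finite hypothesis class (each hypothesis given as its vector of predictions) are assumptions made for illustration; the survey's definition is the population expectation, not this estimate.

```python
import random

def empirical_rademacher(preds, rounds=500, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of a
    finite hypothesis class, given as prediction vectors `preds`."""
    rng = random.Random(seed)
    n = len(preds[0])
    total = 0.0
    for _ in range(rounds):
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        total += max(sum(s * p for s, p in zip(sigma, h)) / n for h in preds)
    return total / rounds

def fractional_rademacher(preds, classes, weights, rounds=500, seed=0):
    """Sketch of the fractional version: the supremum is taken separately
    over each independent class, and the pieces are combined with the
    weights of a fractional cover of the dependency graph."""
    rng = random.Random(seed)
    n = len(preds[0])
    total = 0.0
    for _ in range(rounds):
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        total += sum(
            w * max(sum(sigma[i] * h[i] for i in cls) / n for h in preds)
            for w, cls in zip(weights, classes))
    return total / rounds

preds = [(1.0, -1.0, 1.0, -1.0), (0.5, 0.5, 0.5, 0.5)]
full = empirical_rademacher(preds, seed=1)
frac = fractional_rademacher(preds, [[0, 1, 2, 3]], [1.0], seed=1)
```

With a single class covering all indices and weight 1 (i.e., independent data), the fractional quantity coincides with the ordinary empirical Rademacher complexity, which is a useful sanity check.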
b. Algorithmic stability: Concentration inequalities for Lipschitz functions of graph-dependent variables are used to derive stability-based generalization bounds that are tailored to specific learning algorithms.
The presented framework is illustrated through practical learning tasks such as learning-to-rank, multi-class classification, and learning from m-dependent data, demonstrating the applicability of the results.
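For the m-dependent case the dependency graph is explicit: variables X_i and X_j may depend on each other iff |i - j| <= m, and taking residue classes modulo m + 1 gives a proper coloring with m + 1 classes, matching the graph's (fractional) chromatic number. The helper names below are illustrative, not from the survey:

```python
def m_dependent_graph(n, m):
    """Edges of the dependency graph of an m-dependent sequence of
    length n: i ~ j whenever 0 < |i - j| <= m."""
    return [(i, j) for i in range(n) for j in range(i + 1, min(i + m + 1, n))]

def interval_coloring(n, m):
    """Proper coloring with m + 1 colors: class c holds the indices
    congruent to c modulo m + 1, so indices in a class differ by > m."""
    return [[i for i in range(n) if i % (m + 1) == c] for c in range(m + 1)]

edges = m_dependent_graph(10, 2)
blocks = interval_coloring(10, 2)
```

Plugging this coloring into the fractional bounds recovers the classical flavor of results for m-dependent sequences, with the m + 1 factor playing the role of the fractional chromatic number.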
The survey concludes by discussing perspectives and future research directions in this area of learning with interdependent data.