Core Concepts

Likelihood-based deep generative models can assign higher likelihoods to out-of-distribution (OOD) data from simpler sources than to their own training data, despite never generating such OOD samples. The paradox is resolved by observing that these models assign high density but negligible probability mass to the regions containing the OOD data, which has lower intrinsic dimension than the in-distribution data.

Abstract

The paper explores the paradox where likelihood-based deep generative models (DGMs) such as normalizing flows (NFs) and diffusion models (DMs) assign higher likelihoods to out-of-distribution (OOD) data from simpler datasets, despite never generating such OOD samples.
The authors propose a geometric explanation for this paradox: DGMs can assign high densities to OOD data lying on low-dimensional manifolds while still assigning negligible probability mass to those regions, because probability mass depends on both the density and the local intrinsic dimension (LID) of the region.
Specifically, the authors show that when OOD data has lower intrinsic dimension than the in-distribution data, the DGM can assign high densities to the OOD region while still assigning it low probability mass: mass is the product of density and volume, and a region of lower intrinsic dimension occupies negligible volume in the ambient space. The model can therefore closely approximate the true data distribution while still exhibiting the paradoxical behavior of assigning high likelihoods to OOD data.
The authors propose a dual-threshold OOD detection method that uses both the likelihood and the LID estimated from a pre-trained DGM. They show that it outperforms likelihoods alone, as well as several other baselines, across a range of OOD detection tasks for both NFs and DMs.
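In code, one plausible reading of this dual-threshold rule is the following sketch (function name and the numeric thresholds are illustrative, not from the paper): a point is treated as in-distribution only when both its model log-likelihood and its estimated LID clear a threshold; everything else is flagged as OOD.

```python
def is_ood(log_lik, lid, ll_thresh, lid_thresh):
    """Dual-threshold OOD rule (sketch): in-distribution only if the model
    assigns high likelihood AND the point sits in a region of high model
    intrinsic dimension; otherwise OOD. Thresholds would be tuned on
    in-distribution validation data."""
    return (log_lik < ll_thresh) or (lid < lid_thresh)

# Hypothetical numbers: an OOD image can have HIGH likelihood but LOW LID,
# so likelihood alone would miss it while the dual rule catches it.
print(is_ood(log_lik=-950.0, lid=30.0, ll_thresh=-1200.0, lid_thresh=100.0))   # OOD
print(is_ood(log_lik=-1100.0, lid=150.0, ll_thresh=-1200.0, lid_thresh=100.0)) # in-dist
```

The key design point is that the LID threshold catches exactly the paradoxical cases: points where the likelihood is high but the model concentrates on a lower-dimensional region around them.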

Stats

The local intrinsic dimension (LID) of a region governs the probability mass the model assigns to it: probability mass ≈ density × volume, and a region of low LID occupies negligible volume in the ambient space.
When OOD data has lower intrinsic dimension than in-distribution data, the model can assign high densities to the OOD region while still assigning it low probability mass.
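A tiny numeric illustration of this effect, using a toy 2-D mixture of our own construction (not from the paper): a nearly one-dimensional "ridge" component can have far higher density at its center than a full-dimensional blob, yet receive far less probability mass in a small neighborhood, because the ridge is thin in one direction.

```python
import math

def gauss_box_mass(r, sigmas):
    """Mass a zero-mean axis-aligned product Gaussian assigns to [-r, r]^d."""
    m = 1.0
    for s in sigmas:
        m *= math.erf(r / (s * math.sqrt(2.0)))  # P(|X| <= r) for N(0, s^2)
    return m

def gauss_density_at_mean(sigmas):
    """Density of the product Gaussian at its mean."""
    d = 1.0
    for s in sigmas:
        d /= s * math.sqrt(2.0 * math.pi)
    return d

# Toy 2-component mixture in 2-D:
#  - "in-distribution" blob: isotropic, full-dimensional, weight 0.999
#  - "OOD" ridge: nearly 1-dimensional (sigma = 1e-5 in one axis), weight 0.001
w_in, sig_in = 0.999, (1.0, 1.0)
w_ood, sig_ood = 0.001, (1e-5, 1.0)

r = 0.1  # side of the small box we integrate mass over
dens_in = w_in * gauss_density_at_mean(sig_in)     # density at the blob center
dens_ood = w_ood * gauss_density_at_mean(sig_ood)  # density on the ridge
mass_in = w_in * gauss_box_mass(r, sig_in)         # mass near the blob center
mass_ood = w_ood * gauss_box_mass(r, sig_ood)      # mass near the ridge

print(f"density  in-dist: {dens_in:.3f}   OOD ridge: {dens_ood:.3f}")
print(f"mass     in-dist: {mass_in:.5f}   OOD ridge: {mass_ood:.6f}")
```

Here the ridge point wins on density by a large factor, but the blob wins on mass: high density on a thin, low-dimensional set still integrates to almost nothing.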

Quotes

"OOD datapoints can be assigned higher likelihoods while not being generated if they belong to regions of low probability mass."
"A large LIDθ(x) is equivalent to a rapid growth of the log probability mass that pθ assigns to a neighbourhood of x as the size of the neighbourhood increases."
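One way to state the quoted growth condition formally (our notation, following the standard growth-rate characterization of local intrinsic dimension; not copied from the paper):

```latex
\mathrm{LID}_\theta(x) \;=\; \lim_{r \to 0^+} \frac{\log P_\theta\big(B_r(x)\big)}{\log r},
\qquad\text{so}\qquad
P_\theta\big(B_r(x)\big) \;\approx\; C \, r^{\mathrm{LID}_\theta(x)} \text{ for small } r,
```

where \(P_\theta(B_r(x))\) is the probability mass the model assigns to an \(r\)-ball around \(x\). A large \(\mathrm{LID}_\theta(x)\) is then exactly a rapid growth of log-mass as the neighborhood size increases, matching the quote.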

Key Insights Distilled From

by Hamidreza Ka... at **arxiv.org** 03-29-2024

Deeper Inquiries

The insights from this work can be extended to other likelihood-based deep generative models, such as variational autoencoders (VAEs), by incorporating intrinsic dimension estimation into the OOD detection pipeline. For VAEs, one could estimate the intrinsic dimension of the latent space or of the reconstruction space to identify regions where the model assigns high likelihood but low probability mass. Pairing these LID estimates with likelihoods, as this work does for normalizing flows and diffusion models, could improve OOD detection for VAEs as well; the main work required is adapting the LID estimator to the VAE's architecture and characteristics.
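As a concrete starting point, here is a generic local-PCA intrinsic-dimension estimator one might apply to VAE latent codes. This is a standard, model-agnostic technique sketched under our own assumptions, not the paper's DGM-specific LID estimator: the LID at a point is taken as the number of principal components needed to explain most of the variance among its nearest neighbors.

```python
import numpy as np

def local_pca_lid(points, query, k=20, var_threshold=0.95):
    """Estimate local intrinsic dimension at `query` from its k nearest
    neighbors in `points`: the number of principal components needed to
    explain `var_threshold` of the local variance."""
    d2 = np.sum((points - query) ** 2, axis=1)
    nbrs = points[np.argsort(d2)[:k]]
    centered = nbrs - nbrs.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)  # spread per principal axis
    ratios = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(ratios, var_threshold) + 1)

# Toy check: "latent codes" lying near a 2-D plane inside R^8 should get a
# LID estimate far below the ambient dimension of 8.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 8))
z[:, 2:] *= 0.01  # flatten onto roughly 2 effective dimensions
print(local_pca_lid(z, z[0]))
```

In the VAE setting, `points` would be encoder outputs for a reference set and `query` the latent code of a test input; the estimate could then feed a dual-threshold rule alongside the model's likelihood.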

This work also carries implications for designing deep generative models that are more robust to out-of-distribution data. Understanding the relationship between intrinsic dimension and probability mass lets designers build in mechanisms that better capture the low-dimensional structure of the data distribution, for example regularization during training that encourages the model to concentrate on the manifolds where the data actually resides. Likewise, evaluating the probability mass a model places around data points, rather than density alone, can guide the development of models that generate high-quality samples while generalizing more reliably to OOD scenarios.

The connection between intrinsic dimension and probability mass can also be leveraged to improve sample quality, beyond OOD detection. By concentrating generation on regions of both high density and non-negligible probability mass, a model avoids spurious high-density, low-mass regions and produces samples that are more representative of the true data distribution. Incorporating intrinsic dimension estimation into training could therefore help models generate samples that are not only visually appealing but also coherent with the underlying data structure.
