Core Concepts
DimVis employs an Explainable Boosting Machine (EBM) model to interpret visual clusters in dimensionality reduction projections, helping users understand which underlying factors drive cluster formation.
Abstract
The paper presents DimVis, a visualization tool that uses a supervised Explainable Boosting Machine (EBM) model to interpret dimensionality reduction (DR) projections, such as those generated by UMAP. DimVis allows users to interactively explore visual patterns (clusters, shapes, etc.) in DR layouts and gain insights into the factors that influence the formation of these patterns.
The key components of DimVis include:
A UMAP projection of the dataset, in which users select the data points they want to interpret.
A panel for selecting a dataset and adjusting UMAP parameters.
Performance metrics related to the underlying EBM model that supports the visual exploration.
A ranking of single and pairwise features that contribute to the separation between the user-selected data points and the rest of the dataset, based on the EBM model's feature importance.
Visualizations, such as line plots and histograms, that allow users to explore the impact of individual features or feature pairs on the cluster formation.
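The components above suggest a simple pipeline: project the data with UMAP, label the user-selected points against the rest, fit an EBM on those labels, and rank the resulting single and pairwise terms. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the third-party `umap-learn` and `interpret` packages; the function names `selection_labels`, `split_ranking`, `project`, and `rank_terms` are hypothetical and introduced here only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def selection_labels(n_points: int, selected: Sequence[int]) -> List[int]:
    """Label user-selected points 1 and every other point 0."""
    chosen = set(selected)
    return [1 if i in chosen else 0 for i in range(n_points)]


def split_ranking(importances: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Sort terms by importance, then split them into singles and pairs.

    Assumes pairwise terms are named "a & b", the naming convention
    used by the `interpret` package.
    """
    ranked = sorted(importances, key=importances.get, reverse=True)
    singles = [term for term in ranked if " & " not in term]
    pairs = [term for term in ranked if " & " in term]
    return singles, pairs


def project(X, n_neighbors: int = 15, min_dist: float = 0.1):
    """Return a 2-D UMAP layout of X (requires `umap-learn`)."""
    import umap  # deferred so the helpers above work without it

    reducer = umap.UMAP(n_neighbors=n_neighbors, min_dist=min_dist)
    return reducer.fit_transform(X)


def rank_terms(X, selected: Sequence[int]) -> Tuple[List[str], List[str]]:
    """Fit an EBM on selected-vs-rest labels and rank its terms
    (requires `interpret`)."""
    from interpret.glassbox import ExplainableBoostingClassifier

    ebm = ExplainableBoostingClassifier()
    ebm.fit(X, selection_labels(len(X), selected))
    importances = dict(zip(ebm.term_names_, ebm.term_importances()))
    return split_ranking(importances)
```

In this sketch, `selected` stands for the indices of the points a user picks in the projection, and the `n_neighbors` and `min_dist` defaults are those of `umap-learn`. Because the labels encode only "selected versus rest", the supervised EBM needs no ground truth labels from the dataset itself.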
The paper demonstrates the applicability and effectiveness of DimVis through a use case with the Breast Cancer Wisconsin (Original) dataset and a healthcare usage scenario with the Pima Indians Diabetes dataset. The results show that DimVis can provide valuable insights into the underlying factors that influence the formation of visual clusters in DR projections, even in the absence of ground truth labels.
The paper also discusses the design choices and limitations of DimVis, along with potential future directions: objective comparisons with similar tools, refinements to the user experience based on expert feedback, and improvements to the computational efficiency of the underlying algorithms.
Stats
In the Breast Cancer Wisconsin (Original) use case, the number of bare nuclei is the most important single feature for the formation of cluster C1.
The clump thickness is the most important single feature for the formation of cluster C2.
The combination of uniformity of cell size and bland chromatin is the most important feature pair for the formation of cluster C3.
Quotes
"DimVis uses the state-of-the-art, supervised 'glass-box' EBM model to interpret visualizations generated with unsupervised DR techniques."
"DIMVIS utilizes the UMAP algorithm and users can interactively adjust UMAP's hyperparameters – 'Number of Neighbors' and 'Minimum Distance' – to explore different projections."
"When a user clicks on a single feature in the bar chart, a line plot and a histogram appear, displaying the impact of that specific feature on the model's predictions."