The study proposes a novel approach for constructing feature graphs from the structure of unsupervised random forests. The graphs are built so that the centrality of a feature captures its relevance to the clustering task, while the edge weights reflect the discriminating power of feature pairs.
The authors introduce two feature selection strategies, a brute-force method and a greedy approach, to identify the top k features from the constructed feature graphs. Both strategies prioritize features connected by heavy edges, because the edge weight is shown to correlate with the ability of the feature pair to separate clusters.
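To make the two strategies concrete, here is a minimal sketch of how selecting k features connected by heavy edges might look. It is not the authors' implementation: the summary does not give their exact objective, so the sketch assumes the goal is to maximize the total edge weight inside the selected subgraph, and that the feature graph is given as a symmetric, non-negative weight matrix `W` with a zero diagonal. The function names and the toy matrix are hypothetical.

```python
from itertools import combinations

import numpy as np


def subgraph_weight(W, nodes):
    """Total weight of the edges between the given nodes."""
    return sum(W[i, j] for i, j in combinations(nodes, 2))


def brute_force_top_k(W, k):
    """Assumed objective: the k-subset with the heaviest induced subgraph.

    Enumerates every k-subset, so it is only feasible for small graphs.
    """
    n = W.shape[0]
    return max(combinations(range(n), k), key=lambda s: subgraph_weight(W, s))


def greedy_top_k(W, k):
    """Assumed greedy rule: start from the heaviest edge, then repeatedly
    add the feature with the largest total weight to the selected set."""
    n = W.shape[0]
    # Upper triangle only, so each undirected edge is considered once.
    i, j = np.unravel_index(np.argmax(np.triu(W, 1)), W.shape)
    selected = [int(i), int(j)]
    while len(selected) < k:
        rest = [f for f in range(n) if f not in selected]
        gains = [W[f, selected].sum() for f in rest]
        selected.append(rest[int(np.argmax(gains))])
    return selected[:k]


# Toy graph (made up for illustration): features 0, 1, 2 share heavy edges,
# features 3 and 4 are only weakly connected.
W = np.array([
    [0.0, 0.9, 0.8, 0.1, 0.1],
    [0.9, 0.0, 0.7, 0.1, 0.2],
    [0.8, 0.7, 0.0, 0.2, 0.1],
    [0.1, 0.1, 0.2, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.1, 0.0],
])

print(brute_force_top_k(W, 3))
print(greedy_top_k(W, 3))
```

The brute-force variant examines every k-subset and therefore scales combinatorially with the number of features, whereas the greedy variant evaluates each remaining feature once per step, which is the usual reason for offering both.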
The effectiveness of the proposed graph-building and graph-mining methods is extensively evaluated on synthetic and benchmark datasets. The results demonstrate that the feature centrality accurately captures feature relevance, and the edge weights reliably indicate the discriminatory power of feature pairs. The feature selection strategies consistently identify all relevant features before any irrelevant ones, and the optimal number of features can be inferred from the average weight of the selected subgraph.
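The idea of reading the number of features off the average weight of the selected subgraph can be illustrated with a small sketch. The summary does not say which criterion the authors apply, so the rule used here (pick the k after which the average edge weight drops the most) is an assumption, as are the greedy ordering, the function names, and the toy weight matrix `W`.

```python
from itertools import combinations

import numpy as np


def greedy_order(W):
    """Order all features greedily: heaviest edge first, then the feature
    with the largest total weight to those already chosen (assumed rule)."""
    n = W.shape[0]
    i, j = np.unravel_index(np.argmax(np.triu(W, 1)), W.shape)
    order = [int(i), int(j)]
    while len(order) < n:
        rest = [f for f in range(n) if f not in order]
        gains = [W[f, order].sum() for f in rest]
        order.append(rest[int(np.argmax(gains))])
    return order


def average_weights(W, order):
    """Average edge weight of the subgraph on the first k features, k >= 2."""
    avgs = {}
    for k in range(2, len(order) + 1):
        pairs = list(combinations(order[:k], 2))
        avgs[k] = sum(W[a, b] for a, b in pairs) / len(pairs)
    return avgs


def infer_k(avgs):
    """Assumed heuristic: the k just before the largest drop in average weight."""
    ks = sorted(avgs)
    drops = {k: avgs[k] - avgs[k + 1] for k in ks[:-1]}
    return max(drops, key=drops.get)


# Toy graph (made up for illustration): features 0, 1, 2 are relevant,
# features 3 and 4 are not.
W = np.array([
    [0.0, 0.9, 0.8, 0.1, 0.1],
    [0.9, 0.0, 0.7, 0.1, 0.2],
    [0.8, 0.7, 0.0, 0.2, 0.1],
    [0.1, 0.1, 0.2, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.1, 0.0],
])

avgs = average_weights(W, greedy_order(W))
print(avgs)
print(infer_k(avgs))
```

In this toy case the average weight stays high while only the strongly connected features are selected and falls sharply once a weakly connected feature joins, which is the kind of signal the summary describes.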
The authors also present a cluster-specific feature graph construction approach, which can effectively distinguish cluster-specific, sub-relevant, and irrelevant features. Finally, the proposed methods are applied to a real-world biomedical application of disease subtyping, showcasing their potential to enhance interpretability in clustering analyses.
Source: arxiv.org