Key concepts
AutoGNNUQ, an automated uncertainty quantification approach, leverages neural architecture search to generate an ensemble of high-performing graph neural networks, enabling accurate estimation of both aleatoric and epistemic uncertainties in molecular property predictions.
Summary
This summary covers AutoGNNUQ, an automated uncertainty quantification (UQ) approach for molecular property prediction with graph neural networks (GNNs).
Key highlights:
GNNs have emerged as a prominent class of data-driven methods for molecular property prediction, but a key limitation is their inability to quantify predictive uncertainties.
AutoGNNUQ employs neural architecture search to generate an ensemble of high-performing GNNs, enabling the estimation of both aleatoric (data) and epistemic (model) uncertainties.
The approach decomposes the total uncertainty into aleatoric and epistemic components, providing valuable insights for reducing different sources of uncertainty.
Computational experiments demonstrate that AutoGNNUQ outperforms existing UQ methods in terms of prediction accuracy and UQ performance on multiple benchmark datasets.
t-SNE visualization is used to explore correlations between molecular features and uncertainty, offering insights for dataset improvement.
AutoGNNUQ has broad applicability in domains like drug discovery and materials science, where accurate uncertainty quantification is crucial for decision-making.
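The split into aleatoric and epistemic uncertainty described above can be illustrated with a small sketch. It assumes the common deep-ensemble decomposition based on the law of total variance, where each ensemble member outputs a predicted mean and a predicted noise variance; the summary does not spell out the exact formula AutoGNNUQ uses, so treat this as one plausible reading. The function name decompose_uncertainty and its arguments are hypothetical.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(means, variances):
    """Split the ensemble's predictive variance for a single molecule.

    means[k]     : predicted mean from ensemble member k
    variances[k] : predicted noise variance from ensemble member k

    Assumed decomposition (law of total variance, not confirmed by the source):
      aleatoric = average of the members' predicted variances
      epistemic = variance of the members' predicted means
      total     = aleatoric + epistemic
    """
    if not means or len(means) != len(variances):
        raise ValueError("need one (mean, variance) pair per ensemble member")

    ensemble_mean = fmean(means)
    aleatoric = fmean(variances)   # data noise, averaged over members
    epistemic = pvariance(means)   # disagreement between members
    return ensemble_mean, aleatoric, epistemic, aleatoric + epistemic


if __name__ == "__main__":
    # Three hypothetical ensemble members predicting one property value.
    mean, alea, epi, total = decompose_uncertainty(
        means=[1.0, 2.0, 3.0],
        variances=[0.1, 0.2, 0.3],
    )
    print(f"mean={mean:.3f} aleatoric={alea:.3f} epistemic={epi:.3f} total={total:.3f}")
```

Under this reading, a large epistemic term signals that the ensemble members disagree, which more training data or better models might reduce, while a large aleatoric term points to noise in the measurements themselves.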
Statistics
Lipo dataset: 0.64 ± 0.02 RMSE (log D) for octanol-water partition coefficient prediction
ESOL dataset: 0.74 ± 0.06 RMSE (log mol/L) for water solubility prediction
FreeSolv dataset: 1.32 ± 0.29 RMSE (kcal/mol) for hydration free energy prediction
QM7 dataset: 47.5 ± 2.1 MAE (kcal/mol) for atomization energy prediction
Quotes
"AutoGNNUQ surpasses the benchmark MPNN ensemble on most datasets, shown by mean MCA values of 0.052, 0.052, and 0.15 for Lipo, ESOL, and FreeSolv, respectively. This equates to an 86%, 86%, and 55% reduction in comparison to the benchmark results."
"For Lipo, ESOL, FreeSolv, and QM7, the majority of observed errors fall within one std., with percentages of 75.9 ± 1.2%, 75.8 ± 3.2%, 85.5 ± 4.6%, and 90.9 ± 1.0%, respectively, across eight random seeds."