Core Concepts
A novel data-driven method using Random Forest Regression and feature importance can identify and rank the interdependencies between climate variables, enabling the tracing of source-impact pathways from spatio-temporal climate data.
Summary
The paper presents a novel data-driven methodology for discovering and ranking source-impact pathways in climate data using Random Forest Regression (RFR) and feature importance. The key steps are:
- Train individual RFR models to predict each feature of interest, using all other features as inputs.
- Calculate pairwise feature importances (weights) using SHAP values for each RFR model.
- Translate the feature importances into a weighted pathway network (directed graph), where nodes represent climate variables and edges represent relationships between them.
- Prune weak or irrelevant edges based on statistical criteria.
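The steps above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. It assumes scikit-learn is available and uses the forest's built-in impurity-based importances as a stand-in for SHAP values. The pruning rule (scale each importance by the model's out-of-bag R², then drop edges below a cutoff of 0.2) is an assumption made for this example, since the summary does not state the paper's statistical criteria. The function name `pathway_edges` and the toy variables `a` through `d` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def pathway_edges(data, names, threshold=0.2, seed=0):
    """Return directed edges (source, target, weight) between variables.

    data: array of shape (n_samples, n_variables), one column per variable.
    threshold: assumed pruning cutoff on the edge weight (not from the paper).
    """
    edges = []
    for j, target in enumerate(names):
        # One model per target, with all other variables as inputs.
        inputs = [i for i in range(len(names)) if i != j]
        model = RandomForestRegressor(
            n_estimators=100, oob_score=True, random_state=seed
        )
        model.fit(data[:, inputs], data[:, j])
        # Impurity importances always sum to 1, even for an unpredictable
        # target, so scale them by out-of-bag R^2 (clamped at 0).
        skill = max(model.oob_score_, 0.0)
        for i, importance in zip(inputs, model.feature_importances_):
            weight = skill * float(importance)
            if weight >= threshold:  # prune weak edges
                edges.append((names[i], target, weight))
    return edges


# Toy data: c depends strongly on a; b and d are independent noise.
rng = np.random.default_rng(0)
n = 1000
a = rng.normal(size=n)
b = rng.normal(size=n)
c = 2.0 * a + 0.1 * rng.normal(size=n)
d = rng.normal(size=n)
data = np.column_stack([a, b, c, d])

edges = pathway_edges(data, ["a", "b", "c", "d"])
for source, target, weight in sorted(edges, key=lambda e: -e[2]):
    print(f"{source} -> {target}: {weight:.2f}")
```

In this toy setting both `a -> c` and `c -> a` survive pruning, because a regression on simultaneous values cannot tell which variable drives the other. Using time-lagged inputs is one way to give the edges a direction.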
The method is verified on two test cases:
- A set of synthetic coupled equations with known relationships.
- Simulations of the 1991 Mount Pinatubo volcanic eruption using the E3SMv2-SPA climate model.
For the synthetic case, the method accurately identifies the known variable dependencies. For the Mount Pinatubo case, the method detects the expected stratospheric warming and surface cooling pathways, and also uncovers additional relationships that provide insights into the complex climate system response.
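A small synthetic check in the same spirit can be sketched as below. The coupled system here is invented for illustration and is not the set of equations used in the paper: `y(t)` is made to depend on `x(t-2)` only, and the model is given lagged copies of both variables as inputs. Impurity-based importances from scikit-learn again stand in for SHAP values, and all names and coefficients are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n, max_lag = 1500, 3

# Hypothetical coupled system: x is white noise, y(t) = 0.9 * x(t-2) + noise.
x = rng.normal(size=n)
y = np.zeros(n)
y[2:] = 0.9 * x[:-2] + 0.1 * rng.normal(size=n - 2)

# Lagged design matrix: each row t holds x(t-k) and y(t-k) for k = 1..max_lag.
t = np.arange(max_lag, n)
names, columns = [], []
for k in range(1, max_lag + 1):
    names += [f"x(t-{k})", f"y(t-{k})"]
    columns += [x[t - k], y[t - k]]
features = np.column_stack(columns)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(features, y[t])

# Rank the lagged inputs by importance for predicting y(t).
ranking = sorted(zip(names, model.feature_importances_), key=lambda p: -p[1])
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

The ranking recovers which variable and lag matter most, but the importance values are relative weights and not the coefficient 0.9 itself.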
The proposed approach offers a data-driven way to discover and rank climate impact pathways, without requiring prior knowledge of the underlying physical processes. It can be a valuable tool for enhancing our understanding of high-consequence climate system responses.
Statistics
The eruption of Mount Pinatubo in 1991 injected approximately 20 million tons of sulfur dioxide into the stratosphere.
The resulting aerosols reduced incoming shortwave radiation, lowering surface temperatures by up to 0.5°C by September 1992.
Stratospheric temperatures increased because the added sulfate aerosols absorbed longwave radiation.
Quotes
"Disturbances to the climate system, both natural and anthropogenic, have far reaching impacts that are not always easy to identify or quantify using traditional climate science analyses or causal modeling techniques."
"Our RFR and feature importance approach cannot recover the actual coefficients pre-multiplying each term on the right-hand side of (4); however, the approach is able to detect correctly the relative strength of dependence for each variable and each time lag."
"While the former two 'forward' relationships are expected based on what is known about the surface cooling pathway, it is difficult to corroborate the time lags associated with these variable dependencies."