Key Concept
This study investigates the feasibility of optimizing site-specific temperature and dew point forecasts by adopting the gradient boosting decision tree model XGBoost, supported by insights from Shapley Additive Explanations (SHAP) to increase the reliability of the machine learning-based forecasts.
Abstract
The study focuses on developing a framework called Multi-SiteBoost (MSB) to generate site-specific deterministic temperature and humidity forecasts using machine learning. Key highlights:
The gridded numerical weather prediction (NWP) data from the IMPROVER system is used as input, along with site observation data, to train XGBoost regression models for each site and forecast variable.
Various data pre-processing techniques are applied, including the inclusion of surrounding grid values, data selection, automatic feature selection, and data scaling, to optimize the model performance.
The trained XGBoost models show significant improvement in forecast accuracy compared to the original IMPROVER grid values, with an average reduction in hourly RMSE of 11.35% for temperature and 12.28% for dew point.
SHAP analysis is used to explain the model outputs, revealing that the predictions are primarily driven by a linear combination of the main variable at different grid locations, with supplementary nonlinear effects from other features like hour, wind, and dew point.
To increase the reliability of the machine learning-based forecasts, the study explores methods to identify potentially unreliable individual predictions, such as those with out-of-bound feature values. These unreliable predictions are found to have much higher error than the overall average.
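The per-site training idea behind Multi-SiteBoost can be sketched as follows. This is an illustrative toy on synthetic data, not the study's implementation: `GradientBoostingRegressor` stands in for XGBoost to keep dependencies light, the 3×3 neighbourhood size, the hour-of-day feature, and all variable names are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Hypothetical sketch: one regressor per site, trained on NWP temperature at
# the site's grid cell plus its 8 surrounding cells (a 3x3 neighbourhood),
# with hour of day as an extra feature. Data below is synthetic.
rng = np.random.default_rng(0)
n_samples, n_cells = 500, 9
nwp_temps = rng.normal(15, 5, (n_samples, n_cells))   # NWP temps at 9 grid cells
hour = rng.integers(0, 24, (n_samples, 1))            # forecast valid hour
X = np.hstack([nwp_temps, hour])

# Synthetic "site observation": a weighted blend of nearby grid values plus a
# small diurnal signal and observation noise.
weights = np.linspace(0.05, 0.2, n_cells)
y = (nwp_temps @ weights
     + 0.3 * np.sin(hour[:, 0] * np.pi / 12)
     + rng.normal(0, 0.5, n_samples))

# Train on the first 400 samples, evaluate on the rest.
model = GradientBoostingRegressor(n_estimators=200, max_depth=3,
                                  learning_rate=0.05, random_state=0)
model.fit(X[:400], y[:400])

rmse_ml = mean_squared_error(y[400:], model.predict(X[400:])) ** 0.5
rmse_raw = mean_squared_error(y[400:], nwp_temps[400:, 4]) ** 0.5  # centre cell = raw NWP
print(f"raw NWP RMSE: {rmse_raw:.2f}  ML RMSE: {rmse_ml:.2f}")
```

Because the synthetic target blends several neighbouring cells, the learned model beats the raw centre-cell value, which mirrors the kind of post-processing gain the study reports (though the magnitudes here are arbitrary).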
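The out-of-bound reliability check described above can be sketched in a few lines: a prediction is flagged as potentially unreliable when any of its input features falls outside the range seen during training. The data and threshold logic here are illustrative assumptions, not the study's exact procedure.

```python
import numpy as np

# Synthetic training features (e.g. NWP temperatures used as model inputs).
rng = np.random.default_rng(1)
X_train = rng.normal(15, 5, (1000, 4))
lo, hi = X_train.min(axis=0), X_train.max(axis=0)   # per-feature training range

# New inputs drawn from a wider distribution, so some rows fall out of bounds.
X_new = rng.normal(15, 8, (200, 4))

# Flag any row with at least one feature outside the training range.
out_of_bound = ((X_new < lo) | (X_new > hi)).any(axis=1)
print(f"{out_of_bound.sum()} of {len(X_new)} predictions flagged as unreliable")
```

Predictions flagged this way would be reported with lower confidence (the study finds such cases have much higher error than the overall average); a production version might fall back to the raw NWP value for them.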
Statistics
The RMSE of hourly temperature forecasts from the XGBoost models is reduced by up to 1.05°C compared to the IMPROVER grid values across the selected sites.
The percentage reduction in hourly RMSE ranges from 0.54% to 37.13% for both temperature and dew point, depending on the site.
On average, the XGBoost models reduce the hourly RMSE by 11.35% for temperature and 12.28% for dew point, compared to IMPROVER.
The percentage of critical errors (absolute error > 2°C) is reduced by 5.60% for temperature and 6.19% for dew point on average.
Quotes
"The improvement from XGBoost is found to be comparable with non-ML methods reported in literature."
"The insights provided by SHAP reveal that the model outputs can be approximately described as a combination of a main component from linear combinations of selected NWP predictions and some supplementary nonlinear components."