The study develops a machine learning framework, Multi-SiteBoost (MSB), that generates site-specific deterministic temperature and humidity forecasts. Key highlights:
Gridded numerical weather prediction (NWP) data from the IMPROVER system, together with site observations, are used to train a separate XGBoost regression model for each site and forecast variable.
Several pre-processing techniques are applied to optimize model performance: including values from surrounding grid points, data selection, automatic feature selection, and data scaling.
The trained XGBoost models show a significant improvement in forecast accuracy over the original IMPROVER grid values, with an average reduction in hourly RMSE of 11.35% for temperature and 12.28% for dew point.
SHAP analysis of the model outputs reveals that predictions are driven primarily by a linear combination of the main variable at different grid locations, with supplementary nonlinear effects from other features such as hour, wind, and dew point.
To increase the reliability of the machine learning forecasts, the study explores methods for identifying potentially unreliable individual predictions, such as those made with out-of-bound feature values. These predictions are found to have much higher errors than the overall average.
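The per-site, per-variable setup amounts to keeping one fitted model for every (site, variable) pair. The sketch below shows only that bookkeeping. As a deliberate simplification, it fits a one-feature least-squares line on the nearest grid value in place of an XGBoost regressor; the record layout and the names `fit_simple_linear` and `train_per_site` are hypothetical.

```python
from collections import defaultdict


def fit_simple_linear(xs, ys):
    """Least-squares fit of y = a + b * x.

    Stand-in for an XGBoost regressor, used only to keep the sketch
    dependency-free; it is not the model used in the study.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # constant input: fall back to the mean
    return mean_y - slope * mean_x, slope


def train_per_site(records):
    """Fit one model per (site, variable) pair.

    `records` holds hypothetical (site, variable, grid_forecast, observation)
    tuples; the result maps (site, variable) -> (intercept, slope).
    """
    grouped = defaultdict(lambda: ([], []))
    for site, variable, forecast, observation in records:
        xs, ys = grouped[(site, variable)]
        xs.append(forecast)
        ys.append(observation)
    return {key: fit_simple_linear(xs, ys) for key, (xs, ys) in grouped.items()}


if __name__ == "__main__":
    toy = [
        ("A", "temperature", 10.0, 11.0),
        ("A", "temperature", 12.0, 13.0),
        ("A", "temperature", 14.0, 15.0),
        ("A", "dew_point", 5.0, 4.0),
        ("A", "dew_point", 7.0, 8.0),
    ]
    print(train_per_site(toy))
    # {('A', 'temperature'): (1.0, 1.0), ('A', 'dew_point'): (-6.0, 2.0)}
```

Keying the models by (site, variable) lets each site learn its own local correction while sharing a single training routine.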
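The "surrounding grid values" pre-processing step could look roughly like the following. This sketch assumes each site is matched to its nearest grid cell and that neighbouring values are read from a square window around it. The function name `neighbourhood_features`, the window radius, and the edge-clamping rule are illustrative assumptions, not details taken from the study.

```python
def neighbourhood_features(grid, row, col, radius=1):
    """Return the grid values in a (2*radius+1) x (2*radius+1) window
    centred on (row, col), flattened row by row.

    Indices that fall outside the grid are clamped to the nearest edge cell,
    so every site yields the same number of features.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    features = []
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r = min(max(row + dr, 0), n_rows - 1)
            c = min(max(col + dc, 0), n_cols - 1)
            features.append(grid[r][c])
    return features


if __name__ == "__main__":
    # Toy 3x4 temperature grid (degrees C), not data from the study.
    temperature = [
        [10.0, 11.0, 12.0, 13.0],
        [14.0, 15.0, 16.0, 17.0],
        [18.0, 19.0, 20.0, 21.0],
    ]
    print(neighbourhood_features(temperature, 1, 1))
    # [10.0, 11.0, 12.0, 14.0, 15.0, 16.0, 18.0, 19.0, 20.0]
```

Each neighbouring value becomes a separate input column, which is what allows a model to weight the main variable differently at different grid locations.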
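The RMSE reductions quoted above are relative improvements over the IMPROVER baseline. A minimal sketch of that arithmetic follows, assuming the reduction is computed as (RMSE_baseline - RMSE_model) / RMSE_baseline. The summary does not say exactly how the averaging over sites and lead hours is done, and the numbers below are toy values.

```python
import math


def rmse(forecasts, observations):
    """Root-mean-square error between paired forecasts and observations."""
    if not forecasts or len(forecasts) != len(observations):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(squared) / len(squared))


def percent_rmse_reduction(baseline, model, observations):
    """Relative RMSE improvement of `model` over `baseline`, in percent."""
    base = rmse(baseline, observations)
    if base == 0:
        raise ValueError("baseline RMSE is zero; relative reduction is undefined")
    return 100.0 * (base - rmse(model, observations)) / base


if __name__ == "__main__":
    obs = [10.0, 12.0, 14.0, 16.0]
    raw_grid = [12.0, 14.0, 16.0, 18.0]   # toy baseline: every error is +2
    corrected = [11.0, 13.0, 15.0, 17.0]  # toy model output: every error is +1
    print(percent_rmse_reduction(raw_grid, corrected, obs))  # 50.0
```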
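The SHAP finding of a mostly linear dependence on the main variable has a simple reference point. For a purely linear model f(x) = b + sum_i w_i x_i with independent features, the SHAP value of feature i reduces to w_i (x_i - E[x_i]), and the SHAP values sum to f(x) - E[f(X)]. The sketch below illustrates that identity with made-up weights for three hypothetical neighbouring grid points. It is not the study's trained model; SHAP values for XGBoost are normally computed with the tree-specific TreeSHAP algorithm instead.

```python
def column_means(rows):
    """Per-feature mean over a background dataset (estimate of E[x_i])."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def linear_predict(bias, weights, x):
    """Prediction of the linear model f(x) = bias + sum(w_i * x_i)."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def linear_shap(weights, x, means):
    """Exact SHAP values of a linear model with independent features."""
    return [w * (xi - m) for w, xi, m in zip(weights, x, means)]


if __name__ == "__main__":
    # Toy temperatures at three neighbouring grid points (degrees C).
    background = [[14.0, 15.0, 13.0], [16.0, 17.0, 15.0]]
    weights, bias = [0.5, 0.3, 0.2], 0.1
    x = [18.0, 19.0, 17.0]
    means = column_means(background)
    phi = linear_shap(weights, x, means)
    # For a linear model E[f(X)] = f(E[X]), so base value + SHAP values = f(x).
    base_value = linear_predict(bias, weights, means)
    print(phi, base_value + sum(phi), linear_predict(bias, weights, x))
```

Nonlinear effects such as those attributed to hour or wind would show up as SHAP values that this linear identity cannot reproduce.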
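One possible implementation of the out-of-bound check records the range of each feature in the training data and flags any prediction whose inputs fall outside that range. This is a minimal sketch under that assumption. The study may define the bounds differently, and the function names are hypothetical.

```python
def fit_feature_bounds(training_rows):
    """Per-feature (min, max) observed in the training data."""
    return [(min(col), max(col)) for col in zip(*training_rows)]


def out_of_bound_features(row, bounds):
    """Indices of features in `row` that lie outside the training range."""
    return [i for i, (value, (low, high)) in enumerate(zip(row, bounds))
            if value < low or value > high]


def flag_unreliable(rows, bounds):
    """True for each row with at least one out-of-bound feature."""
    return [bool(out_of_bound_features(row, bounds)) for row in rows]


if __name__ == "__main__":
    # Toy features: [grid temperature (degrees C), wind speed (m/s)].
    train = [[10.0, 2.0], [15.0, 5.0], [20.0, 3.0]]
    bounds = fit_feature_bounds(train)        # [(10.0, 20.0), (2.0, 5.0)]
    new_rows = [[12.0, 4.0], [25.0, 3.0], [18.0, 7.0]]
    print(flag_unreliable(new_rows, bounds))  # [False, True, True]
```

A tree ensemble's output is piecewise constant and cannot extend a trend beyond the feature range it was trained on, which is one plausible reason such inputs produce larger errors.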
Source: arxiv.org