Himawari-8-derived diurnal variations in ground-level PM2.5 pollution across China using the fast space-time Light Gradient Boosting Machine (LightGBM)
Journal article   Open access   Peer reviewed

Jing Wei, Z. Li, Rachel T Pinker, Jun Wang, Lin Sun, Wenhao Xue, Runze Li and Maureen Cribb
Atmospheric chemistry and physics, Vol.21(10), pp.7863-7880
05/01/2021
DOI: 10.5194/acp-21-7863-2021
URL: https://doi.org/10.5194/acp-21-7863-2021
Published (Version of record) Open Access

Abstract

Fine particulate matter with a diameter of less than 2.5 µm (PM2.5) is widely used as a key atmospheric environmental parameter, mainly because of its impact on human health. PM2.5 is affected by both natural and anthropogenic factors that usually have strong diurnal variations. Information on these diurnal variations helps in understanding the causes of air pollution and in adapting to it. Most existing PM2.5 products have been derived from polar-orbiting satellites. This study uses the next-generation geostationary meteorological satellite Himawari-8/AHI (Advanced Himawari Imager) to document the diurnal variation in PM2.5. Given the huge volume of satellite data, a highly efficient tree-based model built on gradient boosting is developed: the space-time LightGBM (STLG) model, which extends the Light Gradient Boosting Machine (LightGBM) by incorporating the spatiotemporal characteristics of air pollution. An hourly PM2.5 dataset for China (i.e., ChinaHigh PM2.5) at a 5 km spatial resolution is derived from Himawari-8/AHI aerosol products together with additional environmental variables. Hourly PM2.5 estimates (number of data samples = 1 415 188) are well correlated with ground measurements in China (cross-validation coefficient of determination, CV-R² = 0.85), with a root-mean-square error (RMSE) and mean absolute error (MAE) of 13.62 and 8.49 µg m⁻³, respectively. The model captures the PM2.5 diurnal variations well, showing that pollution increases gradually in the morning, reaches a peak at about 10:00 LT (GMT+8), and then decreases steadily until sunset. The proposed approach outperforms most traditional statistical regression and tree-based machine-learning models with a much lower computational burden in terms of speed and memory, making it well suited for routine pollution monitoring.
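The three validation statistics quoted in the abstract (R², RMSE, MAE) can be illustrated with a minimal sketch. It assumes the common definition R² = 1 − SS_res/SS_tot; the article may instead use the squared correlation coefficient. The function name `validation_metrics` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def validation_metrics(observed, predicted):
    """Return (r2, rmse, mae) for paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")
    r2 = 1.0 - ss_res / ss_tot            # assumed definition of R2
    rmse = math.sqrt(ss_res / n)          # root-mean-square error
    mae = sum(abs(r) for r in residuals) / n  # mean absolute error
    return r2, rmse, mae


# Hypothetical hourly PM2.5 values in µg m⁻³ (illustrative only, not from the paper).
observed = [35.0, 50.0, 80.0, 60.0]
predicted = [40.0, 45.0, 70.0, 65.0]
r2, rmse, mae = validation_metrics(observed, predicted)
print(f"R2={r2:.3f}  RMSE={rmse:.2f}  MAE={mae:.2f}")
```

In a cross-validation setting, `predicted` would hold the out-of-fold estimates for each held-out ground measurement, so the statistics reflect performance on data the model was not trained on.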
