Machine Learning-Based Estimation of Surface NO2 Concentrations over China: A Comparative Analysis of Geostationary (GEMS) and Polar-Orbiting (TROPOMI) Satellite Data
Journal article | Open access | Peer reviewed

Ma Yijin, Yi Wang, Jun Wang, Tao Minghui, Jhoon Kim, Chenyang Wu and Shanshan Zhang
Remote sensing (Basel, Switzerland), Vol.18(4), 614
02/15/2026
DOI: 10.3390/rs18040614
URL: https://doi.org/10.3390/rs18040614
Published (Version of record) Open Access

Abstract

What are the main findings?
The CatBoost model performed best, with GEMS data yielding higher accuracy (R2 = 0.842) than TROPOMI data (R2 = 0.765).
GEMS’s high temporal resolution provided a much larger training dataset, which was the key factor in its superior model performance.

What are the implications of the main findings?
Geostationary satellite data (such as GEMS) offer a critical advantage for high-resolution air quality monitoring via machine learning because of their frequent sampling.
GEMS enables the reconstruction of detailed diurnal pollution patterns and near-real-time tracking of emission events, providing valuable insights for dynamic air quality management.

High-accuracy spatiotemporal monitoring of surface nitrogen dioxide (NO2) concentrations is essential for air quality management. This study evaluates machine learning-based estimates of near-surface NO2 concentrations using data from the geostationary GEMS instrument and the polar-orbiting TROPOMI over China in 2022. Four tree-based models (Random Forest, XGBoost, CatBoost, and LightGBM) were trained by integrating satellite vertical-column densities with multi-source meteorological and ancillary data.

Results show that CatBoost achieved the highest accuracy, with an R2 of 0.842 for GEMS and 0.765 for TROPOMI, alongside the lowest RMSE and MAE. Models trained on GEMS data consistently outperformed TROPOMI-based models across all metrics. This advantage is primarily attributed to the substantially larger training sample size enabled by GEMS’s high temporal resolution, as confirmed through a controlled experiment with consistent sample sizes, which isolated the effect of data volume.

Spatially, GEMS estimates captured sharper concentration gradients and localized emission hotspots, while TROPOMI produced smoother fields. Temporally, only GEMS allowed the reconstruction of detailed diurnal patterns and near-real-time pollution episode tracking. This study confirms the significant added value of geostationary satellite data for high-frequency air quality monitoring and analysis when combined with machine learning.
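
The abstract compares models by R2, RMSE, and MAE. As a minimal sketch of how these three scores can be computed from paired station observations and model estimates, the snippet below uses only the Python standard library. It assumes R2 means the coefficient of determination (the abstract does not say whether the paper uses this or the squared correlation), and the function name `regression_metrics` and the sample values are hypothetical, made up for illustration rather than taken from the study.

```python
import math


def regression_metrics(observed, estimated):
    """Return (R2, RMSE, MAE) for paired observed and estimated values.

    R2 is computed here as the coefficient of determination,
    1 - SS_res / SS_tot, which is an assumption about the paper's metric.
    """
    if len(observed) != len(estimated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - e for o, e in zip(observed, estimated)]

    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observations are identical")

    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Hypothetical surface NO2 values, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 39.0]

r2, rmse, mae = regression_metrics(observed, estimated)
print(f"R2={r2:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}")
```

Higher R2 and lower RMSE and MAE indicate closer agreement with the ground measurements, which is the sense in which the abstract ranks CatBoost first.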
Keywords: Air Quality; Machine Learning; Remote Sensing; Accuracy; Air monitoring; Artificial intelligence; Comparative analysis; Concentration gradient; Datasets; Diurnal; Estimates; Learning algorithms; Models; Neural networks; Nitrogen dioxide; Pollutants; Premature mortality; Quality management; Real time; Reconstruction; Regression analysis; Sensors; Statistical analysis; Statistical methods; Synchronous satellites; Temporal resolution; Tracking
