The full text of this item is not available at this time because the student has placed this item under an embargo for a period of time. The Libraries are not authorized to provide a copy of this work during the embargo period, even for Texas A&M users with NetID.
Machine Learning Based Estimation of Daily Surface PM2.5 over the US Using Geostationary Remote Sensing, Ground Measurements and Meteorological Reanalysis Datasets
dc.contributor.advisor | Xu, Yangyang | |
dc.contributor.advisor | Yang, Ping | |
dc.creator | Shores, Kyle Ray | |
dc.date.accessioned | 2023-09-18T16:18:51Z | |
dc.date.created | 2022-12 | |
dc.date.issued | 2022-12-05 | |
dc.date.submitted | December 2022 | |
dc.identifier.uri | https://hdl.handle.net/1969.1/198515 | |
dc.description.abstract | PM2.5 concentrations are monitored at ground stations, and these measurements are often geographically sparse. Machine learning (ML) methods, as well as more conventional statistical models, have been used to estimate PM2.5 from aerosol optical depth (AOD), meteorological factors, and miscellaneous land data. Despite the demonstrated success of ML approaches in many regional studies, which set of ML methods and corresponding explanatory variables is best suited for this purpose remains to be systematically evaluated. We estimate daily surface PM2.5 concentration over Texas from data in NASA’s MERRA-2 reanalysis product using three categories of machine learning models (linear models, neural networks, and ensemble tree models). In addition, a stacked model using the three best models is evaluated. We find that neural networks and a Random Forest perform better than the other models. We also evaluate the training time of each model and find linear models to be the fastest, followed by trees and then neural networks. Following the model comparison over Texas with a synthetic dataset, we choose a subset of the better-performing models and apply them to a real-world dataset (May 2017–December 2021) composed of daily averaged AOD values from GOES-16 and meteorological reanalysis from ERA5 to estimate daily PM2.5 concentration over the United States. We find that Extra Trees is the best-performing model, with an out-of-sample R² of 0.90, a mean absolute error (MAE) of 0.73 µg/m³, and a root mean squared error (RMSE) of 1.63 µg/m³. In good agreement with previous studies, we find that boundary layer height, air temperature, dewpoint temperature, and wind components are the most important variables for improving ground-level PM2.5 predictions from satellite AOD.
The daily PM2.5 concentrations during 2021 show that most locations in the US are below the annual limit recommended by the EPA, while a few larger cities meet or exceed this value (e.g., near LA, Houston, and Pittsburgh). They also reveal episodic pollution events at rural locations caused by wildfires on the West Coast. The complete PM2.5 estimates will be released for public use once the full five-year data record covering all of North America is completed. | |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | |
dc.subject | PM2.5 | |
dc.subject | Machine Learning | |
dc.title | Machine Learning Based Estimation of Daily Surface PM2.5 over the US Using Geostationary Remote Sensing, Ground Measurements and Meteorological Reanalysis Datasets | |
dc.type | Thesis | |
thesis.degree.department | Atmospheric Sciences | |
thesis.degree.discipline | Atmospheric Sciences | |
thesis.degree.grantor | Texas A&M University | |
thesis.degree.name | Master of Science | |
thesis.degree.level | Masters | |
dc.contributor.committeeMember | Choe, Yoonsuck | |
dc.type.material | text | |
dc.date.updated | 2023-09-18T16:18:52Z | |
local.embargo.terms | 2024-12-01 | |
local.embargo.lift | 2024-12-01 | |
local.etdauthor.orcid | 0000-0002-4272-5187 |
This item appears in the following Collection(s)
Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )
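
The abstract in this record reports model skill as an out-of-sample R², a mean absolute error (MAE), and a root mean squared error (RMSE). As a reading aid, the following is a minimal sketch of how these three metrics are conventionally computed from paired observed and predicted values. The function name `regression_metrics` and the sample values are hypothetical and chosen for illustration only; they are not taken from the thesis, and the thesis may have used library routines instead.

```python
import math


def regression_metrics(observed, predicted):
    """Return R^2, MAE, and RMSE for paired observed/predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when observations are constant")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Hypothetical held-out PM2.5 values in µg/m³ (illustrative, not thesis data).
observed = [8.0, 12.0, 5.0, 20.0, 10.0]
predicted = [9.0, 11.0, 5.0, 18.0, 12.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Because RMSE squares each residual before averaging, it is never smaller than MAE and weights large misses more heavily, which is consistent with the abstract reporting an RMSE (1.63 µg/m³) larger than the MAE (0.73 µg/m³).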