Show simple item record

dc.contributor.advisor: Xu, Yangyang
dc.contributor.advisor: Yang, Ping
dc.creator: Shores, Kyle Ray
dc.date.accessioned: 2023-09-18T16:18:51Z
dc.date.created: 2022-12
dc.date.issued: 2022-12-05
dc.date.submitted: December 2022
dc.identifier.uri: https://hdl.handle.net/1969.1/198515
dc.description.abstract: PM2.5 concentrations are monitored at ground stations, which are often geographically sparse. Machine learning (ML) methods and more conventional statistical models have been used to estimate PM2.5 from aerosol optical depth (AOD), meteorological factors, and miscellaneous land data. Despite the demonstrated success of ML approaches in many regional studies, which ML methods and corresponding explanatory variables are best suited for this purpose remains to be systematically evaluated. We estimate daily surface PM2.5 concentration over Texas from data in NASA’s MERRA-2 reanalysis product using three categories of machine learning models (linear models, neural networks, and ensemble tree models). In addition, a stacked model using the three best models is evaluated. We find that neural networks and a Random Forest perform better than the other models. The training time of each model is also evaluated; linear models are fastest, followed by trees and then neural networks. Following the model comparison over Texas with a synthetic dataset, we choose a subset of better-performing models and apply them to a real-world dataset (May 2017–December 2021) composed of daily averaged AOD values from GOES-16 and meteorological reanalysis from ERA5 to estimate daily PM2.5 concentration over the United States. We find that Extra Trees is the best-performing model, with an out-of-sample R² of 0.90, a mean absolute error (MAE) of 0.73 µg/m³, and a root mean squared error (RMSE) of 1.63 µg/m³. In good agreement with previous studies, we find that boundary layer height, air temperature, dewpoint temperature, and wind components are most important for strengthening ground-level PM2.5 predictions from satellite AOD. The daily PM2.5 concentrations during 2021 show that most locations in the US are below the annual limit recommended by the EPA, while a few larger cities meet or exceed this value (e.g., near LA, Houston, and Pittsburgh). They also reveal temporal outbreak events at rural locations due to wildfires on the West Coast. The complete PM2.5 estimates will be released for public use once the data record covering the full five-year period and all of North America is complete.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: PM2.5
dc.subject: Machine Learning
dc.title: Machine Learning Based Estimation of Daily Surface PM2.5 over the US Using Geostationary Remote Sensing, Ground Measurements and Meteorological Reanalysis Datasets
dc.type: Thesis
thesis.degree.department: Atmospheric Sciences
thesis.degree.discipline: Atmospheric Sciences
thesis.degree.grantor: Texas A&M University
thesis.degree.name: Master of Science
thesis.degree.level: Masters
dc.contributor.committeeMember: Choe, Yoonsuck
dc.type.material: text
dc.date.updated: 2023-09-18T16:18:52Z
local.embargo.terms: 2024-12-01
local.embargo.lift: 2024-12-01
local.etdauthor.orcid: 0000-0002-4272-5187

