Machine Learning Based Estimation of Daily Surface PM2.5 over the US Using Geostationary Remote Sensing, Ground Measurements and Meteorological Reanalysis Datasets

Shores, Kyle Ray

The full text of this item is not available at this time because the student has placed this item under an embargo for a period of time. The Libraries are not authorized to provide a copy of this work during the embargo period, even for Texas A&M users with NetID.

dc.contributor.advisor	Xu, Yangyang
dc.contributor.advisor	Yang, Ping
dc.creator	Shores, Kyle Ray
dc.date.accessioned	2023-09-18T16:18:51Z
dc.date.created	2022-12
dc.date.issued	2022-12-05
dc.date.submitted	December 2022
dc.identifier.uri	https://hdl.handle.net/1969.1/198515
dc.description.abstract	PM2.5 concentrations are monitored at ground stations, which are often geographically sparse. Machine learning (ML) methods and more conventional statistical models have been used to estimate PM2.5 from aerosol optical depth (AOD), meteorological factors, and miscellaneous land data. Despite the demonstrated success of ML approaches in many regional studies, which ML methods and corresponding explanatory variables are best suited for this purpose remains to be systematically evaluated. We estimate daily surface PM2.5 concentration over Texas from data in NASA’s MERRA-2 reanalysis product using three categories of machine learning models (linear models, neural networks, and ensemble tree models). In addition, a stacked model using the three best models is evaluated. We find that neural networks and a Random Forest perform better than the other models. The training time of each model is also evaluated; linear models are fastest, followed by trees and then neural networks. Following the model comparison over Texas with a synthetic dataset, we choose a subset of better-performing models and apply them to a real-world dataset (May 2017–December 2021) composed of daily averaged AOD values from GOES-16 and meteorological reanalysis from ERA5 to estimate daily PM2.5 concentration over the United States. We find that Extra Trees is the best-performing model, with an out-of-sample R² of 0.90 and a mean absolute error (MAE) and root mean squared error (RMSE) of 0.73 and 1.63 µg/m³, respectively. In good agreement with previous studies, boundary layer height, air temperature, dewpoint temperature, and wind components are found to be most important for strengthening ground-level PM2.5 predictions from satellite AOD. The daily PM2.5 concentrations during 2021 show that most locations in the US are below the annual limit recommended by the EPA, while a few larger cities meet or exceed this value (e.g., near LA, Houston, and Pittsburgh). They also reveal temporal outbreak events at rural locations due to wildfires on the West Coast. The complete PM2.5 estimates will be released for public use once the data record for the entire five-year period and all of North America is completed.
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	PM2.5
dc.subject	Machine Learning
dc.title	Machine Learning Based Estimation of Daily Surface PM2.5 over the US Using Geostationary Remote Sensing, Ground Measurements and Meteorological Reanalysis Datasets
dc.type	Thesis
thesis.degree.department	Atmospheric Sciences
thesis.degree.discipline	Atmospheric Sciences
thesis.degree.grantor	Texas A&M University
thesis.degree.name	Master of Science
thesis.degree.level	Masters
dc.contributor.committeeMember	Choe, Yoonsuck
dc.type.material	text
dc.date.updated	2023-09-18T16:18:52Z
local.embargo.terms	2024-12-01
local.embargo.lift	2024-12-01
local.etdauthor.orcid	0000-0002-4272-5187

Files in this item

Name: SHORES-THESIS-2022.pdf
Size: 9.186Mb
Format: PDF

This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )
