Advanced Statistical Methods for Analyzing Crash Datasets with Many Zero Observations and a Long Tail: Semiparametric Negative Binomial Dirichlet Process Mixture and Model Selection Heuristics

Shirazi, Mohammadali

dc.contributor.advisor	Lord, Dominique
dc.creator	Shirazi, Mohammadali
dc.date.accessioned	2019-01-23T19:30:10Z
dc.date.available	2020-12-01T07:31:49Z
dc.date.created	2018-12
dc.date.issued	2018-08-28
dc.date.submitted	December 2018
dc.identifier.uri	https://hdl.handle.net/1969.1/174406
dc.description.abstract	In this dissertation, a flexible model is first introduced using a mixture of the Negative Binomial (NB) distribution and a random distribution characterized by a Dirichlet process (DP), referred to as the NB-DP. This modeling approach gives the NB distribution greater flexibility in order to overcome several of its limitations, such as modeling data with many zero observations and a long (or heavy) tail. Application of the NB-DP to two observed datasets indicated that it performs better than the NB when the data are characterized by many zero observations and a long tail. In addition to greater flexibility, the NB-DP provides clustering as a by-product, which allows the safety analyst to better understand the characteristics of the data or domain. Second, a methodology is proposed to select the most-likely-true sampling distribution among potential alternatives, based on the characteristics of the data, before fitting the models. The proposed methodology employs two analytic tools, (1) Monte Carlo simulations and (2) machine learning classifiers, to design simple heuristics that predict the label of the most-likely-true distribution for analyzing the data. The method was first applied to investigate when the Poisson-lognormal is preferred over the NB. The results showed that the kurtosis, skewness, and percentage of zeros are the main summary statistics needed to select between these two alternatives. It was then applied to investigate when the Negative Binomial Lindley (NB-L) is preferred over the NB. The results showed that the skewness, coefficient of variation, kurtosis, variance-to-mean ratio, and percentage of zeros are among the most important summary statistics (or predictors) required to select between the NB and NB-L.	en
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	Dirichlet process	en
dc.subject	Model Selection	en
dc.subject	Negative Binomial	en
dc.subject	Generalized Linear Model	en
dc.subject	Crash Data	en
dc.title	Advanced Statistical Methods for Analyzing Crash Datasets with Many Zero Observations and a Long Tail: Semiparametric Negative Binomial Dirichlet Process Mixture and Model Selection Heuristics	en
dc.type	Thesis	en
thesis.degree.department	Civil Engineering	en
thesis.degree.discipline	Civil Engineering	en
thesis.degree.grantor	Texas A&M University	en
thesis.degree.name	Doctor of Philosophy	en
thesis.degree.level	Doctoral	en
dc.contributor.committeeMember	Hart, Jeffrey
dc.contributor.committeeMember	Quadrifoglio, Luca
dc.contributor.committeeMember	Zhang, Yunlong
dc.type.material	text	en
dc.date.updated	2019-01-23T19:30:11Z
local.embargo.terms	2020-12-01
local.etdauthor.orcid	0000-0001-8859-0794
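
The abstract names several summary statistics (skewness, kurtosis, coefficient of variation, variance-to-mean ratio, and percentage of zeros) as predictors for choosing a sampling distribution. The following is a minimal sketch of how those statistics could be computed for a vector of crash counts. It is not taken from the dissertation: the function name `summary_statistics`, the moment-based estimators, and the simulated NB parameters `mu` and `phi` are all illustrative assumptions.

```python
import numpy as np

def summary_statistics(counts):
    """Compute the summary statistics named in the abstract for count data."""
    x = np.asarray(counts, dtype=float)
    mean = x.mean()
    var = x.var(ddof=1)             # sample variance
    centered = x - mean
    m2 = np.mean(centered ** 2)     # central moments (divide by n)
    m3 = np.mean(centered ** 3)
    m4 = np.mean(centered ** 4)
    return {
        "mean": mean,
        "variance_to_mean_ratio": var / mean,
        "coefficient_of_variation": np.sqrt(var) / mean,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,   # non-excess kurtosis (3 for a normal)
        "percent_zeros": 100.0 * np.mean(x == 0),
    }

# Simulate one NB sample; mu and phi are hypothetical values chosen
# only to produce data with many zeros and a long tail.
rng = np.random.default_rng(0)
mu, phi = 0.8, 0.5                  # assumed mean and inverse dispersion
sample = rng.negative_binomial(n=phi, p=phi / (phi + mu), size=5000)
stats = summary_statistics(sample)
```

In a workflow like the one the abstract describes, statistics of this kind would be computed for many simulated datasets and passed to a classifier as features. The specific estimators and classifier used in the dissertation are not stated in this record.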

Files in this item

Name: SHIRAZI-DISSERTATION-2018.pdf
Size: 1.641 MB
Format: PDF

This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )