Field | Value | Language
dc.contributor.advisor | Lord, Dominique |
dc.creator | Shirazi, Mohammadali |
dc.date.accessioned | 2019-01-23T19:30:10Z |
dc.date.available | 2020-12-01T07:31:49Z |
dc.date.created | 2018-12 |
dc.date.issued | 2018-08-28 |
dc.date.submitted | December 2018 |
dc.identifier.uri | https://hdl.handle.net/1969.1/174406 |
dc.description.abstract | In this dissertation, a flexible model is first introduced that mixes the Negative Binomial (NB) distribution with a random distribution characterized by a Dirichlet process (DP); the resulting model is referred to as NB-DP. This modeling approach gives the NB distribution greater flexibility to overcome its limitations, such as in modeling data with many zero observations and a long (or heavy) tail. Application of the NB-DP to two observed datasets indicated that the NB-DP performs better than the NB when the data are characterized by many zero observations and a long tail. In addition to greater flexibility, the NB-DP provides a clustering by-product that allows the safety analyst to better understand the characteristics of the data or domain. Second, a methodology is proposed to select, before fitting the models, the most-likely-true sampling distribution among potential alternatives based on the characteristics of the data. The proposed methodology employs two analytic tools, (1) Monte Carlo simulations and (2) machine learning classifiers, to design simple heuristics that predict the label of the most-likely-true distribution for the data being analyzed. The method was first applied to investigate when the Poisson-lognormal is preferred over the NB. The results showed that the kurtosis, skewness, and percentage of zeros are the main summary statistics needed to select between these two alternatives. It was then applied to investigate when the Negative Binomial Lindley (NB-L) is preferred over the NB. The results showed that the skewness, coefficient of variation, kurtosis, variance-to-mean ratio, and percentage of zeros are among the most important summary statistics (or predictors) required to select a logical distribution between the NB and NB-L. | en
dc.format.mimetype | application/pdf |
dc.language.iso | en |
dc.subject | Dirichlet process | en
dc.subject | Model Selection | en
dc.subject | Negative Binomial | en
dc.subject | Generalized Linear Model | en
dc.subject | Crash Data | en
dc.title | Advanced Statistical Methods for Analyzing Crash Datasets with Many Zero Observations and a Long Tail: Semiparametric Negative Binomial Dirichlet Process Mixture and Model Selection Heuristics | en
dc.type | Thesis | en
thesis.degree.department | Civil Engineering | en
thesis.degree.discipline | Civil Engineering | en
thesis.degree.grantor | Texas A & M University | en
thesis.degree.name | Doctor of Philosophy | en
thesis.degree.level | Doctoral | en
dc.contributor.committeeMember | Hart, Jeffrey |
dc.contributor.committeeMember | Quadrifoglio, Luca |
dc.contributor.committeeMember | Zhang, Yunlong |
dc.type.material | text | en
dc.date.updated | 2019-01-23T19:30:11Z |
local.embargo.terms | 2020-12-01 |
local.etdauthor.orcid | 0000-0001-8859-0794 |
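The abstract's model-selection heuristic starts from summary statistics of simulated count datasets with known labels. The Python sketch below is a hypothetical illustration of that data-generation step, not the author's code. It is a sketch under stated assumptions: NB data are simulated as a Poisson-gamma mixture and Poisson-lognormal (PLN) data as a Poisson mixture with a mean-one lognormal multiplier. For each dataset it then computes the five statistics the abstract names: percentage of zeros, variance-to-mean ratio, coefficient of variation, skewness, and kurtosis. The helper names (`poisson`, `summary_stats`, `simulate_labeled`), the parameter ranges, and the moment definitions (population moments, non-excess kurtosis) are all assumptions. Training a classifier on the labeled rows, the second tool the abstract mentions, is not shown.

```python
import math
import random
import statistics


def _poisson_small(lam: float, rng: random.Random) -> int:
    # Knuth's multiplication method; only called with lam <= 30 so exp(-lam) cannot underflow.
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def poisson(lam: float, rng: random.Random) -> int:
    # A sum of independent Poisson draws is Poisson, so a large mean is split into chunks.
    total = 0
    while lam > 30.0:
        total += _poisson_small(30.0, rng)
        lam -= 30.0
    return total + _poisson_small(lam, rng)


def summary_stats(counts: list[int]) -> dict[str, float]:
    # Assumes a non-constant sample with a positive mean (otherwise the ratios divide by zero).
    n = len(counts)
    mean = statistics.fmean(counts)
    var = statistics.pvariance(counts, mean)  # population (biased) variance
    sd = math.sqrt(var)
    m3 = sum((x - mean) ** 3 for x in counts) / n
    m4 = sum((x - mean) ** 4 for x in counts) / n
    return {
        "pct_zeros": 100.0 * counts.count(0) / n,
        "vmr": var / mean,          # variance-to-mean ratio
        "cv": sd / mean,            # coefficient of variation
        "skewness": m3 / sd ** 3,
        "kurtosis": m4 / var ** 2,  # non-excess kurtosis (equals 3 for a normal)
    }


def simulate_labeled(n_datasets: int, n_obs: int, seed: int = 0):
    # Monte Carlo step: alternate NB and PLN datasets with randomly drawn (assumed) parameters.
    rng = random.Random(seed)
    rows = []
    for i in range(n_datasets):
        mu = rng.uniform(0.5, 5.0)  # target mean of the counts
        if i % 2 == 0:
            phi = rng.uniform(0.5, 5.0)  # NB dispersion: gamma shape phi, scale mu/phi
            data = [poisson(rng.gammavariate(phi, mu / phi), rng) for _ in range(n_obs)]
            label = "NB"
        else:
            sigma = rng.uniform(0.3, 1.5)  # lognormal multiplier with mean one
            data = [poisson(mu * rng.lognormvariate(-sigma ** 2 / 2, sigma), rng)
                    for _ in range(n_obs)]
            label = "PLN"
        rows.append((summary_stats(data), label))
    return rows


if __name__ == "__main__":
    for stats, label in simulate_labeled(4, 500):
        print(label, {k: round(v, 2) for k, v in stats.items()})
```

Each row pairs the predictors with the label of the distribution that generated it. That is the kind of table a classifier (for example, a decision tree) could learn simple selection rules from.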

