dc.description.abstract | In this dissertation, first, a flexible model is introduced using a mixture of the
Negative Binomial (NB) distribution and a random distribution characterized by a Dirichlet
process (DP), referred to as the NB-DP. This modeling approach aims to give the NB
distribution greater flexibility in order to overcome several of its limitations,
such as its difficulty modeling data with many zero observations and a long (or heavy)
tail. Application of the NB-DP to two observed datasets indicated that the NB-DP model
performs better than the NB when data are characterized by many zero
observations and a long tail. In addition to greater flexibility, the NB-DP provides a
clustering by-product that allows the safety analyst to better understand the characteristics
of the data or domain.
Second, a methodology is proposed to select the most-likely-true sampling
distribution among potential alternatives, based on the characteristics of the data, before
fitting the models. The proposed methodology employs two analytic tools, (1) Monte
Carlo simulations and (2) machine learning classifiers, to design simple heuristics that
predict the label of the most-likely-true distribution for analyzing data. The method
was first applied to investigate when the Poisson-lognormal is preferred over the NB. The
results showed that the kurtosis, skewness and percentage of zeros are the main summary
statistics needed to choose between these two alternatives. The method was then
applied to investigate when the Negative Binomial Lindley (NB-L) is preferred over the NB. The
results showed that the skewness, coefficient of variation, kurtosis, variance-to-mean ratio,
and percentage of zeros are among the most important summary statistics (or
predictors) required to choose between the NB and NB-L. | en |
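
The summary statistics named in the abstract can be illustrated with a minimal sketch. It simulates NB counts as a gamma-Poisson mixture and computes the skewness, kurtosis, variance-to-mean ratio, coefficient of variation, and percentage of zeros. The function names, the parameter values (`mu`, `phi`), and the use of population moments are assumptions made for illustration only; they are not taken from the dissertation, and the classifier stage is not shown.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (suitable for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_nb(mu, phi, n, seed=0):
    """Simulate n NB counts with mean mu and inverse dispersion phi.

    Assumed parameterization: lambda ~ Gamma(shape=phi, scale=mu/phi),
    y | lambda ~ Poisson(lambda), so Var(y) = mu + mu**2 / phi.
    """
    rng = random.Random(seed)
    return [sample_poisson(rng.gammavariate(phi, mu / phi), rng) for _ in range(n)]


def summary_stats(counts):
    """Return the candidate predictors, computed from population moments."""
    n = len(counts)
    mean = sum(counts) / n
    m2 = sum((y - mean) ** 2 for y in counts) / n  # population variance
    m3 = sum((y - mean) ** 3 for y in counts) / n
    m4 = sum((y - mean) ** 4 for y in counts) / n
    return {
        "mean": mean,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # not excess kurtosis
        "vmr": m2 / mean,  # variance-to-mean ratio
        "cv": math.sqrt(m2) / mean,  # coefficient of variation
        "pct_zeros": 100.0 * sum(1 for y in counts if y == 0) / n,
    }


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to produce overdispersed counts.
    stats = summary_stats(sample_nb(mu=2.0, phi=0.5, n=5000, seed=42))
    for name, value in stats.items():
        print(f"{name}: {value:.3f}")
```

In a Monte Carlo study of the kind described, such a function would be applied to many simulated datasets, each labeled with its generating distribution, and the resulting table of statistics would serve as the input to a classifier.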