dc.description.abstract | In this dissertation, I address unorthodox statistical problems concerning goodness-of-fit tests
in the latent variable context and efficient statistical computations.
In epidemiological and biomedical studies, observations with measurement error are quite
common, especially when it is difficult to calibrate the true signals accurately. In the first problem,
I develop a statistical test of the equality of two distributions when the observed contaminated
data follow the classical additive measurement error model. Standard two-sample
homogeneity tests, such as the Kolmogorov-Smirnov, Anderson-Darling, and Cramér-von Mises tests, are not
consistent when the observations are subject to measurement error. To develop a consistent test, first
the characteristic functions of unobservable true random variables are estimated from the contaminated
data, and then the test statistic is defined as the integrated squared difference between the two
estimated characteristic functions. It is shown that when the sample size is large and the null hypothesis
holds, the test statistic converges in distribution to an integral of a squared Gaussian process. However,
evaluating this limiting distribution to obtain the rejection region is not simple, so I propose a
bootstrap approach to compute the p-value of the test statistic. The operating characteristics of the
proposed test are assessed and compared with those of other approaches via extensive simulation studies.
The proposed method is then applied to analyze the National Health and Nutrition Examination
Survey (NHANES) dataset. Although previous researchers have considered estimation of the regression parameters
in the presence of exposure measurement error, this testing problem is new and has not been
addressed before.
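The characteristic-function comparison can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the Laplace error law, the integration grid, and the unweighted Riemann sum are assumptions made here for concreteness (a real version would include a weight function and the bootstrap calibration described above).

```python
import numpy as np

def ecf(w, t):
    # Empirical characteristic function of the contaminated sample w
    # evaluated at the grid points t.
    return np.exp(1j * np.outer(t, w)).mean(axis=1)

def cf_test_stat(w1, w2, t, phi_u):
    # Under the additive model W = X + U with known error CF phi_u,
    # phi_W = phi_X * phi_U, so phi_X is estimated by ecf(W) / phi_u.
    phi_x1 = ecf(w1, t) / phi_u(t)
    phi_x2 = ecf(w2, t) / phi_u(t)
    # Integrated squared modulus of the difference, approximated on t.
    diff = np.abs(phi_x1 - phi_x2) ** 2
    return np.sum(diff) * (t[1] - t[0])

# Toy example: Laplace(0, b) measurement error, whose characteristic
# function is 1 / (1 + b^2 t^2); both true samples are N(0, 1),
# so the null hypothesis of homogeneity holds.
rng = np.random.default_rng(0)
b = 0.5
phi_u = lambda t: 1.0 / (1.0 + b**2 * t**2)
w1 = rng.normal(0, 1, 300) + rng.laplace(0, b, 300)
w2 = rng.normal(0, 1, 300) + rng.laplace(0, b, 300)
t = np.linspace(-3, 3, 121)
stat = cf_test_stat(w1, w2, t, phi_u)
```

A bootstrap p-value would come from recomputing this statistic on resampled contaminated data and comparing the observed value to the resulting reference distribution.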
In the next problem, I consider the stochastic frontier model (SFM), a widely used
model for measuring firms’ efficiency. In productivity and cost studies in econometrics,
the discrepancy between the theoretically optimal output and the actual output for a
given amount of inputs is called technical inefficiency. To assess this inefficiency,
the stochastic frontier model includes the gap as a latent variable in addition to the
usual statistical noise. Since the gap is unobservable, estimation and inference depend on the distributional assumption on the technical inefficiency term; an exponential or half-normal
distribution is commonly assumed. To address this, I develop a Bayesian
test of whether this parametric assumption is correct. As an alternative, I construct a broad semiparametric
family that approximates or contains the true distribution, and then define a Bayes
factor. I show that the Bayes factor is consistent under certain conditions and present its finite sample
performance via Monte Carlo simulations.
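The parametric null being tested can be made concrete with the normal/half-normal frontier likelihood (the Aigner-Lovell-Schmidt form). This is only a sketch of the null model: the simulated design and parameter values are illustrative assumptions, and the dissertation's semiparametric alternative and Bayes factor are not shown.

```python
import math
import numpy as np

def norm_logpdf(z):
    return -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi)

def norm_logcdf(z):
    # Adequate for moderate z; a production version would guard the tails.
    return np.log(0.5 * (1.0 + np.vectorize(math.erf)(z / np.sqrt(2.0))))

def sfm_loglik(beta, sigma_v, sigma_u, y, X):
    # Normal/half-normal stochastic frontier log-likelihood:
    # y = X @ beta + v - u, with noise v ~ N(0, sigma_v^2) and
    # latent inefficiency u ~ |N(0, sigma_u^2)|.
    eps = y - X @ beta
    sigma = np.hypot(sigma_v, sigma_u)
    lam = sigma_u / sigma_v
    return np.sum(np.log(2.0 / sigma)
                  + norm_logpdf(eps / sigma)
                  + norm_logcdf(-eps * lam / sigma))

# Simulate a toy production frontier and evaluate the likelihood
# at the generating parameters.
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 0.5])
y = X @ beta + rng.normal(0, 0.3, n) - np.abs(rng.normal(0, 0.5, n))
ll = sfm_loglik(beta, 0.3, 0.5, y, X)
```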
The second part of my dissertation concerns statistical computing problems. Frequentist
standard errors quantify the uncertainty of an estimator and are used in many statistical
inference problems. Here, I consider standard error calculation for Bayes
estimators. Except in a few idealized scenarios, estimating the frequentist variability of an estimator
typically requires bootstrapping to approximate its sampling distribution.
For a Bayesian model fitted by Markov chain Monte Carlo (MCMC), however, combining MCMC with the bootstrap
makes computing the standard error of the Bayes estimator expensive and often impractical:
the MCMC must be rerun on each bootstrapped dataset. To overcome this difficulty, I propose an
importance sampling technique to reduce the computational burden. I apply the proposed technique
to several examples, including logistic regression, the linear measurement error model, the Weibull
regression model, and the vector autoregressive model.
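The importance sampling idea can be sketched in a toy conjugate model where exact posterior draws stand in for a single MCMC run; the normal-mean model, the flat prior, and all constants below are illustrative assumptions rather than the dissertation's examples.

```python
import numpy as np

def loglik(theta, data):
    # Toy model: data_i ~ N(theta, 1); returns the log-likelihood at
    # each candidate theta (vectorized over posterior draws).
    return -0.5 * ((data[:, None] - theta) ** 2).sum(axis=0)

rng = np.random.default_rng(2)
data = rng.normal(1.0, 1.0, 100)
n = data.size

# Posterior draws for the original data; under a flat prior the
# posterior is N(xbar, 1/n). These draws stand in for one MCMC run.
draws = rng.normal(data.mean(), 1.0 / np.sqrt(n), 5000)
ll_orig = loglik(draws, data)

# Reweight the same draws for every bootstrap sample instead of
# rerunning the sampler B times.
B = 200
boot_est = np.empty(B)
for b in range(B):
    boot = rng.choice(data, size=n, replace=True)
    logw = loglik(draws, boot) - ll_orig        # importance log-weights
    w = np.exp(logw - logw.max())
    w /= w.sum()
    boot_est[b] = np.sum(w * draws)             # reweighted posterior mean

se = boot_est.std(ddof=1)   # frequentist SE of the Bayes estimator
```

Because the weights reuse one set of posterior draws, the per-bootstrap cost is a likelihood evaluation rather than a full MCMC run.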
In the second computational problem, I explore binary regression with a flexible skew-probit
link function, which contains the traditional probit link as a special case. The skew-probit
model is useful for modeling the success probability of binary or count data when the success
probability is not a symmetric function of the continuous regressors. I first investigate
parameter identifiability for the skew-probit model and then demonstrate that the maximum likelihood
estimator (MLE) of the skewness parameter is highly biased. To reduce the finite sample bias of
the MLE, I develop a penalized likelihood approach based on three penalty functions.
The performance of each penalized MLE is compared through extensive
simulations, and I analyze heart-disease data using the proposed approaches. | en |
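A penalized skew-probit likelihood can be sketched as follows; the numerical skew-normal CDF, the Cauchy-type penalty, and the constant c are assumptions made for illustration and are not one of the dissertation's three penalty functions.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_pdf(z):
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + _erf(z / np.sqrt(2.0)))

def skewprobit_prob(eta, lam, grid=np.linspace(-12.0, 12.0, 2401)):
    # Success probability under a skew-probit link: the CDF of the
    # skew-normal density 2*phi(t)*Phi(lam*t) evaluated at eta,
    # approximated by numerical integration on a fixed grid.
    pdf = 2.0 * norm_pdf(grid) * norm_cdf(lam * grid)
    cdf = np.cumsum(pdf) * (grid[1] - grid[0])
    cdf /= cdf[-1]
    return np.interp(eta, grid, cdf)

def penalized_negloglik(beta, lam, y, X, c=1.0):
    # Bernoulli negative log-likelihood plus an illustrative penalty
    # on the skewness parameter lam.
    p = np.clip(skewprobit_prob(X @ beta, lam), 1e-10, 1 - 1e-10)
    nll = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return nll + c * np.log1p(lam**2)

# Toy data: lam = 0 reduces the link to the ordinary probit.
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
beta = np.array([0.2, 1.0])
y = (rng.random(200) < norm_cdf(X @ beta)).astype(float)
val = penalized_negloglik(beta, 0.0, y, X)
```

Minimizing this objective over beta and lam would give a penalized MLE; at lam = 0 the link matches the probit, so the penalty shrinks the skewness estimate toward the symmetric model.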