Goodness-of-Fit Test for Large Number of Small Data Sets

Lee, Hyuneui

dc.contributor.advisor	Hart, Jeffrey D.
dc.creator	Lee, Hyuneui
dc.date.accessioned	2018-02-05T16:50:43Z
dc.date.available	2019-08-01T06:51:27Z
dc.date.created	2017-08
dc.date.issued	2017-05-30
dc.date.submitted	August 2017
dc.identifier.uri	https://hdl.handle.net/1969.1/165749
dc.description.abstract	A goodness-of-fit (gof) problem, i.e., testing whether observed data come from a specific distribution, is one of the important problems in statistics, and various tests for checking distributional assumptions have been suggested. Most tests are designed for a single data set with a sufficiently large sample size. This research, however, focuses on the gof problem when there are a large number of small data sets. In other words, we assume that the number of data sets p increases to infinity while the sample size n of each small data set is finite. Throughout this dissertation, p and n denote the number of data sets and the sample size of each data set, respectively. Since the primary interest of this dissertation is testing whether every small data set comes from a known parametric family of distributions with different parameters, it is important to choose a gof test that is invariant to the unknown parameters of the distribution. Hence, as a basic approach, we suggest applying empirical distribution function (edf) based gof tests to every small data set and then combining the P-values to obtain a single test. Two P-value combining methods, moment based tests and smoothing based tests, are suggested and their pros and cons are discussed. In particular, the two moment based tests, Edgington's method and Fisher's method, are compared with respect to Pitman efficiency and asymptotic power. We also find conditions that guarantee that the asymptotic null distribution of moment based tests based on empirical P-values is the same as that based on exact P-values. When the null is a location and scale family, there is no difficulty in applying the suggested test procedures. However, when the null is not a location and scale family, edf-based tests may depend on unknown parameters. To handle this problem, we suggest using unconditional P-values, which requires an additional step of estimating the distribution of the unknown parameters.
Several issues related to estimating the distribution of the unknown parameters and obtaining unconditional P-values are also discussed. The performance of the suggested test procedures is investigated via simulations, and these procedures are applied to microarray data.	en
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	Goodness-of-fit test	en
dc.subject	Microarray data	en
dc.subject	Fisher's method	en
dc.subject	Edgington's method	en
dc.subject	Smoothing based tests	en
dc.title	Goodness-of-Fit Test for Large Number of Small Data Sets	en
dc.type	Thesis	en
thesis.degree.department	Statistics	en
thesis.degree.discipline	Statistics	en
thesis.degree.grantor	Texas A & M University	en
thesis.degree.name	Doctor of Philosophy	en
thesis.degree.level	Doctoral	en
dc.contributor.committeeMember	Mueller-Harknett, Uschi
dc.contributor.committeeMember	Sang, Huiyan
dc.contributor.committeeMember	Wu, Ximing
dc.type.material	text	en
dc.date.updated	2018-02-05T16:50:44Z
local.embargo.terms	2019-08-01
local.etdauthor.orcid	0000-0001-8539-1650
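
Illustrative note (not part of the deposited record): the abstract describes combining the P-values from many small data sets with Fisher's method or Edgington's method. The sketch below shows one plausible form of that combining step in Python. It is a minimal illustration under stated assumptions, not the dissertation's implementation: the function names are hypothetical, the P-values are assumed to be exact, independent, and in (0, 1], and a large-p normal approximation is used for both statistics.

```python
import math
import random


def normal_sf(z):
    """Upper-tail probability P(Z > z) of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def fisher_combined(pvalues):
    """Fisher's method: large values of -2 * sum(log p_i) count against the null.

    Assumes every P-value lies in (0, 1]; a value of 0 would make log() fail.
    """
    p = len(pvalues)
    stat = -2.0 * sum(math.log(v) for v in pvalues)
    # Under the null, stat is chi-square with 2p degrees of freedom
    # (mean 2p, variance 4p). Assumption: p is large, so a normal
    # approximation to that distribution is used here.
    z = (stat - 2.0 * p) / math.sqrt(4.0 * p)
    return stat, normal_sf(z)


def edgington_combined(pvalues):
    """Edgington's method: small values of sum(p_i) count against the null."""
    p = len(pvalues)
    stat = sum(pvalues)
    # Under the null, each p_i is Uniform(0, 1) with mean 1/2 and variance 1/12.
    # Assumption: p is large, so the sum is treated as approximately normal.
    z = (stat - p / 2.0) / math.sqrt(p / 12.0)
    return stat, normal_sf(-z)  # lower-tail probability


if __name__ == "__main__":
    # Hypothetical input: 1000 P-values, one per small data set,
    # simulated here as uniform draws (i.e., the null holds for every set).
    random.seed(0)
    pvals = [1.0 - random.random() for _ in range(1000)]  # values in (0, 1]
    print("Fisher:", fisher_combined(pvals))
    print("Edgington:", edgington_combined(pvals))
```

Each function returns the combined statistic together with an approximate overall P-value. In practice the per-data-set P-values would come from an edf-based test applied to each small data set, which this sketch does not attempt to reproduce.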

Files in this item

Name: LEE-DISSERTATION-2017.pdf
Size: 838.9Kb
Format: PDF

This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )
