Topics in multiple hypotheses testing

Qian, Yi

dc.contributor.advisor	Hart, Jeffrey D.
dc.creator	Qian, Yi
dc.date.accessioned	2007-04-25T20:06:09Z
dc.date.available	2007-04-25T20:06:09Z
dc.date.created	2005-12
dc.date.issued	2007-04-25
dc.identifier.uri	https://hdl.handle.net/1969.1/4754
dc.description.abstract	It is common to test many hypotheses simultaneously in the application of statistics. The probability of making a false discovery grows with the number of statistical tests performed. When all the null hypotheses are true, and the test statistics are independent and continuous, the error rates from the familywise error rate (FWER)- and the false discovery rate (FDR)-controlling procedures are equal to the nominal level. When some of the null hypotheses are not true, both procedures are conservative. In the first part of this study, we review the background of the problem and propose methods to estimate the number of true null hypotheses. The estimates can be used in FWER- and FDR-controlling procedures with a consequent increase in power. We conduct simulation studies and apply the estimation methods to data sets with biological or clinical significance. In the second part of the study, we propose a mixture model approach for the analysis of ChIP-chip high-density oligonucleotide array data to study the interactions between proteins and DNA. If we could identify the specific locations where proteins interact with DNA, we could increase our understanding of many important cellular events. Most experiments to date have been performed in culture on cell lines, bacteria, or yeast; future experiments will include those in developing tissues, organs, or cancer biopsies. Such experiments are critical to understanding the function of genes and proteins. Here we investigate the ChIP-chip data structure and use a beta-mixture model to help identify the binding sites. To determine the appropriate number of components in the mixture model, we suggest the Anderson-Darling test. Our study indicates that it is a reasonable means of choosing the number of components in a beta-mixture model. The mixture model procedure has broad applications in biology and is illustrated with several data sets from bioinformatics experiments.	en
dc.format.extent	564568 bytes	en
dc.format.medium	electronic	en
dc.format.mimetype	application/pdf
dc.language.iso	en_US
dc.publisher	Texas A&M University
dc.subject	false discovery rate	en
dc.subject	true null hypotheses	en
dc.subject	ChIP-chip	en
dc.subject	beta-mixture	en
dc.title	Topics in multiple hypotheses testing	en
dc.type	Book	en
dc.type	Thesis	en
thesis.degree.department	Statistics	en
thesis.degree.discipline	Statistics	en
thesis.degree.grantor	Texas A&M University	en
thesis.degree.name	Doctor of Philosophy	en
thesis.degree.level	Doctoral	en
dc.contributor.committeeMember	Dahm, P. Fred
dc.contributor.committeeMember	Siegele, Deborah A.
dc.contributor.committeeMember	Wehrly, Thomas E.
dc.type.genre	Electronic Dissertation	en
dc.type.material	text	en
dc.format.digitalOrigin	born digital	en
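
Illustrative note on the abstract: the claim that an estimate of the number of true null hypotheses can increase power may be sketched as follows. This is a minimal sketch and not the thesis's own estimator, which the record does not describe. It assumes a simple tail-count estimate of the form #{p > lam} / (1 - lam) plugged into a Benjamini-Hochberg style step-up rule. The function names, the tuning value lam = 0.5, and the p-values are hypothetical.

```python
def estimate_m0(pvalues, lam=0.5):
    """Tail-count estimate of the number of true nulls (assumed estimator).

    Under a true null, p-values are uniform, so about m0 * (1 - lam)
    of them should exceed lam. The result is capped to [1, m].
    """
    m = len(pvalues)
    count = sum(1 for p in pvalues if p > lam)
    return min(float(m), max(1.0, count / (1.0 - lam)))


def step_up(pvalues, alpha=0.05, m0=None, lam=0.5):
    """Step-up rule: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= k * alpha / m0.

    With m0 = len(pvalues) this is the ordinary Benjamini-Hochberg rule;
    with m0=None the estimate from estimate_m0 is used instead.
    """
    m = len(pvalues)
    if m0 is None:
        m0 = estimate_m0(pvalues, lam)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m0:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


# Hypothetical p-values, for illustration only.
pvals = [0.001, 0.008, 0.012, 0.02, 0.035, 0.6, 0.7, 0.9]
plain = step_up(pvals, m0=len(pvals))   # ordinary rule, m0 = m
adaptive = step_up(pvals)               # rule with estimated m0
print(sum(plain), sum(adaptive))
```

On these made-up p-values the rule with the estimated count rejects one more hypothesis than the ordinary rule. That is the sense in which replacing the total number of tests by a smaller estimate makes the procedure less conservative.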

Files in this item

Name: etd-tamu-2005C-STAT-Qian.pdf
Size: 551.3Kb
Format: PDF

This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )