Show simple item record

dc.contributor.advisorDabney, Alan
dc.creatorWang, Xuan
dc.date.accessioned2012-07-16T15:57:41Z
dc.date.accessioned2012-07-16T20:27:38Z
dc.date.available2014-09-16T07:28:20Z
dc.date.created2012-05
dc.date.issued2012-07-16
dc.date.submittedMay 2012
dc.identifier.urihttps://hdl.handle.net/1969.1/ETD-TAMU-2012-05-10777
dc.description.abstractProteomics serves an important role at the systems-level in understanding of biological functioning. Mass spectrometry proteomics has become the tool of choice for identifying and quantifying the proteome of an organism. In the most widely used bottom-up approach to MS-based high-throughput quantitative proteomics, complex mixtures of proteins are first subjected to enzymatic cleavage, the resulting peptide products are separated based on chemical or physical properties and then analyzed using a mass spectrometer. The three fundamental challenges in the analysis of bottom-up MS-based proteomics are as follows: (i) Identifying the proteins that are present in a sample, (ii) Aligning different samples on elution (retention) time, mass, peak area (intensity) and etc, (iii) Quantifying the abundance levels of the identified proteins after alignment. Each of these challenges requires knowledge of the biological and technological context that give rise to the observed data, as well as the application of sound statistical principles for estimation and inference. In this dissertation, we present a set of statistical methods in bottom-up proteomics towards protein identification, alignment and quantification. We describe a fully Bayesian hierarchical modeling approach to peptide and protein identification on the basis of MS/MS fragmentation patterns in a unified framework. Our major contribution is to allow for dependence among the list of top candidate PSMs, which we accomplish with a Bayesian multiple component mixture model incorporating decoy search results and joint estimation of the accuracy of a list of peptide identifications for each MS/MS fragmentation spectrum. We also propose an objective criteria for the evaluation of the False Discovery Rate (FDR) associated with a list of identifications at both peptide level, which results in more accurate FDR estimates than existing methods like PeptideProphet. Several alignment algorithms have been developed using different warping functions. However, all the existing alignment approaches suffer from a useful metric for scoring an alignment between two data sets and hence lack a quantitative score for how good an alignment is. Our alignment approach uses "Anchor points" found to align all the individual scan in the target sample and provides a framework to quantify the alignment, that is, assigning a p-value to a set of aligned LC-MS runs to assess the correctness of alignment. After alignment using our algorithm, the p-values from Wilcoxon signed-rank test on elution (retention) time, M/Z, peak area successfully turn into non-significant values. Quantitative mass spectrometry-based proteomics involves statistical inference on protein abundance, based on the intensities of each protein's associated spectral peaks. However, typical mass spectrometry-based proteomics data sets have substantial proportions of missing observations, due at least in part to censoring of low intensities. This complicates intensity-based differential expression analysis. We outline a statistical method for protein differential expression, based on a simple Binomial likelihood. By modeling peak intensities as binary, in terms of "presence / absence", we enable the selection of proteins not typically amendable to quantitative analysis; e.g., "one-state" proteins that are present in one condition but absent in another. In addition, we present an analysis protocol that combines quantitative and presence / absence analysis of a given data set in a principled way, resulting in a single list of selected proteins with a single associated FDR.en
dc.format.mimetypeapplication/pdf
dc.language.isoen_US
dc.subjectProteomicsen
dc.subjectMass spectrometryen
dc.subjectBottom-upen
dc.subjectIdentificationen
dc.subjectAlignmenten
dc.subjectQuantitationen
dc.subjectLC-MSen
dc.subjectPSMen
dc.subjectBayesian Hierarchical Modelen
dc.subjectFDRen
dc.subjectAnchor Pointsen
dc.subjectMissingnessen
dc.titleStatistical Methods for the Analysis of Mass Spectrometry-based Proteomics Dataen
dc.typeThesisen
thesis.degree.departmentStatisticsen
thesis.degree.disciplineStatisticsen
thesis.degree.grantorTexas A&M Universityen
thesis.degree.nameDoctor of Philosophyen
thesis.degree.levelDoctoralen
dc.contributor.committeeMemberLongnecker, Michael
dc.contributor.committeeMemberSang, Huiyan
dc.contributor.committeeMemberSturino, Joseph
dc.type.genrethesisen
dc.type.materialtexten
local.embargo.terms2014-07-16


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record