dc.contributor.advisor: Kianfar, Kiavash
dc.creator: Gujjula, Krishna Reddy
dc.date.accessioned: 2019-01-18T03:10:11Z
dc.date.available: 2019-01-18T03:10:11Z
dc.date.created: 2018-08
dc.date.issued: 2018-05-23
dc.date.submitted: August 2018
dc.identifier.uri: https://hdl.handle.net/1969.1/173701
dc.description.abstract: The research in this dissertation focuses on developing a novel methodology for ChIP-Seq dataset analysis. Despite its advances, the standard ChIP-Seq data analysis pipeline, i.e., read mapping followed by peak calling, has the following shortcomings: (1) The majority of a ChIP-Seq dataset consists of background reads, so unnecessary computational effort is spent on mapping reads that play no role in forming the true peaks. (2) Unnecessary computational effort is spent on aligning control reads that do not map to ChIP-enriched genomic regions. (3) Multi-mappable reads are often discarded during read mapping, which reduces the power to identify peaks in repeat elements of the genome. We present Map2Peak, a novel tool aimed at mitigating these drawbacks. Map2Peak takes unmapped ChIP-Seq and control reads as input and outputs the peaks, running twice as fast as the standard workflow. Map2Peak intertwines partial read mapping and peak calling in a five-phase algorithm. It models the fragment count information obtained during the early stages of ChIP read mapping (Phase 1) as a 2-component Poisson mixture model, and then uses an expectation-maximization algorithm to identify ChIP-enriched regions (Phase 2). The remaining ChIP reads and the majority of the control reads are then restricted to map exactly, and only to the much shorter pseudo-genome composed of the ChIP-enriched regions (Phases 3 and 4). The mapping information is then used to call peaks on the pseudo-genome (Phase 5). Our results show that the peaks called by Map2Peak encompass most of the peaks called by the standard workflow (88%-96%), along with some novel motif-justifiable peaks that the standard workflow does not detect, and that the majority (90%) of the background reads are discarded. Moreover, Map2Peak implicitly resolves the alignment location for some of the multi-mappable reads, which results in increased power to call peaks in repeat elements of the genome. Map2Peak provides researchers with an ultrafast peak caller that uses the whole ChIP-Seq dataset, without discarding multi-mappable reads, to identify peaks, and that uses control datasets efficiently for peak calling. Map2Peak is available at https://kianfar.engr.tamu.edu/map2peak/. [en]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: ChIP-Seq [en]
dc.subject: Peak-calling [en]
dc.subject: E-M algorithm [en]
dc.subject: Read mapping [en]
dc.subject: Poisson mixture model [en]
dc.subject: multi-mappable reads [en]
dc.title: Map2Peak: A Novel Perspective on ChIP-Seq Data Analysis Pipeline [en]
dc.type: Thesis [en]
thesis.degree.department: Industrial and Systems Engineering [en]
thesis.degree.discipline: Industrial Engineering [en]
thesis.degree.grantor: Texas A & M University [en]
thesis.degree.name: Doctor of Philosophy [en]
thesis.degree.level: Doctoral [en]
dc.contributor.committeeMember: Ding, Yu
dc.contributor.committeeMember: Butenko, Sergiy
dc.contributor.committeeMember: Yu, Peng
dc.type.material: text [en]
dc.date.updated: 2019-01-18T03:10:12Z
local.etdauthor.orcid: 0000-0002-8283-7896
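
The Phase 2 step described in the abstract, fitting a 2-component Poisson mixture to fragment counts with expectation-maximization, can be illustrated with a minimal sketch. This is not Map2Peak's implementation: the function names, the initialization, the idea of one count per fixed genomic bin, and the synthetic data generator are all assumptions made for illustration only.

```python
import math
import random


def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass function at count k with rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def sample_poisson(lam):
    """Draw one Poisson variate (Knuth's method); used only to make toy data."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1


def fit_poisson_mixture(counts, n_iter=200, tol=1e-8):
    """EM for a background/enriched Poisson mixture over per-bin counts.

    Returns (pi_enriched, lam_background, lam_enriched, posteriors), where
    posteriors[i] is the estimated probability that bin i is enriched.
    Hypothetical helper: initial values below are arbitrary choices.
    """
    n = len(counts)
    mean = sum(counts) / n
    lam_bg = max(0.5 * mean, 1e-3)   # assumed starting rate, background
    lam_en = max(2.0 * mean, 1e-2)   # assumed starting rate, enriched
    pi_en = 0.1                      # assumed starting mixing weight
    prev_ll = -math.inf
    post = [0.0] * n
    for _ in range(n_iter):
        # E-step: posterior probability of the enriched component per bin.
        ll = 0.0
        for i, k in enumerate(counts):
            a = math.log(1.0 - pi_en) + poisson_logpmf(k, lam_bg)
            b = math.log(pi_en) + poisson_logpmf(k, lam_en)
            m = max(a, b)
            log_norm = m + math.log(math.exp(a - m) + math.exp(b - m))
            post[i] = math.exp(b - log_norm)
            ll += log_norm
        # M-step: re-estimate the mixing weight and both rates.
        w = sum(post)
        pi_en = min(max(w / n, 1e-9), 1.0 - 1e-9)
        lam_en = max(sum(p * k for p, k in zip(post, counts)) / max(w, 1e-12), 1e-9)
        lam_bg = max(sum((1.0 - p) * k for p, k in zip(post, counts)) / max(n - w, 1e-12), 1e-9)
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    # Keep the labels consistent: "enriched" is the higher-rate component.
    if lam_bg > lam_en:
        lam_bg, lam_en = lam_en, lam_bg
        pi_en = 1.0 - pi_en
        post = [1.0 - p for p in post]
    return pi_en, lam_bg, lam_en, post
```

In a sketch like this, bins whose posterior exceeds some chosen cutoff would be treated as candidate enriched regions; how Map2Peak defines regions and thresholds is not stated in the abstract.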

