Show simple item record

dc.contributor.advisorSze, Sing-Hoi
dc.creatorZhao, Xiaoyan
dc.date.accessioned2012-02-14T22:18:37Z
dc.date.accessioned2012-02-16T16:14:40Z
dc.date.available2012-02-14T22:18:37Z
dc.date.available2012-02-16T16:14:40Z
dc.date.created2010-12
dc.date.issued2012-02-14
dc.date.submittedDecember 2010
dc.identifier.urihttps://hdl.handle.net/1969.1/ETD-TAMU-2010-12-8834
dc.description.abstractUnderstanding the mechanisms that regulate gene expression is a major challenge in biology. One of the most important tasks in this challenge is to identify the transcription factors binding sites (TFBS) in DNA sequences. The common representation of these binding sites is called “motif” and the discovery of TFBS problem is also referred as motif finding problem in computer science. Despite extensive efforts in the past decade, none of the existing algorithms perform very well. This dissertation focuses on this difficult problem and proposes three new methods (MotifEnumerator, PosMotif, and Enrich) with excellent improvements. An improved pattern-driven algorithm, MotifEnumerator, is first proposed to detect the optimal motif with reduced time complexity compared to the traditional exact pattern-driven approaches. This strategy is further extended to allow arbitrary don’t care positions within a motif without much decrease in solvable values of motif length. The performance of this algorithm is comparable to the best existing motif finding algorithms on a large benchmark set of samples. Another algorithm with further post processing, PosMotif, is proposed to use a string representation that allows arbitrary ignored positions within the non-conserved portion of single motifs, and use Markov chains to model the background distributions of motifs of certain length while skipping these positions within each Markov chain. Two post processing steps considering redundancy information are applied in this algorithm. PosMotif demonstrates an improved performance compared to the best five existing motif finding algorithms on several large benchmark sets of samples. The third method, Enrich, is proposed to improve the performance of general motif finding algorithms by adding more sequences to the samples in the existing benchmark datasets. Five famous motif finding algorithms have been chosen to run on the original datasets and the enriched datasets, and the performance comparisons show a general great improvement on the enriched datasets.en
dc.format.mimetypeapplication/pdf
dc.language.isoen_US
dc.subjectComputational Biologyen
dc.subjectMotif findingen
dc.subjectTranscriptionen
dc.titleImproved Algorithms for Discovery of Transcription Factor Binding Sites in DNA Sequencesen
dc.typeThesisen
thesis.degree.departmentComputer Science and Engineeringen
thesis.degree.disciplineComputer Scienceen
thesis.degree.grantorTexas A&M Universityen
thesis.degree.nameDoctor of Philosophyen
thesis.degree.levelDoctoralen
dc.contributor.committeeMemberChen, Jianer
dc.contributor.committeeMemberSarin, Vivek
dc.contributor.committeeMemberFan, Ruzong
dc.type.genrethesisen
dc.type.materialtexten


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record