Finding conserved patterns in biological sequences, networks and genomes

Yang, Qingwu

dc.contributor.advisor	Sze, Sing-Hoi
dc.creator	Yang, Qingwu
dc.date.accessioned	2010-01-15T00:09:01Z
dc.date.accessioned	2010-01-16T00:34:52Z
dc.date.available	2010-01-15T00:09:01Z
dc.date.available	2010-01-16T00:34:52Z
dc.date.created	2007-12
dc.date.issued	2009-05-15
dc.identifier.uri	https://hdl.handle.net/1969.1/ETD-TAMU-2465
dc.description.abstract	Biological patterns are widely used for identifying biologically interesting regions within macromolecules, classifying biological objects, predicting functions, and studying evolution. Good pattern-finding algorithms help biologists formulate and validate hypotheses in order to gain insight into the complex mechanisms of living things. In this dissertation, we aim to improve and develop algorithms for five biological pattern-finding problems. For the multiple sequence alignment problem, we propose an alternative formulation in which a final alignment is obtained by preserving pairwise alignments specified by the edges of a given tree. In contrast with traditional NP-hard formulations, our preserving alignment formulation can be solved in polynomial time without using a heuristic, while achieving very good accuracy. For the path matching problem, we take advantage of the linearity of the query path to reduce the problem to finding a longest weighted path in a directed acyclic graph. We can find the k top-scoring paths in a network for a given query path in polynomial time. As many biological pathways are not linear, our graph matching approach allows a non-linear graph query to be given. Our graph matching formulation overcomes a common weakness of previous approaches, namely that they provide no guarantee on the quality of the results. For the gene cluster finding problem, we investigate a formulation based on constraining the overall size of a cluster and develop statistical significance estimates that allow direct comparisons of clusters of different sizes. We explore both a restricted version, which requires that orthologous genes be strictly ordered within each cluster, and the unrestricted problem, which allows paralogous genes within a genome and clusters that may not appear in every genome. We solve the first problem in polynomial time and develop practical exact algorithms for the second.
For the gene cluster querying problem, we propose an efficient query-based approach for investigating the clustering of related genes across multiple genomes, given a gene cluster as the query. By analyzing gene clustering in 400 bacterial genomes, we show that our algorithm is efficient enough to study gene clusters across hundreds of genomes.	en
dc.format.medium	electronic	en
dc.format.mimetype	application/pdf
dc.language.iso	en_US
dc.subject	biological pattern	en
dc.subject	bioinformatics	en
dc.title	Finding conserved patterns in biological sequences, networks and genomes	en
dc.type	Book	en
dc.type	Thesis	en
thesis.degree.department	Computer Science	en
thesis.degree.discipline	Computer Science	en
thesis.degree.grantor	Texas A&M University	en
thesis.degree.name	Doctor of Philosophy	en
thesis.degree.level	Doctoral	en
dc.contributor.committeeMember	Amato, Nancy
dc.contributor.committeeMember	Chen, Jianer
dc.contributor.committeeMember	Ebbole, Daniel J.
dc.type.genre	Electronic Dissertation	en
dc.type.material	text	en
dc.format.digitalOrigin	born digital	en
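
The abstract mentions reducing path matching to finding a longest weighted path in a directed acyclic graph. As a rough illustration of that general technique only, here is a minimal Python sketch. It is not the dissertation's algorithm: the function name `longest_weighted_path`, the `(u, v, weight)` edge format, and the example weights are all assumptions made for illustration, and the sketch returns a single best path rather than the k top-scoring ones.

```python
from collections import defaultdict, deque


def longest_weighted_path(edges):
    """Return (score, path) for a maximum-weight path in a DAG.

    `edges` is an iterable of (u, v, weight) triples. This input format
    is hypothetical and chosen only for this sketch.
    """
    succ = defaultdict(list)   # node -> list of (successor, weight)
    indeg = defaultdict(int)   # node -> number of incoming edges
    nodes = set()
    for u, v, w in edges:
        succ[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))
    if not nodes:
        return 0.0, []

    # best[n] is the best score of any path ending at n; a path may
    # start at any node, so every node starts with score 0.
    best = {n: 0.0 for n in nodes}
    prev = {n: None for n in nodes}

    # Kahn's algorithm: process nodes in topological order so that
    # best[u] is final before u's outgoing edges are relaxed.
    queue = deque(n for n in nodes if indeg[n] == 0)
    processed = 0
    while queue:
        u = queue.popleft()
        processed += 1
        for v, w in succ[u]:
            if best[u] + w > best[v]:
                best[v] = best[u] + w
                prev[v] = u
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if processed != len(nodes):
        raise ValueError("graph contains a cycle")

    # Walk predecessor links back from the best-scoring end node.
    end = max(nodes, key=best.get)
    score = best[end]
    path = []
    while end is not None:
        path.append(end)
        end = prev[end]
    path.reverse()
    return score, path


if __name__ == "__main__":
    example = [("a", "b", 2), ("b", "c", 3), ("a", "c", 4), ("c", "d", 1)]
    print(longest_weighted_path(example))
```

Because each node and edge is visited once, the sketch runs in time linear in the size of the graph, which is why the acyclic structure matters: the same problem on general graphs is NP-hard.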

Files in this item

Name: YANG-DISSERTATION.pdf
Size: 682.3Kb
Format: PDF

This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )