Incomplete gene structure prediction with almost 100% specificity

Chin, See Loong

dc.contributor.advisor	Sze, Sing-Hoi
dc.creator	Chin, See Loong
dc.date.accessioned	2004-09-30T01:48:32Z
dc.date.available	2004-09-30T01:48:32Z
dc.date.created	2003-12
dc.date.issued	2004-09-30
dc.identifier.uri	https://hdl.handle.net/1969.1/258
dc.description.abstract	The goals of gene prediction using computational approaches are to determine gene location and the corresponding functionality of the coding region. A subset of gene prediction is the gene structure prediction problem, which is to define the exon-intron boundaries of a gene. Gene prediction follows two general approaches: statistical pattern identification and sequence similarity comparison. Similarity-based approaches have gained increasing popularity with the recent vast increase in genomic data in GenBank. The proposed gene prediction algorithm is a similarity-based algorithm which capitalizes on the fact that similar sequences bear similar functions. The proposed algorithm, like most other similarity-based algorithms, is based on dynamic programming. Given a genomic DNA sequence, X = x1 ... xn, and a closely related cDNA, Y = y1 ... yn, these sequences are aligned and the matching pairs are stored in a data set. This set of match indexes is a large, unordered collection of all matching pairs, many of which cross one another. Dynamic programming alignment is again used to retrieve the longest common non-crossing subsequence from the collection of matching fragments in the data set. This algorithm was implemented in Java on the Unix platform. Statistical comparisons were made against other software programs in the field: evaluations at both the DNA and exonic levels were made against Est2genome, Sim4, Spidey, and Fgenesh-C. The proposed gene structure prediction algorithm has by far the best performance in the specificity category, with a resulting specificity greater than 98%. The proposed algorithm also has comparable results in terms of sensitivity and correlation coefficient. The goal of developing an algorithm to predict exonic regions with a very high level of correctness was achieved.	en
dc.format.extent	613451 bytes	en
dc.format.extent	38890 bytes	en
dc.format.medium	electronic	en
dc.format.mimetype	application/pdf
dc.format.mimetype	text/plain
dc.language.iso	en_US
dc.publisher	Texas A&M University
dc.subject	Gene Structure Prediction	en
dc.subject	Dynamic Programming	en
dc.subject	Specificity	en
dc.title	Incomplete gene structure prediction with almost 100% specificity	en
dc.type	Book	en
dc.type	Thesis	en
thesis.degree.department	Computer Science	en
thesis.degree.discipline	Computer Science	en
thesis.degree.grantor	Texas A&M University	en
thesis.degree.name	MS	en
thesis.degree.level	Masters	en
dc.contributor.committeeMember	Xiong, Jin
dc.contributor.committeeMember	Ioerger, Thomas
dc.type.genre	Electronic Thesis	en
dc.type.material	text	en
dc.format.digitalOrigin	born digital	en
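
The abstract describes selecting a longest non-crossing subset from a jumble of matching fragments by dynamic programming. The following is a minimal sketch of that general idea, not the thesis's Java implementation. The fragment representation (genomic start, cDNA start, length), the function name best_noncrossing_chain, and the scoring by total matched length are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation of one gap-free match between the two sequences:
# (start in genomic DNA, start in cDNA, length). Lengths are assumed positive.
Fragment = Tuple[int, int, int]


def best_noncrossing_chain(fragments: List[Fragment]) -> List[Fragment]:
    """Return a chain of fragments with maximum total length such that
    consecutive fragments are ordered, without overlap, in both sequences."""
    frags = sorted(fragments)  # order by genomic start
    n = len(frags)
    if n == 0:
        return []

    score = [f[2] for f in frags]  # best chain score ending at each fragment
    prev = [-1] * n                # back-pointer to the previous fragment

    for j in range(n):
        gj, cj, lj = frags[j]
        for i in range(j):
            gi, ci, li = frags[i]
            # Fragment i may precede j only if it ends before j starts
            # in both the genomic sequence and the cDNA (no crossing).
            if gi + li <= gj and ci + li <= cj and score[i] + lj > score[j]:
                score[j] = score[i] + lj
                prev[j] = i

    # Trace back from the best-scoring end fragment.
    end = max(range(n), key=score.__getitem__)
    chain = []
    while end != -1:
        chain.append(frags[end])
        end = prev[end]
    chain.reverse()
    return chain


if __name__ == "__main__":
    matches = [(0, 0, 5), (10, 20, 4), (12, 5, 6), (30, 11, 3), (40, 30, 2)]
    for frag in best_noncrossing_chain(matches):
        print(frag)
```

In this example the fragment (10, 20, 4) crosses the others in the cDNA coordinate and is left out of the chain. The quadratic loop is chosen for readability; the abstract does not state the running time of the actual algorithm.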

Files in this item

Name: etd-tamu-2003C-CPSC-Chin-1.pdf
Size: 599.0Kb
Format: PDF

This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )
