dc.contributor.advisor | Sze, Sing-Hoi
dc.creator | Chin, See Loong
dc.date.accessioned | 2004-09-30T01:48:32Z
dc.date.available | 2004-09-30T01:48:32Z
dc.date.created | 2003-12
dc.date.issued | 2004-09-30
dc.identifier.uri | https://hdl.handle.net/1969.1/258
dc.description.abstract | The goals of computational gene prediction are to determine the location of a gene and the function of its coding region. A subproblem of gene prediction is gene structure prediction, which is to define the exon-intron boundaries of a gene. Gene prediction follows two general approaches: statistical pattern identification and sequence similarity comparison. Similarity-based approaches have gained popularity with the recent rapid growth of genomic data in GenBank. The proposed gene prediction algorithm is similarity based and exploits the observation that similar sequences tend to have similar functions. Like most other similarity-based algorithms, it is built on dynamic programming. Given a genomic DNA sequence $X = x_1 \cdots x_m$ and a closely related cDNA $Y = y_1 \cdots y_n$, the two sequences are aligned and the matching index pairs are stored in a data set. This set is a large, unordered collection of all matching pairs, many of which cross one another. Dynamic programming is then applied a second time to retrieve the longest common non-crossing subsequence from the matching fragments in the data set. The algorithm was implemented in Java on the Unix platform. Its predictions were compared statistically, at both the DNA level and the exon level, against those of Est2genome, Sim4, Spidey, and Fgenesh-C. The proposed algorithm had by far the best specificity of the programs compared, greater than 98%, and performed on par with them in sensitivity and correlation coefficient. The goal of developing an algorithm that predicts exonic regions with a very high level of correctness was achieved. | en
dc.format.extent | 613451 bytes | en
dc.format.extent | 38890 bytes | en
dc.format.medium | electronic | en
dc.format.mimetype | application/pdf
dc.format.mimetype | text/plain
dc.language.iso | en_US
dc.publisher | Texas A&M University
dc.subject | Gene Structure Prediction | en
dc.subject | Dynamic Programming | en
dc.subject | Specificity | en
dc.title | Incomplete gene structure prediction with almost 100% specificity | en
dc.type | Book | en
dc.type | Thesis | en
thesis.degree.department | Computer Science | en
thesis.degree.discipline | Computer Science | en
thesis.degree.grantor | Texas A&M University | en
thesis.degree.name | MS | en
thesis.degree.level | Masters | en
dc.contributor.committeeMember | Xiong, Jin
dc.contributor.committeeMember | Ioerger, Thomas
dc.type.genre | Electronic Thesis | en
dc.type.material | text | en
dc.format.digitalOrigin | born digital | en
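
The abstract's second dynamic-programming step, which extracts the longest non-crossing subsequence from a jumble of crossing matches, can be viewed as a chaining problem. The Java below is a minimal sketch of that idea, not the thesis implementation. It assumes each match is an index pair (i, j) into X and Y, that two pairs may both be kept only when i and j both strictly increase, and that the objective is simply to keep as many pairs as possible; the thesis may score fragments differently. The names `NonCrossingChain` and `longestChain` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch: choose the longest non-crossing chain from a set of
 * matching index pairs (i, j), where i indexes the genomic DNA X and j indexes
 * the cDNA Y. Assumption: pairs are compatible only if both indices strictly
 * increase, so no two chosen pairs cross, and every pair has weight 1.
 */
final class NonCrossingChain {

    private NonCrossingChain() {}

    /** Returns the longest non-crossing chain, ordered by position in X. */
    static List<int[]> longestChain(List<int[]> pairs) {
        // Copy the pairs and sort by position in X, then by position in Y.
        int[][] p = pairs.toArray(new int[0][]);
        Arrays.sort(p, (a, b) -> a[0] != b[0] ? Integer.compare(a[0], b[0])
                                              : Integer.compare(a[1], b[1]));
        int k = p.length;
        int[] best = new int[k]; // best[t] = length of the longest chain ending at p[t]
        int[] prev = new int[k]; // predecessor of p[t] in that chain, or -1
        int end = -1;            // index where the overall longest chain ends
        for (int t = 0; t < k; t++) {
            best[t] = 1;
            prev[t] = -1;
            for (int s = 0; s < t; s++) {
                // p[s] may precede p[t] only if both coordinates strictly increase.
                if (p[s][0] < p[t][0] && p[s][1] < p[t][1] && best[s] + 1 > best[t]) {
                    best[t] = best[s] + 1;
                    prev[t] = s;
                }
            }
            if (end < 0 || best[t] > best[end]) {
                end = t;
            }
        }
        // Follow predecessor links back from the best end point, then reverse.
        List<int[]> chain = new ArrayList<>();
        for (int t = end; t >= 0; t = prev[t]) {
            chain.add(p[t]);
        }
        Collections.reverse(chain);
        return chain;
    }
}
```

For the pairs (0,0), (1,1), (2,5), (3,2), (4,3), (5,4), the crossing pair (2,5) is dropped and the chain (0,0), (1,1), (3,2), (4,3), (5,4) is returned. The quadratic loop is the most direct form of the recurrence; sorting by i (with ties in i ordered by descending j) and running a strict longest-increasing-subsequence search on j would reduce the cost to O(k log k) for k pairs.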

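The abstract reports specificity, sensitivity, and correlation coefficient at the DNA level. The sketch below assumes the convention common in gene-structure evaluation: each nucleotide is labeled coding or non-coding, and specificity means TP / (TP + FP), the fraction of predicted coding nucleotides that are truly coding, rather than the TN / (TN + FP) used elsewhere in statistics. Whether the thesis uses exactly these definitions is an assumption, and `NucleotideAccuracy` and its methods are hypothetical names.

```java
/**
 * Hypothetical sketch of nucleotide-level accuracy measures, assuming the
 * gene-structure convention where specificity = TP / (TP + FP).
 * Each method returns NaN when its denominator is zero.
 */
final class NucleotideAccuracy {

    private NucleotideAccuracy() {}

    /** Counts {TP, TN, FP, FN} from per-nucleotide coding labels. */
    static long[] confusion(boolean[] actual, boolean[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        long tp = 0, tn = 0, fp = 0, fn = 0;
        for (int k = 0; k < actual.length; k++) {
            if (actual[k] && predicted[k]) tp++;
            else if (!actual[k] && !predicted[k]) tn++;
            else if (predicted[k]) fp++;
            else fn++;
        }
        return new long[] {tp, tn, fp, fn};
    }

    /** Sensitivity: fraction of true coding nucleotides predicted as coding. */
    static double sensitivity(long tp, long fn) {
        return (double) tp / (tp + fn);
    }

    /** Specificity (gene-prediction sense): fraction of coding calls that are correct. */
    static double specificity(long tp, long fp) {
        return (double) tp / (tp + fp);
    }

    /** Correlation coefficient between predicted and actual coding labels. */
    static double correlation(long tp, long tn, long fp, long fn) {
        double num = (double) tp * tn - (double) fn * fp;
        double den = Math.sqrt((double) (tp + fn) * (tn + fp) * (tp + fp) * (tn + fn));
        return num / den;
    }
}
```

Under this definition, specificity rises as the predictor avoids false-positive coding calls, regardless of how many true coding nucleotides it misses; sensitivity is the measure that captures those misses.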