dc.contributor.advisor | Yoon, Byung-Jun
dc.contributor.advisor | Qian, Xiaoning
dc.creator | Chen, Chun-Chi
dc.date.accessioned | 2019-01-16T17:27:04Z
dc.date.available | 2019-12-01T06:32:07Z
dc.date.created | 2017-12
dc.date.issued | 2017-10-26
dc.date.submitted | December 2017
dc.identifier.uri | https://hdl.handle.net/1969.1/173085
dc.description.abstract | This dissertation studies emerging topics in genome sequencing and in the analysis of DNA and RNA, focusing on optimal hybrid sequencing and assembly for accurate genome reconstruction and on efficient approaches for detecting novel noncoding RNAs (ncRNAs) in genomes. Next-generation sequencing is a significant topic because it provides whole-genome information for further biological research. Recent advances in high-throughput sequencing technologies have enabled the systematic study of many genomes by making whole-genome sequencing affordable. To date, many hybrid genome assembly algorithms have been developed that take reads from multiple read sources to reconstruct the original genome. An important aspect of hybrid sequencing and assembly is that the feasibility conditions for genome reconstruction can be satisfied by different combinations of the available read sources, which opens up the possibility of combining the sources optimally to minimize the sequencing cost while still ensuring accurate genome reconstruction. In this study, we derive the conditions for whole-genome reconstruction from multiple read sources at a given confidence level and introduce the optimal strategy for combining reads from different sources to minimize the overall sequencing cost. We show that the optimal read set, which satisfies the feasibility conditions for genome reconstruction while minimizing the sequencing cost, can be effectively predicted through constrained discrete optimization. The availability of genome-wide sequences for a variety of species provides a large database for further computational RNA analysis. Recent studies have shown that ncRNAs play crucial roles in various biological processes, and some ncRNAs are related to genome stability and to a variety of inherited diseases.
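The idea of choosing the cheapest read set that still meets a reconstruction condition can be illustrated with a toy model. The sketch below is a stand-in built on stated assumptions, not the conditions derived in the thesis: it uses full coverage under a Poisson (Lander-Waterman) read model as the feasibility test, bounds the failure probability with Markov's inequality, adds a hypothetical minimum depth of reads longer than the longest repeat, and solves the resulting constrained discrete optimization by exhaustive search over integer depths. All function names, parameters, and numbers are hypothetical.

```python
import math
from itertools import product
from typing import Optional, Sequence, Tuple


def expected_gaps(depths: Sequence[int], read_lengths: Sequence[int],
                  genome_len: int) -> float:
    """Expected number of coverage gaps under a Poisson (Lander-Waterman) model.

    depths[i] is the depth (in x) bought from source i, i.e. about
    depths[i] * genome_len / read_lengths[i] reads. A gap opens right after a
    read end when no other read covers the next base, which happens with
    probability exp(-c) for total depth c, whatever the read lengths.
    """
    n_reads = sum(d * genome_len / length for d, length in zip(depths, read_lengths))
    if n_reads == 0:
        return math.inf  # no reads at all: the genome is not covered
    return n_reads * math.exp(-sum(depths))


def optimal_read_set(cost_per_x: Sequence[float], read_lengths: Sequence[int],
                     genome_len: int, confidence: float, max_depth: int = 30,
                     repeat_len: int = 0, min_long_depth: int = 0
                     ) -> Optional[Tuple[float, Tuple[int, ...]]]:
    """Cheapest integer depth per source that passes the toy feasibility test.

    Feasibility (hypothetical): (1) E[gaps] <= 1 - confidence, so by Markov's
    inequality the genome is fully covered with probability >= confidence;
    (2) sources whose reads are longer than repeat_len supply at least
    min_long_depth x in total, a crude stand-in for repeat resolution.
    """
    best = None
    for depths in product(range(max_depth + 1), repeat=len(cost_per_x)):
        long_depth = sum(d for d, length in zip(depths, read_lengths)
                         if length > repeat_len)
        if long_depth < min_long_depth:
            continue  # violates the hypothetical repeat condition
        if expected_gaps(depths, read_lengths, genome_len) > 1 - confidence:
            continue  # violates the coverage condition
        cost = sum(d * c for d, c in zip(depths, cost_per_x))
        if best is None or cost < best[0]:
            best = (cost, depths)
    return best


if __name__ == "__main__":
    # Hypothetical sources: 100 bp reads at 1 cost unit per x, 10 kb reads at 5 per x.
    print(optimal_read_set([1.0, 5.0], [100, 10_000], 1_000_000, confidence=0.99))
    # (17.0, (17, 0))
    print(optimal_read_set([1.0, 5.0], [100, 10_000], 1_000_000, confidence=0.99,
                           repeat_len=5_000, min_long_depth=1))
    # (21.0, (16, 1))
```

Exhaustive search is only practical for a few sources and small depth ranges; it is used here because its correctness is easy to check. In the toy run, coverage alone favors the cheap short reads, and it is the added repeat condition that makes a mix of both sources the cheapest feasible set.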
The discovery of novel ncRNAs is therefore an important topic, and there is a pressing need for accurate computational approaches that can efficiently detect novel ncRNAs in genomes. One important issue is RNA structural alignment for comparative genome analysis, since RNA secondary structures are better conserved than RNA sequences. Simultaneous alignment and folding algorithms aim to align RNAs accurately by predicting the consensus structure and the alignment at the same time, but the optimal dynamic programming algorithm for simultaneous alignment and folding has extremely high computational complexity. In this work, we propose TOPAS, a new method for RNA structural alignment that efficiently aligns RNAs through topological networks. Although many ncRNAs have a well-conserved secondary structure that provides useful clues for computational prediction, ncRNA prediction remains challenging, because it has been shown that a structure-based approach alone may not be sufficient for detecting ncRNAs in a single sequence. We therefore first develop a new approach that uses an n-gram model to classify sequences and to extract effective features that capture sequence homology. Based on this approach, we propose piRNAdetect, a method for reliable computational prediction of piRNAs (Piwi-interacting RNAs) in genome sequences. The n-gram model can enhance the detection of ncRNAs that have sparse folding structures with many unpaired bases. By combining the n-gram model with the generalized ensemble defect, which assesses structural conservation and conformation to the consensus structure, we further propose RNAdetect, a novel computational method for accurate detection of ncRNAs through comparative genome analysis.
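The two kinds of signal named above, n-gram sequence statistics and an ensemble-defect structure score, can be sketched as inputs to a classifier. This is a minimal illustration under assumptions, not piRNAdetect or RNAdetect: `ngram_features` computes normalized n-gram (k-mer) frequencies, and `ensemble_defect` computes the standard ensemble defect (the expected number of nucleotides whose pairing state disagrees with a target structure), not the generalized ensemble defect the abstract refers to. The base-pair probability matrix is assumed to come from a partition-function folding tool, and `feature_vector` and all other names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import List, Sequence


def ngram_features(seq: str, n: int = 3, alphabet: str = "ACGU") -> List[float]:
    """Normalized counts of every length-n word over `alphabet`, in lexicographic order."""
    seq = seq.upper().replace("T", "U")  # map DNA input onto the default RNA alphabet
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    words = ["".join(w) for w in product(alphabet, repeat=n)]
    raw = [counts[w] for w in words]  # words containing other symbols (e.g. N) are ignored
    total = sum(raw) or 1  # avoid division by zero for very short sequences
    return [c / total for c in raw]


def pair_table(dot_bracket: str) -> List[int]:
    """partner[i] is the index paired with base i, or -1 if i is unpaired."""
    partner = [-1] * len(dot_bracket)
    stack: List[int] = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unmatched '('")
    return partner


def ensemble_defect(pair_prob: Sequence[Sequence[float]], target: str) -> float:
    """Expected number of bases whose pairing state differs from `target`.

    pair_prob[i][j] (i != j) is the equilibrium probability that bases i and j
    pair; the probability that base i is unpaired is 1 - sum_j pair_prob[i][j].
    """
    partner = pair_table(target)
    n = len(partner)
    correct = 0.0
    for i in range(n):
        if partner[i] >= 0:
            correct += pair_prob[i][partner[i]]  # prob. of the target pair
        else:
            correct += 1.0 - sum(pair_prob[i][j] for j in range(n) if j != i)
    return n - correct


def feature_vector(seq: str, pair_prob: Sequence[Sequence[float]],
                   structure: str, n: int = 3) -> List[float]:
    """Hypothetical combined features: n-gram profile plus length-normalized defect."""
    return ngram_features(seq, n) + [ensemble_defect(pair_prob, structure) / len(structure)]
```

A vector like this could be passed to any standard classifier, such as the support vector machine listed among the record's subjects; the classifier itself is deliberately left out of the sketch.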
Extensive performance evaluation on the Rfam database and on bacterial genomes demonstrates that our approaches can accurately and reliably detect novel ncRNAs, outperforming current state-of-the-art methods. | en
dc.format.mimetype | application/pdf
dc.language.iso | en
dc.subject | DNA assembly | en
dc.subject | whole genome reconstruction | en
dc.subject | RNA structural alignment | en
dc.subject | topological network | en
dc.subject | piRNA prediction | en
dc.subject | n-gram model | en
dc.subject | support vector machine | en
dc.subject | ncRNA prediction | en
dc.subject | comparative genome analysis | en
dc.subject | generalized ensemble defect | en
dc.title | Emerging Topics in Genome Sequencing and Analysis | en
dc.type | Thesis | en
thesis.degree.department | Electrical and Computer Engineering | en
thesis.degree.discipline | Electrical Engineering | en
thesis.degree.grantor | Texas A & M University | en
thesis.degree.name | Doctor of Philosophy | en
thesis.degree.level | Doctoral | en
dc.contributor.committeeMember | Dougherty, Edward
dc.contributor.committeeMember | Kumar, P. R.
dc.contributor.committeeMember | Shim, Won-Bo
dc.type.material | text | en
dc.date.updated | 2019-01-16T17:27:04Z
local.embargo.terms | 2019-12-01
local.etdauthor.orcid | 0000-0002-4545-6760