dc.contributor.advisor: Serpedin, Erchin
dc.contributor.advisor: Nounou, Hazem
dc.creator: Wajid, Bilal
dc.date.accessioned: 2015-10-29T19:40:53Z
dc.date.available: 2017-08-01T05:37:30Z
dc.date.created: 2015-08
dc.date.issued: 2015-06-25
dc.date.submitted: August 2015
dc.identifier.uri: https://hdl.handle.net/1969.1/155464
dc.description.abstract: Bioinformatics skills required for genome sequencing often represent a significant hurdle for researchers working in computational biology. This dissertation highlights the significance of genome assembly as a research area, stresses the need for assemblies to remain accurate, details the characteristics of the raw data, examines some key metrics, highlights selected tools and outlines the complete pipeline for next-generation sequencing. A major effort is currently under way to assemble the genomes of all living organisms. Given the importance of comparative genome assembly, this dissertation explores the principle of Minimum Description Length (MDL) and two of its variants, Two-Part MDL and Sophisticated MDL, for identifying the optimal reference sequence for genome assembly. A Modular Approach to Reference Assisted Genome Assembly Pipeline, referred to as MARAGAP, is then developed. MARAGAP uses MDL to determine the optimal reference sequence for the assembly, and that sequence serves as a template to infer inversions, insertions, deletions and Single Nucleotide Polymorphisms (SNPs) in the target genome. MARAGAP uses an algorithmic approach to detect and correct inversions and deletions, a De-Bruijn graph based approach to infer insertions, an affine-match affine-gap local alignment tool to estimate the locations of insertions and a Bayesian estimation framework, called BECA, to detect SNPs. BECA capitalizes on the 'alignment-layout-consensus' paradigm and Quality (Q-) values to detect and correct SNPs by evaluating a number of probabilistic measures. BECA, however, conducts the entire process only once. Its framework is therefore extended with Gibbs Sampling so that BECA can be iterated: after each assembly the reference sequence is updated and the probabilistic score for each base call is renewed. The revised reference sequence and probabilities are then used to identify the alignments and the consensus sequence, yielding an algorithm referred to as Gibbs-BECA. Gibbs-BECA improves performance further, both by rectifying more SNPs and by remaining robust even in the presence of a poor reference sequence.
Lastly, another major effort in this dissertation was the development of two cohesive software platforms, Baari and Genobuntu, that combine many different genome assembly pipelines in two distinct environments. Both support pre-assembly tools, genome assemblers and post-assembly tools. They also provide a library of tools developed by the authors for Next Generation Sequencing (NGS) data, together with commonly used biological software. Baari and Genobuntu are free and easily distributable, and they facilitate building laboratories and software workstations for personal use as well as for a college or university laboratory. Baari is a customized Ubuntu OS packed with these tools, whereas Genobuntu is a software package containing the same tools for users who already have Ubuntu OS installed on their systems. [en]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Genome assembly [en]
dc.subject: Minimum Description Length [en]
dc.subject: Reference assisted assembly [en]
dc.subject: De-Bruijn Graph [en]
dc.subject: Bayesian Statistics [en]
dc.subject: Comparative assembly [en]
dc.subject: Gibbs Sampling [en]
dc.subject: Ubuntu [en]
dc.subject: Linux [en]
dc.title: Information Theory, Graph Theory and Bayesian Statistics based improved and robust methods in Genome Assembly [en]
dc.type: Thesis [en]
thesis.degree.department: Electrical and Computer Engineering [en]
thesis.degree.discipline: Electrical Engineering [en]
thesis.degree.grantor: Texas A & M University [en]
thesis.degree.name: Doctor of Philosophy [en]
thesis.degree.level: Doctoral [en]
dc.contributor.committeeMember: Nounou, Mohamed
dc.contributor.committeeMember: Karsilayan, Aydin
dc.contributor.committeeMember: Yoon, Byung Jun
dc.type.material: text [en]
dc.date.updated: 2015-10-29T19:40:53Z
local.embargo.terms: 2017-08-01
local.etdauthor.orcid: 0000-0002-1822-2387
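
Illustrative note on the abstract: the record describes BECA only at a high level, so the sketch below is not BECA itself. It is a minimal Python example, under stated assumptions, of how Quality (Q-) values can feed a Bayesian base call at one aligned position. The only fixed fact used is the standard Phred relation P(error) = 10^(-Q/10). Everything else is an assumption made for illustration: the function names (phred_to_error, base_posteriors, call_base) are hypothetical, reads are treated as independent, a miscalled base is assumed equally likely to be any of the other three bases, and the prior weight of 0.97 on the reference base is an arbitrary choice.

```python
BASES = "ACGT"


def phred_to_error(q):
    """Standard Phred relation: P(base call is wrong) = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def base_posteriors(pileup, prior=None):
    """Posterior probability of each candidate true base at one position.

    pileup: list of (called_base, q_value) pairs from reads covering the position.
    prior:  dict mapping base -> prior probability (uniform if omitted).
    Assumption: reads are independent and an error is equally likely to
    produce any of the three other bases.
    """
    prior = prior or {b: 0.25 for b in BASES}
    unnormalized = {}
    for true_base in BASES:
        likelihood = 1.0
        for called, q in pileup:
            e = phred_to_error(q)
            likelihood *= (1 - e) if called == true_base else e / 3
        unnormalized[true_base] = prior[true_base] * likelihood
    total = sum(unnormalized.values())
    return {b: p / total for b, p in unnormalized.items()}


def call_base(pileup, reference_base, prior_ref=0.97):
    """Return (most probable base, its posterior), with a prior favoring the reference.

    prior_ref is a hypothetical tuning value, not taken from the dissertation.
    """
    prior = {
        b: prior_ref if b == reference_base else (1 - prior_ref) / 3
        for b in BASES
    }
    posteriors = base_posteriors(pileup, prior)
    best = max(posteriors, key=posteriors.get)
    return best, posteriors[best]


if __name__ == "__main__":
    # Three confident reads say G, one low-quality read agrees with the reference A.
    reads = [("G", 30), ("G", 30), ("G", 20), ("A", 10)]
    print(call_base(reads, reference_base="A"))
```

In this usage example the high-quality G calls outweigh both the prior on the reference base and the single low-quality A call, so the position is reported as a candidate SNP. An iterative scheme in the spirit of the Gibbs-BECA description would feed updated reference bases and probabilities back in as the prior for the next pass; that loop is not shown here.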

