Signal Processing and Machine Learning Techniques for Analyzing Metagenomic Data

Alshawaqfeh, Mustafa Kamal Mustafa

dc.contributor.advisor	Serpedin, Erchin
dc.contributor.advisor	Qaraqe, Khalid
dc.creator	Alshawaqfeh, Mustafa Kamal Mustafa
dc.date.accessioned	2017-08-21T14:38:31Z
dc.date.available	2019-05-01T06:08:42Z
dc.date.created	2017-05
dc.date.issued	2017-04-21
dc.date.submitted	May 2017
dc.identifier.uri	https://hdl.handle.net/1969.1/161461
dc.description.abstract	Recent advances in high-throughput sequencing technologies have opened a new era of genomics studies, called metagenomics. Metagenomics has rapidly established itself as the standard approach for characterizing the compositional and functional capacity of microbial communities by directly studying the genetic content recovered from environmental samples, without prior culturing. Although these advances enable researchers to sequence bacterial populations at a reasonable cost, analyzing the resulting massive metagenomic datasets presents significant challenges. This dissertation presents novel computational tools, based on signal processing and machine learning theory, to enable the investigation of biological systems. Two important research problems are addressed. The first concerns the identification of potential metagenomic biomarkers, which play a critical role in understanding the biological process under study and in developing possible therapies. Because the true biomarkers are unknown and no standard assessment methodology exists, evaluating the quality of the detected markers is challenging. Therefore, we begin by developing an evaluation protocol that mimics knowledge of the true markers, providing a common ground for comparing competing algorithms. Next, a new framework for the biomarker discovery problem based on a low rank-sparse (LRS) decomposition is proposed. The instability of a biomarker detection algorithm renders the identified markers questionable and hinders the translation of these findings into clinical applications. To mitigate this problem, we propose the Regularized Low Rank-Sparse Decomposition (RegLRSD) algorithm. RegLRSD adapts the LRS model to incorporate the fact that irrelevant features are expected to present abundance profiles that do not vary significantly between samples belonging to different phenotypes. Integrating this prior knowledge guides the recovery process toward more accurate and consistent biological results. The second research problem addressed in this dissertation concerns the development of a computational framework that enables the translation of the identified markers into clinical applications. Identifying potential biomarkers is the foremost step toward understanding the relation between a given disease and the shift in microbial composition associated with it. However, from a practical perspective, the microbial alteration needs to be quantified as a single numerical value that helps clinicians measure disease activity and its response to therapy.	en
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	Metagenomic	en
dc.title	Signal Processing and Machine Learning Techniques for Analyzing Metagenomic Data	en
dc.type	Thesis	en
thesis.degree.department	Electrical and Computer Engineering	en
thesis.degree.discipline	Electrical Engineering	en
thesis.degree.grantor	Texas A & M University	en
thesis.degree.name	Doctor of Philosophy	en
thesis.degree.level	Doctoral	en
dc.contributor.committeeMember	Braga-Neto, Ulisses
dc.contributor.committeeMember	Suchodolski, Jan
dc.type.material	text	en
dc.date.updated	2017-08-21T14:38:31Z
local.embargo.terms	2019-05-01
local.etdauthor.orcid	0000-0003-2170-6830

Files in this item

Name: ALSHAWAQFEH-DISSERTATION-2017.pdf
Size: 1.516 MB
Format: PDF

This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )
