dc.creator | Abbasi, Isha Shakeel A M
dc.creator | Abdulgadir, Rawan
dc.creator | Mazen, Weam
dc.creator | Mohamed, Nadin
dc.date.accessioned | 2021-07-24T00:26:39Z
dc.date.available | 2021-07-24T00:26:39Z
dc.date.created | 2021-05
dc.date.submitted | May 2021
dc.identifier.uri | https://hdl.handle.net/1969.1/194349
dc.description.abstract | Genomic copy number data are a rich source of information about the biological systems from which they are collected. They can be used to diagnose various diseases by identifying the locations and extent of aberrations in DNA sequences. However, copy number data are often contaminated with measurement noise, which drastically reduces the quality and usefulness of the data. The objective of this project is to apply statistical filtering and fault detection techniques to improve the accuracy of disease diagnosis by locating such aberrations more accurately. These techniques include multiscale wavelet-based filtering and hypothesis-testing-based fault detection. The filtering techniques include Mean Filtering (MF), Exponentially Weighted Moving Average (EWMA), Standard Multiscale Filtering (SMF), and Boundary Corrected Translation Invariant (BCTI) filtering. The fault detection techniques include the Shewhart chart, EWMA, and the Generalized Likelihood Ratio (GLR). The performance of these techniques is illustrated using Monte Carlo simulations and through their application to real copy number data. In the Monte Carlo simulations, the non-linear filtering techniques performed better than the linear techniques, with BCTI giving the lowest error. At an SNR of 1, BCTI had an average mean squared error of 2.34%, whereas mean filtering had the highest error, 5.24%. Among the fault detection techniques, GLR had the lowest missed detection rate, 1.88%, at a fixed false alarm rate of around 4%. At around the same false alarm rate, the Shewhart chart had the highest missed detection rate, 67.4%. Furthermore, these techniques were applied to real genomic copy number data sets, including data from breast cancer cell lines (MPE600) and colorectal cancer cell lines (SW837). | en
dc.format.mimetype | application/pdf
dc.subject | Genomic copy number data | en
dc.subject | data filtering | en
dc.subject | multiscale wavelet-based filtering | en
dc.subject | fault detection | en
dc.subject | Control charts | en
dc.subject | Generalized likelihood ratio test | en
dc.title | Towards Enhanced Diagnosis of Diseases using Statistical Analysis of Genomic Copy Number Data | en
dc.type | Thesis | en
thesis.degree.department | Chemical Engineering | en
thesis.degree.discipline | Chemical Engineering | en
thesis.degree.grantor | Undergraduate Research Scholars Program | en
thesis.degree.name | B.S. | en
thesis.degree.level | Undergraduate | en
dc.contributor.committeeMember | Nounou, Mohamed N.
dc.type.material | text | en
dc.date.updated | 2021-07-24T00:26:39Z
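
The abstract names Mean Filtering, EWMA, and the Shewhart chart without defining them. As an illustration only, the following is a minimal Python sketch of the common textbook forms of these three techniques, applied to a synthetic step signal. The window length, the smoothing weight `lam`, the 3-sigma limit, the noise level, and the signal itself are all assumptions chosen for the example, not values or data from the thesis. The multiscale filters (SMF, BCTI) and the GLR test are not shown.

```python
import random


def mean_filter(x, window=5):
    """Causal moving average over the last `window` samples (assumed form)."""
    out = []
    for i in range(len(x)):
        start = max(0, i - window + 1)
        out.append(sum(x[start:i + 1]) / (i + 1 - start))
    return out


def ewma_filter(x, lam=0.3):
    """EWMA: z[i] = lam * x[i] + (1 - lam) * z[i-1], initialised with z[0] = x[0]."""
    out = [x[0]]
    for v in x[1:]:
        out.append(lam * v + (1 - lam) * out[-1])
    return out


def shewhart_alarms(x, mean, sigma, k=3.0):
    """Flag each sample that lies more than k * sigma from the in-control mean."""
    return [abs(v - mean) > k * sigma for v in x]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Hypothetical stand-in for a copy number profile (not thesis data):
# a zero baseline with one shifted segment, plus Gaussian noise.
random.seed(0)
truth = [0.0] * 100 + [1.0] * 50 + [0.0] * 100
noisy = [t + random.gauss(0.0, 0.5) for t in truth]  # noise std is arbitrary

print("MSE raw :", mse(noisy, truth))
print("MSE MF  :", mse(mean_filter(noisy), truth))
print("MSE EWMA:", mse(ewma_filter(noisy), truth))

# Shewhart chart on the raw samples, assuming the in-control mean and sigma are known.
alarms = shewhart_alarms(noisy, mean=0.0, sigma=0.5)
print("Samples flagged:", sum(alarms))
```

Comparing each filtered signal with `truth` mirrors the mean squared error comparison described in the abstract, although the numbers printed here depend entirely on the assumed signal and parameters.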

