A Novel Bayesian Rank-Based Framework for the Classification of High-Dimensional Biological Data
Statistical analysis of high-dimensional biological data is the central component of “personalized medicine” and “translational bioinformatics.” Two major barriers limit the application of the extracted information in clinical studies: small sample sizes, and a lack of biological interpretability caused by the complex classification boundaries of current algorithms. To remove these barriers, this dissertation introduces novel algorithms for the statistical analysis of high-dimensional biological data. We first introduce a novel predictive model that extends the top-scoring pair algorithm to a Bayesian setting. We test its performance on several real data sets and various simulated data scenarios and show that the proposed method has the best overall performance. Besides achieving high accuracy on real and simulated data sets, the proposed algorithm has the potential to discover gene markers that other algorithms may miss. We also propose the Bayesian Top-Scoring Pair (BTSP) algorithm as a feature selection method. We compare it with many well-known feature selection methods by combining each feature selection method with different well-known classifiers, and we evaluate all of them on different data sets and for different numbers of selected genes. The proposed BTSP algorithm has the best overall accuracy. Finally, we introduce a novel biological pathway data-based algorithm (BTSPP), which uses all pairwise interactions at the gene level and the pathway level. We apply the proposed method and well-known pathway data-based algorithms to different real data sets, compare how accurately they classify independent test sets, and show that the proposed algorithm is superior.
We also assess how well these pathway data-based algorithms, together with over-representation analysis (ORA) and gene set enrichment analysis (GSEA), find biologically validated pathways related to diseases. The proposed pathway analysis method has the potential to find the biologically validated pathways, whereas the other methods cannot detect them.
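For readers unfamiliar with the top-scoring pair idea that the abstract builds on, the following is a minimal sketch of the classical (non-Bayesian) rank-based rule. It is not the dissertation's BTSP or BTSPP method. The function names `tsp_fit` and `tsp_predict`, the 0/1 label encoding, and the toy data are assumptions made for illustration only. For every gene pair (i, j), the sketch estimates how often gene i is expressed below gene j in each class and keeps the pair whose ordering frequencies differ most between the two classes.

```python
from itertools import combinations


def tsp_fit(X, y):
    """Pick the gene pair whose relative ordering best separates two classes.

    X is a list of samples (each a list of expression values) and y holds
    binary labels 0/1. Returns (i, j, class_if_less, score). All names here
    are hypothetical; this is the classical rule, not the Bayesian variant.
    """
    class0 = [row for row, label in zip(X, y) if label == 0]
    class1 = [row for row, label in zip(X, y) if label == 1]
    best = None
    for i, j in combinations(range(len(X[0])), 2):
        # Fraction of samples in each class where gene i < gene j
        p0 = sum(row[i] < row[j] for row in class0) / len(class0)
        p1 = sum(row[i] < row[j] for row in class1) / len(class1)
        score = abs(p0 - p1)
        if best is None or score > best[3]:
            # class_if_less: the class in which "gene i < gene j" is more common
            best = (i, j, 0 if p0 > p1 else 1, score)
    return best


def tsp_predict(X, i, j, class_if_less):
    """Classify each sample by comparing only the two selected genes."""
    return [class_if_less if row[i] < row[j] else 1 - class_if_less for row in X]


# Toy data (made up): gene 0 is below gene 1 in class 0 and above it in class 1.
X_toy = [[1, 5, 3], [2, 6, 1], [0, 4, 9], [7, 2, 3], [8, 1, 0], [9, 3, 5]]
y_toy = [0, 0, 0, 1, 1, 1]
i, j, class_if_less, score = tsp_fit(X_toy, y_toy)
predictions = tsp_predict(X_toy, i, j, class_if_less)
```

Because the decision depends only on the within-sample ordering of two genes, the resulting rule is easy to interpret and is unaffected by monotone rescaling of a sample's expression values, which is one reason rank-based rules suit small-sample, high-dimensional settings.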
Arslan, Emre (2018). A Novel Bayesian Rank-Based Framework for the Classification of High-Dimensional Biological Data. Doctoral dissertation, Texas A & M University. Available electronically from