dc.description.abstract | Small samples are commonplace in genomic/proteomic classification, the result
being inadequate classifier design and poor error estimation. A promising approach
to alleviate the problem is the use of prior knowledge. At the same time, it is
known that a large amount of information is encoded in biological
signaling pathways. This dissertation is concerned with the problem of classifier
design by utilizing both the available prior knowledge and training data. Specifically,
this dissertation utilizes the concrete notion of regularization in signal processing
and statistics to combine prior knowledge with different data-based or data-ignorant
criteria.
In the first part, we address optimal discrete classification where prior knowledge
is restricted to an uncertainty class of feature distributions absent a prior distribution
on the uncertainty class, a problem that arises directly for biological classification using
pathway information: labeling future observations obtained in the steady state by
utilizing both the available prior knowledge and the training data. An optimization-based
paradigm for utilizing prior knowledge is proposed to design better performing
classifiers when sample sizes are limited. We derive approximate expressions for the
first and second moments of the true error rate of the proposed classifier under the
assumption of two widely used models for the uncertainty classes: ε-contamination
and p-point classes. We examine the proposed paradigm on networks containing
NF-κB pathways, where it shows significant improvement compared to data-driven
methods.
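For reference, the ε-contamination uncertainty class mentioned above has the standard robust-statistics form below, written here with generic symbols (F₀ the nominal feature distribution, G an arbitrary distribution, ε the contamination level); the dissertation's exact notation may differ:

```latex
% epsilon-contamination class around a nominal distribution F_0:
% every member is a mixture of F_0 with an arbitrary contaminant G
\mathcal{F}_{\varepsilon}
  \;=\;
  \bigl\{\, F : F = (1-\varepsilon)\,F_0 + \varepsilon\,G,\;\; G \in \mathcal{P} \,\bigr\},
  \qquad 0 \le \varepsilon < 1 .
```

In the robust-statistics literature, p-point classes are instead typically defined by constraining the probabilities that F assigns to the cells of a fixed finite partition of the sample space.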
In the second part of this dissertation, we focus on Bayesian classification. Although
the problem of designing the optimal Bayesian classifier , assuming some known prior distributions, has been fully addressed, a critical issue still remains: how to incorporate biological knowledge into the prior distribution. For genomic/proteomic, the most common kind of knowledge is in the form of signaling pathways. Thus, it behooves us to nd methods of transforming pathway knowledge into knowledge of the feature-label distribution governing the classi cation problem. In order to incorporate the available prior knowledge, the interactions in the pathways are first quantifi ed from a Bayesian perspective. Then, we address the problem of prior probability construction by proposing a series of optimization paradigms that utilize the incomplete prior information contained in pathways (both topological and regulatory). The optimization paradigms are derived for both Gaussian case with Normal-inverse-Wishart prior and discrete classi cation with Dirichlet prior.
Simulation results, using both synthetic and real pathways, show that the proposed
paradigms yield improved classifiers that outperform traditional classifiers
which use only training data. | en |