OUTLIER DETECTION BY MODEL COMPLEXITY A NEW DEEP LEARNING METHOD

Bang, Sung Je

dc.contributor.advisor	Walker, Duncan M
dc.creator	Bang, Sung Je
dc.date.accessioned	2021-01-29T15:01:29Z
dc.date.available	2021-01-29T15:01:29Z
dc.date.created	2020-08
dc.date.issued	2020-07-30
dc.date.submitted	August 2020
dc.identifier.uri	https://hdl.handle.net/1969.1/192181
dc.description.abstract	In this research, we developed a new deep learning method for detecting and removing outliers from point-based data sets. Our goal was an outlier detection method that ties the detection procedure to the model-building process. Exploiting the different behaviors of outliers and inliers, we use model complexity as an indicator of outliers in a data set. In this context, the “complexity” of a model means the weight of non-zero edges in the model. This includes features of a model such as the number of layers and the number of nodes per layer. The proposed method consists of three steps. First, a model of low complexity (few layers or few nodes per layer) is built and trained on a data set, and its predicted value for each instance of the data set is recorded. Second, multiple neural network models with differing numbers of layers or nodes per layer are built, and the group of models sharing a specific number of layers that has the best average performance on the data set is identified. Performance in this context includes the general classification accuracy or mean squared error of the models. Third, within that group, we pick the model with the highest number of nodes per layer and compare its prediction for each instance of the data set with the prediction of the low-complexity model from the first step. Instances on which the two models give different predictions are labeled as outliers and removed. Two factors must be noted when using this method. First, the lower the correlation between the attributes and the output values in a data set, the fewer outliers the method will detect. Second, the larger and more complex a data set becomes (for example, by having many attributes), the fewer outliers the method will find.	en
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	outlier	en
dc.subject	anomaly	en
dc.subject	detection	en
dc.subject	deep	en
dc.subject	learning	en
dc.subject	neural	en
dc.subject	network	en
dc.subject	model	en
dc.subject	models	en
dc.subject	complexity	en
dc.subject	layers	en
dc.subject	nodes	en
dc.title	OUTLIER DETECTION BY MODEL COMPLEXITY A NEW DEEP LEARNING METHOD	en
dc.type	Thesis	en
thesis.degree.department	Computer Science and Engineering	en
thesis.degree.discipline	Computer Engineering	en
thesis.degree.grantor	Texas A&M University	en
thesis.degree.name	Master of Science	en
thesis.degree.level	Masters	en
dc.contributor.committeeMember	Chaspari, Theodora
dc.contributor.committeeMember	Kameoka, Jun
dc.type.material	text	en
dc.date.updated	2021-01-29T15:01:30Z
local.etdauthor.orcid	0000-0003-4252-0381
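
The selection and comparison steps described in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the function names, the dictionary layout of the scores, and the assumption that performance is a classification accuracy (higher is better) are all hypothetical, and training the networks themselves is left out.

```python
from collections import defaultdict
from statistics import mean


def pick_high_complexity_config(scores):
    """Choose the (n_layers, n_nodes) setting for the high-complexity model.

    `scores` is a hypothetical mapping {(n_layers, n_nodes): accuracy},
    one entry per trained network, where higher accuracy is better.
    """
    by_depth = defaultdict(list)
    for (n_layers, n_nodes), score in scores.items():
        by_depth[n_layers].append((n_nodes, score))
    # Step 2: find the layer count whose models score best on average.
    best_depth = max(by_depth, key=lambda d: mean(s for _, s in by_depth[d]))
    # Step 3 (first half): within that group, take the most nodes per layer.
    widest = max(n_nodes for n_nodes, _ in by_depth[best_depth])
    return best_depth, widest


def flag_outliers(low_preds, high_preds):
    """Return indices of instances on which the two models disagree.

    Assumes discrete class labels; for regression outputs a tolerance
    would be needed instead of exact inequality.
    """
    if len(low_preds) != len(high_preds):
        raise ValueError("prediction lists must have the same length")
    return [i for i, (lo, hi) in enumerate(zip(low_preds, high_preds)) if lo != hi]


def remove_outliers(rows, outlier_indices):
    """Drop the flagged instances from the data set."""
    drop = set(outlier_indices)
    return [row for i, row in enumerate(rows) if i not in drop]
```

For example, given the made-up scores {(1, 4): 0.80, (1, 16): 0.84, (2, 4): 0.90, (2, 16): 0.88}, the two-layer group has the better average, so pick_high_complexity_config returns (2, 16). The predictions of that model would then be passed to flag_outliers together with those of the low-complexity model from the first step.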

Files in this item

Name: BANG-THESIS-2020.pdf
Size: 735.6Kb
Format: PDF

This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )
