
dc.contributor.advisor: Shell, Dylan A
dc.creator: Janpuangtong, Sasin
dc.date.accessioned: 2020-08-26T18:28:39Z
dc.date.available: 2020-08-26T18:28:39Z
dc.date.created: 2019-12
dc.date.issued: 2019-11-25
dc.date.submitted: December 2019
dc.identifier.uri: https://hdl.handle.net/1969.1/188752
dc.description.abstract: This dissertation attempts to address the changing needs of data science and analytics: making it easier to produce accurate models, and opening up opportunities and perspectives for novices to make sense of existing data. This work incorporates the semantics of data into classical machine learning problems, which is one way to tame the deluge of data. The increased availability of data and the existence of easy-to-use procedures for regression and classification in commodity software allow anyone to search for correlations among a large set of variables with scant regard for their meaning. Consequently, people tend to use data indiscriminately, leading to the practice of data dredging. It is easy to use sophisticated tools to produce specious models that generalize poorly and may lead to wrong conclusions. Despite much effort having been devoted to advancing learning algorithms, current tools do little to shield people from using data in a semantically lax fashion. By examining the entire model building process and supplying semantic information derived from high-level knowledge in the form of an ontology, the machine can assist in exercising discretion to help the model builder avoid the pitfalls of data dredging. This work introduces a metric, called conceptual distance, to incorporate semantic information into the model building process. The conceptual distance is shown to be practically computable from large-scale existing ontologies. This metric is exploited in feature selection to enable a machine to take the semantics of features into consideration when choosing them to build a model. Experiments with ontologies and real-world datasets show that this metric performs comparably to traditional data-driven measures in selecting a feature subset, despite using only the labels of features, not the associated measurements. Further, a new end-to-end model building process is developed that uses the conceptual distance as a guide to explore an ontological structure and retrieve relevant features automatically, making it convenient for a novice to build a semantically pertinent model. Experiments show that the proposed model building process can help a user produce a model with performance comparable to one built by a domain expert. This work offers a tool to help non-experts battle the hazard of data dredging that comes from the indiscriminate use of data. The tool results in models that generalize better and are easier to interpret, leading to better decisions and conclusions.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Ontologies
dc.subject: Supervised Learning
dc.subject: Model Building Process
dc.subject: Feature Selection
dc.subject: Novices
dc.subject: Semantic Web
dc.subject: Feature Recommendation
dc.subject: Background Knowledge and Semantic Information
dc.title: Exploiting Semantics from Widely Available Ontologies to Aid the Model Building Process
dc.type: Thesis
thesis.degree.department: Computer Science and Engineering
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Texas A&M University
thesis.degree.name: Doctor of Philosophy
thesis.degree.level: Doctoral
dc.contributor.committeeMember: Caverlee, James
dc.contributor.committeeMember: Fuhrmann, Matthew
dc.contributor.committeeMember: Furuta, Richard
dc.type.material: text
dc.date.updated: 2020-08-26T18:28:39Z
local.etdauthor.orcid: 0000-0002-3356-4799
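
As a rough illustration of the conceptual-distance idea described in the abstract above, a distance between two concepts can be approximated by the shortest-path length between their nodes in an ontology graph, and candidate features can then be ranked by their distance to the target concept. The Python sketch below is only a hypothetical illustration under that assumption: the toy ontology, the node names, and the conceptual_distance and rank_features_by_concept helpers are invented for this example and are not the dissertation's actual definition or implementation.

    # Hypothetical sketch: ontology-based feature ranking by conceptual distance.
    # The ontology, concept names, and helpers are illustrative only.
    import networkx as nx

    # Toy ontology: nodes are concepts, edges are is-a / related-to links.
    ontology = nx.Graph()
    ontology.add_edges_from([
        ("Disease", "Diabetes"),
        ("Disease", "Symptom"),
        ("Diabetes", "BloodGlucose"),
        ("Diabetes", "BodyMassIndex"),
        ("Symptom", "Fatigue"),
        ("Person", "Disease"),
        ("Person", "PhysicalTrait"),
        ("PhysicalTrait", "ShoeSize"),
    ])

    def conceptual_distance(graph, concept_a, concept_b):
        """Approximate conceptual distance as shortest-path length;
        unreachable concepts are treated as infinitely far apart."""
        try:
            return nx.shortest_path_length(graph, concept_a, concept_b)
        except nx.NetworkXNoPath:
            return float("inf")

    def rank_features_by_concept(graph, target_concept, candidate_features):
        """Order candidate feature labels by their distance to the target
        concept, so semantically closer features are preferred."""
        return sorted(candidate_features,
                      key=lambda f: conceptual_distance(graph, target_concept, f))

    # Features closest to "Diabetes" come first; "ShoeSize" ranks last.
    print(rank_features_by_concept(
        ontology, "Diabetes",
        ["BloodGlucose", "Fatigue", "ShoeSize", "BodyMassIndex"]))

Note that this ranking uses only the labels of features and the ontology structure, not the measured values of the features themselves, which mirrors the label-only setting described in the abstract.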

