
dc.contributor.advisor: Shell, Dylan A
dc.creator: Janpuangtong, Sasin
dc.date.accessioned: 2020-08-26T18:28:39Z
dc.date.available: 2020-08-26T18:28:39Z
dc.date.created: 2019-12
dc.date.issued: 2019-11-25
dc.date.submitted: December 2019
dc.identifier.uri: https://hdl.handle.net/1969.1/188752
dc.description.abstract: This dissertation attempts to address the changing needs of data science and analytics: making it easier to produce accurate models, and opening up opportunities and perspectives for novices to make sense of existing data. This work incorporates the semantics of data into classical machine learning problems, which is one way to tame the deluge of data. The increased availability of data and the existence of easy-to-use procedures for regression and classification in commodity software allow anyone to search for correlations among a large set of variables with scant regard for their meaning. Consequently, people tend to use data indiscriminately, leading to the practice of data dredging. It is easy to use sophisticated tools to produce specious models that generalize poorly and may lead to wrong conclusions. Despite much effort having been devoted to advancing learning algorithms, current tools do little to shield people from using data in a semantically lax fashion. By examining the entire model building process and supplying semantic information derived from high-level knowledge in the form of an ontology, the machine can assist in exercising discretion to help the model builder avoid the pitfalls of data dredging. This work introduces a metric, called conceptual distance, to incorporate semantic information into the model building process. The conceptual distance is shown to be practically computable from large-scale existing ontologies. This metric is exploited in feature selection to enable a machine to take the semantics of features into consideration when choosing them to build a model. Experiments with ontologies and real-world datasets show that this metric performs comparably to traditional data-driven measures in selecting a feature subset, despite using only the labels of features, not the associated measurements. Further, a new end-to-end model building process is developed that uses the conceptual distance as a guide to explore an ontological structure and retrieve relevant features automatically, making it convenient for a novice to build a semantically pertinent model. Experiments show that the proposed model building process can help a user produce a model with performance comparable to one built by a domain expert. This work offers a tool to help non-experts battle the hazard of data dredging that comes from the indiscriminate use of data. The tool results in models that generalize better and are easier to interpret, leading to better decisions and conclusions.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Ontologies
dc.subject: Supervised Learning
dc.subject: Model Building Process
dc.subject: Feature Selection
dc.subject: Novices
dc.subject: Semantic Web
dc.subject: Feature Recommendation
dc.subject: Background Knowledge and Semantic Information
dc.title: Exploiting Semantics from Widely Available Ontologies to Aid the Model Building Process
dc.type: Thesis
thesis.degree.department: Computer Science and Engineering
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Texas A&M University
thesis.degree.name: Doctor of Philosophy
thesis.degree.level: Doctoral
dc.contributor.committeeMember: Caverlee, James
dc.contributor.committeeMember: Fuhrmann, Matthew
dc.contributor.committeeMember: Furuta, Richard
dc.type.material: text
dc.date.updated: 2020-08-26T18:28:39Z
local.etdauthor.orcid: 0000-0002-3356-4799
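
As a rough illustration of the conceptual-distance idea described in the abstract above, a distance between two concepts can be approximated by the shortest-path length between their nodes in an ontology graph, and candidate features can then be ranked by their distance to the target concept. The Python sketch below is only a hypothetical illustration under that assumption: the toy ontology, the node names, and the conceptual_distance and rank_features_by_concept helpers are invented for this example and are not the dissertation's actual definition or implementation.

    # Hypothetical sketch: ontology-based feature ranking by conceptual distance.
    # The ontology, concept names, and helpers are illustrative only.
    import networkx as nx

    # Toy ontology: nodes are concepts, edges are is-a / related-to links.
    ontology = nx.Graph()
    ontology.add_edges_from([
        ("Disease", "Diabetes"),
        ("Disease", "Symptom"),
        ("Diabetes", "BloodGlucose"),
        ("Diabetes", "BodyMassIndex"),
        ("Symptom", "Fatigue"),
        ("Person", "Disease"),
        ("Person", "PhysicalTrait"),
        ("PhysicalTrait", "ShoeSize"),
    ])

    def conceptual_distance(graph, concept_a, concept_b):
        """Approximate conceptual distance as shortest-path length;
        unreachable concepts are treated as infinitely far apart."""
        try:
            return nx.shortest_path_length(graph, concept_a, concept_b)
        except nx.NetworkXNoPath:
            return float("inf")

    def rank_features_by_concept(graph, target_concept, candidate_features):
        """Order candidate feature labels by their distance to the target
        concept, so semantically closer features are preferred."""
        return sorted(candidate_features,
                      key=lambda f: conceptual_distance(graph, target_concept, f))

    # Features closest to "Diabetes" come first; "ShoeSize" ranks last.
    print(rank_features_by_concept(
        ontology, "Diabetes",
        ["BloodGlucose", "Fatigue", "ShoeSize", "BodyMassIndex"]))

Note that this ranking uses only the labels of features and the ontology structure, not the measured values of the features themselves, which mirrors the label-only setting described in the abstract.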

