Texas A&M University Libraries
    OAKTrust

    Exploiting Semantics from Widely Available Ontologies to Aid the Model Building Process

    JANPUANGTONG-DISSERTATION-2019.pdf (1.811Mb)
    Date
    2019-11-25
    Author
    Janpuangtong, Sasin
    Abstract
    This dissertation attempts to address the changing needs of data science and analytics: making it easier to produce accurate models and opening up opportunities and perspectives for novices to make sense of existing data. This work aims to incorporate the semantics of data in addressing classical machine learning problems, which is one way to tame the deluge of data. The increased availability of data and the existence of easy-to-use procedures for regression and classification in commodity software allow anyone to search for correlations amongst a large set of variables with scant regard for their meaning. Consequently, people tend to use data indiscriminately, leading to the practice of data dredging. It is easy to use sophisticated tools to produce specious models, which generalize poorly and may lead to wrong conclusions. Despite much effort having been placed on advancing learning algorithms, current tools do little to shield people from using data in a semantically lax fashion. By examining the entire model building process and supplying semantic information derived from high-level knowledge in the form of an ontology, the machine can assist in exercising discretion to help the model builder avoid the pitfalls of data dredging. This work introduces a metric, called conceptual distance, to incorporate semantic information into the model building process. The conceptual distance is shown to be practically computable from large-scale existing ontologies. This metric is exploited in feature selection to enable a machine to take the semantics of features into consideration when choosing them to build a model. Experiments with ontologies and real-world datasets show that this metric performs comparably to traditional data-driven measures in selecting a feature subset, despite using only the labels of features, not their associated measurements.
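The abstract does not give the definition of conceptual distance, so the following Python sketch uses an assumed stand-in: the number of edges on the shortest path between two concept labels in an ontology graph. The toy ontology, its concept names, and the helpers `conceptual_distance` and `select_features` are hypothetical illustrations, not the dissertation's method. The sketch only shows the key idea of ranking candidate features by their labels, without looking at any data values.

```python
from collections import deque

# Hypothetical toy ontology: each concept lists directly related concepts.
# Edges are listed in both directions, so the graph is effectively undirected.
TOY_ONTOLOGY = {
    "HousePrice": ["House"],
    "House": ["HousePrice", "Bedrooms", "Neighborhood"],
    "Bedrooms": ["House"],
    "Neighborhood": ["House", "CrimeRate", "City"],
    "CrimeRate": ["Neighborhood"],
    "City": ["Neighborhood", "Population"],
    "Population": ["City"],
    "ShoeSize": ["Person"],
    "Person": ["ShoeSize"],
}


def conceptual_distance(graph, source, target):
    """Assumed proxy for conceptual distance: shortest-path hop count
    between two concept labels (infinity if they are not connected)."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in graph.get(node, []):
            if nbr == target:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return float("inf")


def select_features(graph, target, candidates, k):
    """Rank candidate feature labels by distance to the target concept
    (ties broken alphabetically) and keep the k closest."""
    ranked = sorted(
        candidates,
        key=lambda f: (conceptual_distance(graph, f, target), f),
    )
    return ranked[:k]
```

With this toy graph, `select_features(TOY_ONTOLOGY, "HousePrice", ["ShoeSize", "Population", "CrimeRate", "Bedrooms"], 2)` returns `["Bedrooms", "CrimeRate"]`, while the unrelated `ShoeSize` is unreachable and ranks last.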
Further, a new end-to-end model building process is developed that uses the conceptual distance as a guideline to explore an ontological structure and retrieve relevant features automatically, making it convenient for a novice to build a semantically pertinent model. Experiments show that the proposed model building process can help a user produce a model with performance comparable to that of a model built by a domain expert. This work offers a tool to help the common man battle the hazard of data dredging that comes from the indiscriminate use of data. The tool yields models that generalize better and are easier to interpret, leading to better decisions or implications.
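A minimal sketch of distance-guided feature retrieval follows. It assumes a breadth-first walk outward from the target concept that stops at a fixed hop budget and keeps only concepts for which data could be obtained. The `ONTOLOGY` graph, the `MEASURABLE` set, and `recommend_features` are hypothetical, and the dissertation's actual exploration strategy may differ.

```python
from collections import deque

# Hypothetical mini-ontology (undirected adjacency lists).
ONTOLOGY = {
    "CropYield": ["Crop", "Farm"],
    "Crop": ["CropYield", "Rainfall", "SoilType"],
    "Farm": ["CropYield", "Region"],
    "Region": ["Farm", "Temperature", "Country"],
    "Rainfall": ["Crop"],
    "SoilType": ["Crop"],
    "Temperature": ["Region"],
    "Country": ["Region", "GDP"],
    "GDP": ["Country"],
}

# Hypothetical: concepts for which a data column can actually be obtained.
MEASURABLE = {"Rainfall", "SoilType", "Temperature", "GDP"}


def recommend_features(graph, target, measurable, max_distance):
    """Breadth-first walk from the target concept; return measurable
    concepts within max_distance hops, nearest first (ties alphabetical)."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if dist[node] == max_distance:
            continue  # do not expand beyond the distance budget
        for nbr in graph.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    found = [c for c in dist if c in measurable]
    return sorted(found, key=lambda c: (dist[c], c))
```

For example, `recommend_features(ONTOLOGY, "CropYield", MEASURABLE, 3)` returns `["Rainfall", "SoilType", "Temperature"]`; `GDP` sits four hops away and is left out. The hop budget plays the role of a relevance cutoff, trading recall of distant concepts for a smaller, more pertinent candidate set.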
    URI
    https://hdl.handle.net/1969.1/188752
    Subject
    Ontologies
    Supervised Learning
    Model Building Process
    Feature Selection
    Novices
    Semantic Web
    Feature Recommendation
    Background knowledge and Semantic Information
    Collections
    • Electronic Theses, Dissertations, and Records of Study (2002– )
    Citation
    Janpuangtong, Sasin (2019). Exploiting Semantics from Widely Available Ontologies to Aid the Model Building Process. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/188752.

    DSpace software copyright © 2002-2016  DuraSpace
    Theme by Atmire NV
     

     
