dc.contributor.advisor Hu, Xia
dc.creator Martinez Garcia, Diego Serafin
dc.date.accessioned 2023-02-07T16:06:38Z
dc.date.available 2023-02-07T16:06:38Z
dc.date.created 2022-05
dc.date.issued 2022-03-22
dc.date.submitted May 2022
dc.identifier.uri https://hdl.handle.net/1969.1/197162
dc.description.abstract Machine Learning (ML) has progressed rapidly over the years because of its versatility in solving a wide range of problems. As a result, many machine learning frameworks, each a collection of different algorithms, have been created. However, with so many options, data scientists often struggle to decide which one to use to develop their ML solutions. For this reason, many Automated Machine Learning (AutoML) systems have been created to help users build ML solutions easily: the user defines the problem to solve, provides the data, and sets a budget, such as the time allowed to search for a solution. Over the years, many AutoML systems have been developed, and their applications range from simple tasks such as tabular classification to more complex ones such as object detection. With so many AutoML systems available, it becomes challenging for users to determine which one suits them best, because most systems focus on specific tasks and data, and the tasks they can solve sometimes overlap. Users also need to be aware that, although most of the search process is already automated, most systems still require the user to take part in data preprocessing. Such preprocessing ranges from trivial tasks, such as selecting image files, to more challenging ones, such as merging multiple database tables. Another aspect of AutoML systems is the research involved in their development. AutoML research can focus on the search strategy itself or on deeper processes, such as optimizing how models are run. This research leads to the creation of many AutoML systems every year. Creating an AutoML system is challenging because there are many aspects to consider, from design to implementation, and reusing an existing system to test a new hypothesis is also difficult. The difficulty of reusing an existing AutoML system is that most systems were designed to demonstrate a research improvement rather than for usability.
Another problem is that these systems are often not maintained, and even when they are, they are difficult to use because of a lack of documentation. Advancing AutoML is challenging. First, components need to be standardized so that different frameworks can be compared more efficiently. Many AutoML systems with new search strategies appear every year; however, it is difficult to compare them objectively, since their improvements may come from areas other than the ones claimed. Second, creating an AutoML system should not be so difficult that it discourages researchers from contributing to the field; building a new AutoML system solely to test a new component should be straightforward. Third, we observe that state-of-the-art AutoML systems focus only on model selection and hyperparameter tuning, leaving room for improvement in data preprocessing. To tackle these challenges, several contributions are made in the preliminary work, and future work is proposed to conclude the dissertation:
• The first contribution of this dissertation is the development of standards for AutoML, which generalize components that have long been in use and give them proper definitions.
• Second, we propose an improved methodology for evaluating AutoML systems that provides a better understanding of the capabilities of different systems.
• To reduce the human effort of creating single-use AutoML systems, we propose an extensible framework to enable AutoML. The proposed framework enables customizable AutoML solutions without designing and implementing every aspect of an AutoML system.
• In the last piece of work, considering the potential benefit of searching over data preprocessing in AutoML, we focus on preprocessing search by creating an end-to-end framework that takes advantage of contextual feature similarities.
dc.format.mimetype application/pdf
dc.language.iso en
dc.subject Machine Learning
dc.subject Automated Machine Learning
dc.subject Data Transformation
dc.title Automated Machine Learning Systems: Evaluation, Ease of Use, Data Transformation
dc.type Thesis
thesis.degree.department Computer Science and Engineering
thesis.degree.discipline Computer Science
thesis.degree.grantor Texas A&M University
thesis.degree.name Doctor of Philosophy
thesis.degree.level Doctoral
dc.contributor.committeeMember Shipman, Frank
dc.contributor.committeeMember Wang, Zhangyang
dc.contributor.committeeMember Shen, Yang
dc.type.material text
dc.date.updated 2023-02-07T16:06:44Z
local.etdauthor.orcid 0000-0002-5212-6030