Automated Machine Learning Systems: Evaluation, Ease of Use, Data Transformation

Martinez Garcia, Diego Serafin

dc.contributor.advisor	Hu, Xia
dc.creator	Martinez Garcia, Diego Serafin
dc.date.accessioned	2023-02-07T16:06:38Z
dc.date.available	2023-02-07T16:06:38Z
dc.date.created	2022-05
dc.date.issued	2022-03-22
dc.date.submitted	May 2022
dc.identifier.uri	https://hdl.handle.net/1969.1/197162
dc.description.abstract	Machine Learning (ML) has progressed rapidly over the years because of its versatility in solving different problems. As a result, many machine learning frameworks have been created, each a collection of different algorithms. However, data scientists often struggle to determine which one to use to develop their ML solutions because there are so many options. For this reason, many Automated Machine Learning (AutoML) systems have been created to help users build ML solutions easily by defining the problem they want to solve, providing the data, and setting a budget, such as the time allowed to search for a solution. Over the years, many AutoML systems have been developed, and their applications range from simple tasks such as tabular classification to more complex ones such as object detection. With so many AutoML systems available, it becomes challenging for users to determine which one suits them best, because most systems focus on specific tasks and data, and the tasks they can solve sometimes overlap. Another issue users need to be aware of is that, although most of the search process is already automated, most systems still require the user to get involved in data preprocessing. Such preprocessing ranges from trivial tasks, such as selecting image files, to more challenging ones, such as merging multiple database tables. Another aspect of AutoML systems is the research involved in their development. AutoML research can focus on anything from the search strategy to deeper processes such as optimizing how models are run. This research leads to the creation of many AutoML systems every year. Creating an AutoML system is challenging since there are many things to consider, from the design to the implementation, and using an existing system to test a new hypothesis is also difficult. The challenge of reusing an existing AutoML system is that most systems were designed to propose a research improvement rather than for usability.
Another problem is that these systems are often not maintained, and when they are maintained, they are difficult to use because of the lack of documentation. Enabling advances in AutoML is challenging. Firstly, it is necessary to standardize components so that different frameworks can be compared more efficiently. Many AutoML systems with new search strategies appear every year. However, it is difficult to compare them objectively, since they could be improving in areas other than the ones claimed. Secondly, creating an AutoML system should not be challenging, since the difficulty can keep many researchers from contributing to the field. Creating a new AutoML system with the sole purpose of testing a new component should not be difficult. Thirdly, we identify that state-of-the-art AutoML systems focus only on model selection and hyperparameter tuning, leaving room for improvement in data preprocessing. To tackle the challenges above, several contributions are made in the preliminary work, and future work is proposed to conclude the dissertation:
• The first contribution of this dissertation is the development of standards for AutoML, which generalize components that have been in use for a while and give them proper definitions.
• Second, we propose a better methodology for the evaluation of AutoML systems that provides a better understanding of the capabilities of different systems.
• To alleviate the burden of human effort in creating single-use AutoML systems, we propose an extensible framework to enable AutoML. The proposed AutoML framework enables customizable AutoML solutions without designing and implementing every aspect of an AutoML system.
• Considering the potential benefit of data preprocessing search for AutoML, in the last piece of work we focus on the preprocessing search by creating an end-to-end framework that takes advantage of contextual feature similarities.
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	Machine Learning
dc.subject	Automated Machine Learning
dc.subject	Data Transformation
dc.title	Automated Machine Learning Systems: Evaluation, Ease of Use, Data Transformation
dc.type	Thesis
thesis.degree.department	Computer Science and Engineering
thesis.degree.discipline	Computer Science
thesis.degree.grantor	Texas A&M University
thesis.degree.name	Doctor of Philosophy
thesis.degree.level	Doctoral
dc.contributor.committeeMember	Shipman, Frank
dc.contributor.committeeMember	Wang, Zhangyang
dc.contributor.committeeMember	Shen, Yang
dc.type.material	text
dc.date.updated	2023-02-07T16:06:44Z
local.etdauthor.orcid	0000-0002-5212-6030

Files in this item

Name: MARTINEZGARCIA-DISSERTATION-20 ...
Size: 996Kb
Format: PDF


This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )
