An Automated Framework To Generate End-to-End Machine Learning Pipelines
Abstract
Recent developments in machine learning have demonstrated its applicability to numerous real-world problems. However, building an optimal machine learning pipeline requires considerable knowledge and experience in data science. To address this problem, many automated machine learning (AutoML) frameworks have been proposed. Most existing AutoML frameworks, however, treat pipeline generation as a black-box optimization problem and thus fail to incorporate basic heuristics and human intuition. Furthermore, most of these frameworks provide very basic or no feature engineering capabilities. To tackle these challenges, in this thesis, we propose an automated framework to generate end-to-end machine learning pipelines. Through a survey of hundreds of Kaggle kernels and extensive experimentation, we finalized a set of heuristics that improve pipeline optimization. We also implemented a system to automate feature engineering, which can generate hundreds of features to produce a better representation of the data.
Additionally, the framework explains why certain models and features were selected by the system, which helps users further improve the pipeline. Finally, our experiments show consistent performance across various datasets.
Citation
Kapale, Anurag (2019). An Automated Framework To Generate End-to-End Machine Learning Pipelines. Master's thesis, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/188756.