Extensions of Regression Trees for Subgroup Identification
Abstract
Effective analysis is key to the science of data analytics. Although data analytics and data science have advanced substantially, further research is still warranted because popular existing subgroup identification models, such as regression trees, are not effective in some cases. This dissertation addresses those cases and develops better subgroup identification models.
Regression tree models have been widely used for subgroup identification in domains such as the social sciences, education, and healthcare informatics. However, in challenging practical situations, a direct application of regression trees may not meet the specific needs of an analysis: it can miss subgroups that actually exist or identify misleading ones. This dissertation modifies and extends regression trees for subgroup identification to address several previously unexplored situations: i) developing correlation trees for cases where correlation, rather than regression, is of interest; ii) developing robust logistic regression trees to address outlier problems; and iii) exploring the potential of generalized extreme value regression trees and Firth's logistic regression trees for modeling imbalanced class data.
This research is an interdisciplinary study at the intersection of advanced statistical modeling and machine learning, aimed at identifying heterogeneous subgroups and addressing challenges in various fields and practices. The proposed models provide tangible insights, theories, and exploratory tools for subgroup identification. The research is expected to be widely applicable in fields where subgroup analysis is the main concern, such as personalized medicine and optimal psychological interventions. Its potential impact extends to academia, industry, and society in general.
Citation
Choi, Doowon (2021). Extensions of Regression Trees for Subgroup Identification. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/193126.