INVESTIGATING THE IMPACT OF ROADWAY GEOMETRY, SPEED DISTRIBUTION, AND WEATHER CONDITION ON ROADWAY DAILY CRASH OCCURRENCE AND SEVERITY BY USING MACHINE LEARNING METHODS
Abstract
Conventional traffic crash analysis methods often use highly aggregated data, which makes it difficult to understand the effects of time-varying factors on crash occurrence. Although some studies have used data with small aggregation intervals, they typically analyze the effect of a single factor on crash occurrence. This study investigates the combined effect of roadway geometry, speed distribution, and weather conditions on crash occurrence and severity by applying XGBoost (eXtreme Gradient Boosting), an explainable machine learning method, to daily-level crash data. The data are collected from four sources and cover roadways in Texas. Three roadway facility types are considered: (1) Rural Interstate; (2) Rural Two-Lane; and (3) Rural Multilane. In the feature selection process, the Pearson correlation coefficient is applied to remove highly correlated variables. The study then uses the synthetic minority over-sampling technique (SMOTE) to mitigate data imbalance. The XGBoost model is trained twice: first on data with all crash severity levels, and then only on data with fatal and severe injury crashes. Finally, the SHAP (SHapley Additive exPlanations) method is applied to investigate the contribution of each variable to the model’s output. The results show that the contributions of variables tend to differ across roadway facility types, and that the variables also contribute differently to crashes of different severity levels.
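The two preprocessing steps named in the abstract (correlation-based feature removal and minority over-sampling) can be illustrated with a minimal sketch. This is not the thesis code. The feature names, the 0.8 correlation threshold, the neighbour count, and the synthetic data are all assumptions made for illustration, and `smote_like` is a simplified NumPy stand-in for SMOTE rather than a library implementation. Model training with XGBoost and attribution with SHAP would follow these steps and are not shown here.

```python
import numpy as np


def drop_correlated(X, names, threshold=0.8):
    """Greedily keep a feature only if its |Pearson r| with every
    already-kept feature is at or below `threshold` (assumed value)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]


def smote_like(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    random minority row and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])


# Synthetic daily records with hypothetical feature names.
rng = np.random.default_rng(42)
n = 200
speed_mean = rng.normal(65, 5, n)
speed_p85 = speed_mean + rng.normal(7, 0.1, n)  # nearly duplicates speed_mean
precipitation = rng.exponential(2.0, n)
X = np.column_stack([speed_mean, speed_p85, precipitation])
names = ["speed_mean", "speed_p85", "precipitation"]

# Imbalanced labels: 20 crash days (1) versus 180 non-crash days (0).
y = np.zeros(n, dtype=int)
y[:20] = 1

# Step 1: remove highly correlated variables.
X_sel, kept_names = drop_correlated(X, names)

# Step 2: over-sample the minority class until both classes are equal in size.
X_min = X_sel[y == 1]
n_new = int((y == 0).sum() - (y == 1).sum())
synthetic = smote_like(X_min, n_new)
X_bal = np.vstack([X_sel, synthetic])
y_bal = np.concatenate([y, np.ones(n_new, dtype=int)])

print(kept_names, X_bal.shape, int(y_bal.sum()))
```

In practice the over-sampling would normally be applied only to the training split, so that synthetic rows do not leak into the evaluation data.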
Citation
Wei, Zihang (2021). INVESTIGATING THE IMPACT OF ROADWAY GEOMETRY, SPEED DISTRIBUTION, AND WEATHER CONDITION ON ROADWAY DAILY CRASH OCCURRENCE AND SEVERITY BY USING MACHINE LEARNING METHODS. Master's thesis, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/195342.