An Analysis of Workload Patterns In Borg Cloud Cluster Traces
Abstract
The rapid emergence of dynamic, heterogeneous, and shared cloud computing clusters has affected corporations, researchers, and software developers over the past decade. To further improve cloud data processing and management, it is essential to understand the workload characteristics of large-scale cloud data centers. We analyze the publicly available trace data released by Google in 2019 and derive an approach that can be applied to other large-scale industry trace datasets, such as the recently released data from Microsoft Azure and Alibaba. The heterogeneity in resource types and usage in the Google traces suggests a highly dynamic environment, with varied jobs that demand faster and more scalable scheduling decisions. In the Google Borg trace dataset, comparing the overall usage of the cluster to its capacity shows that resources in each tier are, on average, overcommitted and that the cluster relies on frequent preemption to achieve its high utilization. Using K-means clustering and Lasso regression, we gain insights into the characteristics of Borg’s scheduling patterns. Moreover, we apply reverse engineering with supervised classification methods to understand the key factors considered by the latest prediction algorithm for resource demands. Because the scheduling and utilization datasets are extremely large, we manually sample them down to a manageable size. Although the sample size prevents us from generalizing to overall trace behavior, the analytical method we describe nonetheless yields system design insights that can inform scheduling decisions for large-scale clusters.
Subject
Google Borg
Cloud Compute Engine
Data Science
K-means Clustering
Random Forest
Lasso Regression
XGBoost Classifier
Citation
Kang, Zengxiaoran (2022). An Analysis of Workload Patterns In Borg Cloud Cluster Traces. Undergraduate Research Scholars Program. Available electronically from https://hdl.handle.net/1969.1/196508.