dc.creator: Kang, Zengxiaoran
dc.date.accessioned: 2022-08-09T16:05:28Z
dc.date.available: 2022-08-09T16:05:28Z
dc.date.created: 2022-05
dc.date.submitted: May 2022
dc.identifier.uri: https://hdl.handle.net/1969.1/196508
dc.description.abstract: The rapid emergence of dynamic, heterogeneous, and shared cloud computing clusters has affected corporations, researchers, and software developers over the past decade. To further improve cloud data processing and management, it is essential to understand the workload characteristics of large-scale cloud data centers. We analyze the publicly available trace data released by Google in 2019 and derive an approach that can be applied to other large-scale industry trace datasets, such as the recently released data from Microsoft Azure and Alibaba. The heterogeneity in resource types and usage in the Google traces suggests a highly dynamic environment, with varied jobs that demand faster and more scalable scheduling decisions. Comparing overall cluster usage to capacity in the Google Borg trace dataset shows that resources are overcommitted in every tier and that the cluster relies on frequent preemption to achieve its high utilization. Using K-means clustering and Lasso regression, we gain insights into the characteristics of Borg’s scheduling patterns. Moreover, we apply reverse engineering with supervised classification methods to understand the key factors considered in the latest prediction algorithm for resource demands. Because the scheduling and utilization datasets are extremely large, we manually sample them down to an appropriate size. Although the sample size prevents us from generalizing about overall trace behavior, the analytical method we describe nonetheless extracts system design insights that can be useful in scheduling decision-making for large-scale clusters.
dc.format.mimetype: application/pdf
dc.subject: Google
dc.subject: Borg
dc.subject: Cloud Compute Engine
dc.subject: Data Science
dc.subject: K-means Clustering
dc.subject: Random Forest
dc.subject: Lasso Regression
dc.subject: XGBoosting Classifier
dc.title: An Analysis of Workload Patterns In Borg Cloud Cluster Traces
dc.type: Thesis
thesis.degree.department: Computer Science & Engineering
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Undergraduate Research Scholars Program
thesis.degree.name: B.S.
thesis.degree.level: Undergraduate
dc.contributor.committeeMember: Da Silva, Dilma
dc.type.material: text
dc.date.updated: 2022-08-09T16:05:29Z

