dc.creator: Ross, Aaron
dc.date.accessioned: 2019-06-10T16:18:17Z
dc.date.available: 2019-06-10T16:18:17Z
dc.date.created: 2019-05
dc.date.submitted: May 2019
dc.identifier.uri: http://hdl.handle.net/1969.1/175467
dc.description.abstract: Objective: Determine whether principal components analysis and multiple correspondence analysis are suitable dimension reduction techniques for the California Health Interview Survey. Identify which health risk behaviors, mental health factors, and demographic factors cluster together under k-medians clustering. Background: Clustering and multivariate analysis techniques can be used to characterize populations and sub-populations by grouping individuals according to their similarity to one another. These exploratory techniques, while uniformly accepted within the scientific community as valid, are less popular than other statistical methods and have not been applied in some scenarios where they could be useful. The UCLA Center for Health Policy Research’s annual California Health Interview Survey (CHIS) dataset is one such case, where these multivariate techniques could provide new insight. The survey contains information on thousands of randomly sampled Californians regarding health, income, and demographics, among other factors. This research project attempts to determine whether principal components analysis and multiple correspondence analysis are suitable dimension reduction techniques when applied to the CHIS dataset, and to describe in greater detail, both quantitatively and qualitatively, the differences and similarities between the health characteristics of California residents. Methods: This study used data from 21,055 individuals interviewed by telephone for the 2016 California Health Interview Survey, the largest state-wide health survey in the U.S. Principal components analysis and multiple correspondence analysis were conducted to assess their usefulness when applied to health survey data. Concurrently, Gower k-medians clustering was used to identify distinct groupings of California residents. I then performed a chi-squared test to determine which variables are the most statistically significant in forming these clusters.
Results: Principal components analysis reduced the 118 variables initially considered to 30 components, with the largest component explaining only 10.44% of the total variation in the data, suggesting that this technique is ill-suited to the CHIS. Multiple correspondence analysis, however, reduced the 88 categorical variables to 5 components, with the largest accounting for 62.27% of the variation in the data. By applying Gower k-medians, I produced 3 distinct clusters of survey respondents and determined that access to specialized medical care is the most strongly clustered characteristic. [en]
dc.format.mimetype: application/pdf
dc.subject: Multivariate Statistics [en]
dc.subject: Public Health [en]
dc.subject: Survey [en]
dc.subject: Clustering [en]
dc.subject: PCA [en]
dc.subject: MCA [en]
dc.subject: Health Economics [en]
dc.title: Multivariate Analysis Applied to the California Health Interview Survey [en]
dc.type: Thesis [en]
thesis.degree.department: Economics [en]
thesis.degree.discipline: Economics [en]
thesis.degree.grantor: Undergraduate Research Scholars Program [en]
thesis.degree.name: BS [en]
thesis.degree.level: Undergraduate [en]
dc.contributor.committeeMember: Jansen, Dennis
dc.type.material: text [en]
dc.date.updated: 2019-06-10T16:18:17Z

