Multivariate Analysis Applied to the California Health Interview Survey

Ross, Aaron

dc.creator	Ross, Aaron
dc.date.accessioned	2019-06-10T16:18:17Z
dc.date.available	2019-06-10T16:18:17Z
dc.date.created	2019-05
dc.date.submitted	May 2019
dc.identifier.uri	https://hdl.handle.net/1969.1/175467
dc.description.abstract	Objective: Determine whether principal components analysis and multiple correspondence analysis are suitable dimension reduction techniques for the California Health Interview Survey. Identify which health risk behaviors, mental health factors and demographic factors cluster together under k-medians clustering. Background: Clustering and multivariate analysis techniques can be used to characterize populations and sub-populations by grouping individuals according to their similarity to one another. These exploratory techniques, while uniformly accepted within the scientific community as valid, are not as popular as other statistical methods and have not been utilized in certain scenarios where they could potentially be useful. The UCLA Center for Health Policy Research’s annual California Health Interview Survey (CHIS) dataset is one such example where these multivariate techniques could provide new insight. The survey contains information on thousands of randomly sampled Californians regarding health, income and demographics, among other factors. This research project attempts to determine whether principal components analysis and multiple correspondence analysis are suitable dimension reduction techniques when applied to the CHIS dataset, and to quantify and qualify in greater detail the differences and similarities between the health characteristics of California residents. Methods: This study used data from 21,055 individuals interviewed via telephone for the 2016 California Health Interview Survey, the largest state-wide health survey in the U.S. Principal components analysis and multiple correspondence analysis were conducted to assess their usefulness when applied to health survey data. Concurrently, Gower k-medians clustering was used to identify distinct groupings of California residents. I then performed a chi-squared test to determine which variables are most statistically significant in forming these clusters.
Results: Principal components analysis reduced the 118 variables initially considered to 30 components, with the largest component explaining only 10.44% of the total variation in the data, suggesting that this technique is ill-suited to the CHIS. Multiple correspondence analysis, however, reduced the 88 categorical variables to 5 components, with the largest component accounting for 62.27% of the variation in the data. By applying Gower k-medians, I produced 3 distinct clusters of survey respondents and determined that access to specialized medical care is the most strongly clustered characteristic.	en
dc.format.mimetype	application/pdf
dc.subject	Multivariate Statistics	en
dc.subject	Public Health	en
dc.subject	Survey	en
dc.subject	Clustering	en
dc.subject	PCA	en
dc.subject	MCA	en
dc.subject	Health Economics	en
dc.title	Multivariate Analysis Applied to the California Health Interview Survey	en
dc.type	Thesis	en
thesis.degree.department	Economics	en
thesis.degree.discipline	Economics	en
thesis.degree.grantor	Undergraduate Research Scholars Program	en
thesis.degree.name	BS	en
thesis.degree.level	Undergraduate	en
dc.contributor.committeeMember	Jansen, Dennis
dc.type.material	text	en
dc.date.updated	2019-06-10T16:18:17Z

Files in this item

Name: ROSS-FINALTHESIS-2019.pdf
Size: 393.5Kb
Format: PDF


This item appears in the following Collection(s)

Undergraduate Research Scholars Capstone (2006–present)
