
dc.contributor.advisor	Zhang, Xianyang
dc.creator	Yan, Jian
dc.date.accessioned	2023-10-12T15:05:39Z
dc.date.created	2023-08
dc.date.issued	2023-08-11
dc.date.submitted	August 2023
dc.identifier.uri	https://hdl.handle.net/1969.1/200109
dc.description.abstract	Recently, distance- and kernel-based metrics have gained increasing popularity in both the statistics and machine learning communities, leading to the development of a general class of nonparametric tests. This dissertation includes two projects related to distance- and kernel-based nonparametric two-sample tests. Motivated by the increasing use of kernel-based metrics for high-dimensional and large-scale data, the first project studies the asymptotic behavior of kernel two-sample tests when the dimension and the sample sizes both diverge to infinity. We focus on the maximum mean discrepancy (MMD) with an isotropic kernel, which includes MMD with the Gaussian kernel, MMD with the Laplace kernel, and the energy distance as special cases. We establish the central limit theorem (CLT) under the null hypothesis as well as under local and fixed alternatives. The new non-null CLT results allow us to perform asymptotically exact power analysis, which reveals a delicate interplay between the moment discrepancy that the kernel two-sample tests can detect and the relative orders of the dimension and the sample sizes. The asymptotic theory is further corroborated through numerical studies. The second project concerns testing the equality of two conditional distributions, which is critical in numerous modern applications such as transfer learning and program evaluation. However, this fundamental problem has received surprisingly little attention in the literature. The primary objective is to establish a distance- and kernel-based framework for two-sample conditional distribution testing that is adaptable to multivariate distributions and allows for heterogeneity in the marginal distributions. We propose two metrics, the conditional generalized energy distance and the conditional maximum mean discrepancy, which completely characterize the homogeneity of two conditional distributions. Utilizing these metrics, we develop both local and global tests that can identify local and global discrepancies between two conditional distributions. To approximate the finite-sample distributions of the test statistics, we employ a novel local bootstrap procedure. Our proposed local and global two-sample conditional distribution tests demonstrate reliable performance through simulations and a real data analysis.
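The central object of the first project, the maximum mean discrepancy with an isotropic kernel, admits a standard unbiased sample estimate. Below is a minimal illustrative sketch in Python using the Gaussian kernel; the function names and the default bandwidth are assumptions made for illustration and are not the dissertation's implementation.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth):
    # Isotropic Gaussian kernel: k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * bandwidth ** 2))

def mmd_unbiased(X, Y, bandwidth=1.0):
    """Unbiased estimate of the squared MMD between samples X (n x d) and Y (m x d).

    Averages within-sample kernel values (off-diagonal only, for unbiasedness)
    and subtracts twice the between-sample average.
    """
    n, m = len(X), len(Y)
    kxx = sum(gaussian_kernel(X[i], X[j], bandwidth)
              for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    kyy = sum(gaussian_kernel(Y[i], Y[j], bandwidth)
              for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    kxy = sum(gaussian_kernel(X[i], Y[j], bandwidth)
              for i in range(n) for j in range(m)) / (n * m)
    return kxx + kyy - 2.0 * kxy
```

When the two samples are drawn from the same distribution, the statistic fluctuates around zero; a mean shift between the samples pushes it positive, which is the signal the two-sample test exploits.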
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	Conditional distribution
dc.subject	Energy distance
dc.subject	High dimensionality
dc.subject	Maximum mean discrepancy
dc.subject	Two-sample testing
dc.title	Distance and Kernel-Based Nonparametric Two-Sample Tests
dc.type	Thesis
thesis.degree.department	Statistics
thesis.degree.discipline	Statistics
thesis.degree.grantor	Texas A&M University
thesis.degree.name	Doctor of Philosophy
thesis.degree.level	Doctoral
dc.contributor.committeeMember	Gaynanova, Irina
dc.contributor.committeeMember	Wong, Raymond Ka Wai
dc.contributor.committeeMember	Tuo, Rui
dc.type.material	text
dc.date.updated	2023-10-12T15:05:47Z
local.embargo.terms	2025-08-01
local.embargo.lift	2025-08-01
local.etdauthor.orcid	0009-0009-8738-0635

