Show simple item record

dc.contributor.advisor: Ni, Yang
dc.creator: Choi, Junsouk
dc.date.accessioned: 2023-10-12T14:23:57Z
dc.date.created: 2023-08
dc.date.issued: 2023-07-30
dc.date.submitted: August 2023
dc.identifier.uri: https://hdl.handle.net/1969.1/199967
dc.description.abstract: Observational zero-inflated count data arise in a wide range of areas such as economics, social sciences, and biology. A common research question in these areas is to identify causal relationships by learning the structure of a sparse directed acyclic graph (DAG). While causal structure learning of DAGs has been an active research area, existing methods do not adequately account for excess zeros and are therefore not suitable for modeling zero-inflated count data. In this dissertation, we propose three novel causal discovery approaches for observational zero-inflated count data. First, we propose a new zero-inflated Poisson Bayesian network (ZIPBN) model to infer causal relationships in observational zero-inflated count data. We show that the proposed ZIPBN is uniquely identifiable from observational data. For causal structure learning, we introduce a fully Bayesian inference approach that exploits a parallel-tempered Markov chain Monte Carlo (MCMC) algorithm to efficiently explore the multi-modal network space. Through simulations, we compare the proposed ZIPBN with alternative causal Bayesian network approaches, thereby demonstrating its utility in causal discovery for zero-inflated count data. Additionally, real single-cell RNA-sequencing (scRNA-seq) data with known causal relationships are used to assess the capability of ZIPBNs for discovering causal relationships in real-world problems. Second, by extending ZIPBNs, we develop a more general class of causal models, referred to as the zero-inflated generalized hypergeometric DAG (ZiG-DAG) model, which facilitates causal discovery in various types of observational zero-inflated count data. The proposed ZiG-DAGs leverage the broad family of generalized hypergeometric probability distributions to effectively account for diverse features of observational zero-inflated count data, including overdispersion.
Additionally, ZiG-DAGs incorporate both linear and nonlinear causal relationships to provide a comprehensive representation of real-world causality. For the proposed ZiG-DAGs, we establish identifiability theory, proving that their causal structures are fully identifiable using a novel proof technique that has the potential to extend beyond our specific models. We develop score-based algorithms for causal structure learning and demonstrate the superior performance of ZiG-DAGs against state-of-the-art alternative methods through extensive simulations and real data analysis with known causal relationships. We also illustrate the practical utility of the proposed ZiG-DAGs by reverse-engineering a gene regulatory network from an scRNA-seq dataset. Third, we propose a novel Bayesian differential zero-inflated negative binomial DAG (DAG0) model to identify differential causal relationships while explicitly accounting for zero-inflation in observational zero-inflated count data. In this work, we consider two-sample zero-inflated count data collected from two experimental groups (control vs. treatment), with the primary objective of identifying differences in causal structures across the experimental groups. The proposed DAG0 model allows for simultaneous inference of both differential causal directions and differential causal strengths. We develop a Bayesian inference method paired with parallel-tempered MCMC and show the utility of the proposed DAG0 by comparing it with state-of-the-art alternatives through extensive simulations. Furthermore, we apply DAG0 to a two-sample scRNA-seq dataset generated from two experimental groups. Our analysis reveals findings that align with existing knowledge, further highlighting the utility of the proposed DAG0.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Bayesian network
dc.subject: causal identifiability
dc.subject: differential network
dc.subject: directed acyclic graph
dc.subject: single-cell RNA-sequencing
dc.subject: zero-inflated model
dc.title: Causal Discovery for Observational Zero-Inflated Count Data
dc.type: Thesis
thesis.degree.department: Statistics
thesis.degree.discipline: Statistics
thesis.degree.grantor: Texas A&M University
thesis.degree.name: Doctor of Philosophy
thesis.degree.level: Doctoral
dc.contributor.committeeMember: Mallick, Bani K.
dc.contributor.committeeMember: Gaynanova, Irina
dc.contributor.committeeMember: Yoon, Byung-Jun
dc.type.material: text
dc.date.updated: 2023-10-12T14:23:58Z
local.embargo.terms: 2025-08-01
local.embargo.lift: 2025-08-01
local.etdauthor.orcid: 0009-0009-9822-4144

