Statistical Analysis of Transposon Sequencing Data to Determine Essential Genes

De Jesus Aneiro, Michael A.

Date

2016-12-02

Abstract

Transposon Sequencing (TnSeq) has become a popular biological tool for assessing the phenotypes of large libraries of bacterial mutants simultaneously. This allows for high-throughput identification of genes that are essential for growth, providing valuable information about the function of those genes and revealing potential drug targets that could lead to treatments. However, analysis of TnSeq data is challenging, as it requires estimating unknown parameters from data that are often noisy and likely drawn from a mixture of different phenotypes. In addition, the classification of essentiality is not known a priori, requiring unsupervised methods capable of identifying key features in the data to confidently determine essentiality. We present several models capable of identifying essential genes while overcoming the difficulties of analyzing TnSeq data. Together, these methods provide ways to analyze TnSeq data in one or multiple conditions, confined within gene boundaries or across the entire genome, while reducing the impact of the noise and outliers that are often present in this type of data.

Citation

De Jesus Aneiro, Michael A. (2016). Statistical Analysis of Transposon Sequencing Data to Determine Essential Genes. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/159011.