DE NOVO PROTEIN DESIGN OF NOVEL FOLDS USING GUIDED CONDITIONAL WASSERSTEIN GENERATIVE ADVERSARIAL NETWORKS (GCWGAN)
Abstract
Understanding the sequence-structure-function relationship is a central topic in protein research. Fundamental questions remain: How much can current data alone reveal about this relationship? And how far can such insights enable inverse protein design, that is, the design of protein sequences for desired structures or functions? In this project, two novel generative models, the conditional Wasserstein GAN (cWGAN) and the guided conditional Wasserstein GAN (gcWGAN), are developed to generate new sequences for a desired, novel structural fold. We first embedded the fold space into a low-dimensional Euclidean space to obtain fold representations. We also used a fast fold prediction method as an oracle that provides feedback in gcWGAN. The models were trained with a semi-supervised learning process that exploits sequences both with and without paired structures. We analyzed how model efficiency relates to factors such as the oracle’s accuracy and data (sequence) availability, and we found more diverse (and sometimes more novel) designs from gcWGAN than from a conditional VAE (variational autoencoder). These results reveal the value of current data in unraveling the sequence-structure relationship and in inverse protein design.
Citation
Zhu, Shaowen (2019). DE NOVO PROTEIN DESIGN OF NOVEL FOLDS USING GUIDED CONDITIONAL WASSERSTEIN GENERATIVE ADVERSARIAL NETWORKS (GCWGAN). Master's thesis, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/186443.