dc.contributor.advisor: Shen, Yang
dc.creator: Cao, Yue
dc.date.accessioned: 2022-07-27T16:43:46Z
dc.date.available: 2023-12-01T09:22:06Z
dc.date.created: 2021-12
dc.date.issued: 2021-12-08
dc.date.submitted: December 2021
dc.identifier.uri: https://hdl.handle.net/1969.1/196384
dc.description.abstract: Proteins are the workhorse molecules of life. Understanding how proteins function is one of the most fundamental problems in molecular biology and can drive a plethora of biological and pharmaceutical applications. However, experimental determination of protein mechanisms is expensive and time-consuming. This gap motivates the development of computational methods for protein science. The goal of this thesis is to investigate to what extent machine learning can uncover the underlying mechanisms of proteins. We concentrate on two problems: predicting the 3D structures of protein–protein interactions (protein docking) and understanding the protein sequence–function relationship. Accordingly, the thesis is organized as follows. First, we study protein docking. We introduce Bayesian Active Learning (BAL), the first optimization algorithm with uncertainty quantification (UQ) for protein docking; extensive experiments demonstrate that BAL outperforms competing methods in both optimization and UQ. In addition, we generalize BAL into the realm of meta-learning and propose LOIS (Learning to Optimize in Swarms), which outperforms various optimization algorithms on general optimization tasks. Finally, we address the scoring problem in protein docking and introduce Energy-based Graph Convolutional Networks (EGCN), which learn energies directly from graph representations of docking models and outperform competing scoring methods. Second, we focus on understanding the protein sequence–function relationship. We first study forward protein function prediction and introduce TALE (Transformer-based protein function Annotation with joint sequence-Label Embedding); TALE+, which combines TALE with a sequence similarity-based method, outperforms competing methods when only sequence input is available. We also study the inverse problem of protein design and describe our novel conditional autoregressive deep generative models. By learning functional embeddings from the Gene Ontology (GO) graph as conditional inputs, these models are able to model the distribution of protein sequences for given functions.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: protein
dc.subject: machine learning
dc.subject: protein docking
dc.subject: optimization
dc.subject: uncertainty quantification
dc.subject: protein function prediction
dc.subject: protein design
dc.subject: deep learning
dc.title: Optimization, Learning and Generation for Proteins: Docking Structures and Mapping Sequence–Function Relationships
dc.type: Thesis
thesis.degree.department: Electrical and Computer Engineering
thesis.degree.discipline: Electrical Engineering
thesis.degree.grantor: Texas A&M University
thesis.degree.name: Doctor of Philosophy
thesis.degree.level: Doctoral
dc.contributor.committeeMember: Qian, Xiaoning
dc.contributor.committeeMember: Tuo, Rui
dc.contributor.committeeMember: Narayanan, Krishna
dc.type.material: text
dc.date.updated: 2022-07-27T16:43:47Z
local.embargo.terms: 2023-12-01
local.etdauthor.orcid: 0000-0002-9941-6297