Structured Sparsity Learning for Coevolution-Based Protein Contact Prediction
Abstract
Residue coevolution refers to a biological assumption that residue pairs covary during evolution
if they form a contact within a protein or across a protein-protein interface. Under this assumption,
such covariance can be used to predict residue contacts within or between protein sequences. The
increasing availability of protein sequence data allows for wider applicability and also demands
more accurate approaches.
Current methods model sequence data with Markov random fields and use maximum likelihood
estimation to infer residue contacts. They mainly target the accuracy of contact prediction
under the premise that more accurate 2D contact prediction leads to a better 3D structure. This
is correct but not the whole picture, since the patterns of predicted 2D contacts also have a significant
impact on 3D structure reconstruction. For example, contacts between long-distance residue
pairs in general help more than contacts between adjacent residue pairs do. Moreover, predictions
from current methods tend to concentrate in certain areas.
To directly target 3D structure prediction, we introduce a new method that exploits more
types of data, such as secondary structure data and fold type information, to characterize the desired
sparsity patterns of contact prediction in a biologically meaningful way. It then uses multiple
structured sparsity regularization models, including group LASSO and group dispersive sparsity,
to enforce such sparsity patterns. The method benefits from explicitly considering and promoting
structured sparsity, which contributes to improved 3D structure prediction.
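
As a rough illustration of the group LASSO idea mentioned above, the sketch below computes a group penalty and applies block soft-thresholding (the standard proximal step for that penalty) to a toy set of pairwise coupling strengths. It is not the thesis implementation. The scalar couplings, the non-overlapping grouping of residue pairs, and all function names are assumptions made only for this example, and group dispersive sparsity is not shown.

```python
import math


def group_norm(values):
    """Euclidean (L2) norm of one group of coefficients."""
    return math.sqrt(sum(v * v for v in values))


def group_lasso_penalty(weights, groups, lam):
    """Return lam times the sum of per-group L2 norms.

    weights: dict mapping a residue pair (i, j) to a scalar coupling strength
             (a simplification assumed for this sketch).
    groups:  list of non-overlapping lists of residue pairs (hypothetical
             grouping, e.g. all pairs linking two secondary structure elements).
    """
    return lam * sum(group_norm([weights[p] for p in g]) for g in groups)


def group_soft_threshold(weights, groups, threshold):
    """Block soft-thresholding: shrink each group toward zero as a whole.

    A group whose norm is at most `threshold` is set entirely to zero;
    otherwise every member is scaled by (1 - threshold / norm).
    """
    out = dict(weights)
    for g in groups:
        norm = group_norm([weights[p] for p in g])
        scale = 0.0 if norm <= threshold else 1.0 - threshold / norm
        for p in g:
            out[p] = scale * weights[p]
    return out


# Toy data: one strong long-range group and one weak short-range group.
weights = {(1, 10): 3.0, (2, 9): 4.0, (1, 2): 0.1, (2, 3): 0.2}
groups = [[(1, 10), (2, 9)], [(1, 2), (2, 3)]]

penalty = group_lasso_penalty(weights, groups, lam=1.0)
shrunk = group_soft_threshold(weights, groups, threshold=1.0)
```

In this toy case the strong group is kept and shrunk uniformly, while the weak group is removed as a unit. That all-or-nothing behavior per group is what distinguishes a group penalty from an element-wise one.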
Citation
Wu, Di (2018). Structured Sparsity Learning for Coevolution-Based Protein Contact Prediction. Master's thesis, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/174625.