Structured Sparsity Learning for Coevolution-Based Protein Contact Prediction

Wu, Di

Date

2018-11-30

Abstract

Residue coevolution refers to the biological assumption that residue pairs covary during evolution if they form a contact within a protein or across a protein-protein interface. Under this assumption, such covariance can be used to predict residue contacts within or between protein sequences. The increasing availability of protein sequence data allows for wider applicability and also demands more accurate approaches. Current methods model sequence data as Markov random fields and use maximum likelihood estimation to infer residue contacts. They mainly target the accuracy of contact prediction on the premise that more accurate 2D contact prediction leads to a better 3D structure. This is correct but not the whole picture, since the patterns of predicted 2D contacts also have a significant impact on 3D structure reconstruction. For example, contacts between long-distance residue pairs generally help more than contacts between adjacent residue pairs. Moreover, current methods tend to produce predictions that concentrate in certain areas. To directly target 3D structure prediction, we introduce a new method that exploits more types of data, such as secondary structure data and fold type information, to characterize the desired sparsity patterns of contact prediction in a biologically meaningful way. It then uses multiple structured sparsity regularization models, including group LASSO and group dispersive sparsity, to enforce such sparsity patterns. The method benefits from considering and promoting structured sparsity, which contributes to improved 3D structure prediction.

Citation

Wu, Di (2018). Structured Sparsity Learning for Coevolution-Based Protein Contact Prediction. Master's thesis, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/174625.