Recently, ab initio protein folding using predicted contacts as restraints has made some progress, but it requires accurate contact prediction, which existing methods can achieve only for some large protein families. To handle small protein families, we employ deep learning, a technique from computer science that can learn complex patterns from large datasets and has revolutionized object recognition, speech recognition and the game of Go. Our deep learning model for contact prediction consists of two deep residual neural networks. The first learns the relationship between contacts and sequential features from protein databases, while the second models contact occurrence patterns and their relationship with pairwise features such as contact potential, residue co-evolution strength and the output of the first network. Experimental results suggest that our deep learning method greatly improves contact prediction and contact-assisted folding. Tested on 579 proteins dissimilar to the training proteins, the average top L (where L is the sequence length) long-range prediction accuracy of our method, the representative evolutionary coupling method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; their average top L/10 long-range accuracy is 0.77, 0.47 and 0.59, respectively. Using our predicted contacts, we can correctly fold 203 test proteins, while MetaPSICOV and CCMpred can do so for only 79 and 62 proteins, respectively. In three weeks of blind testing with the weekly benchmark CAMEO (http://www.cameo3d.org/), our method successfully folded three hard targets with a new fold and only 1.5L-2.5L sequence homologs, while all template-based methods failed. Our RaptorX-Contact was officially ranked 1st in contact prediction (http://www.predictioncenter.org/casp12/rrc_avrg_results.cgi) in the worldwide protein structure prediction experiment CASP12.
Sheng Wang, Toyota Technological Institute at Chicago
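
As a reading aid for the top L and top L/10 long-range accuracy figures quoted in the abstract, the following minimal Python sketch shows one common way such a score can be computed. It assumes, following the usual CASP convention rather than anything stated in the abstract, that a residue pair counts as long-range when the two residues are at least 24 positions apart in sequence. The function name `top_l_over_k_accuracy` and its parameters are hypothetical and are not part of RaptorX-Contact.

```python
from typing import Sequence


def top_l_over_k_accuracy(
    probs: Sequence[Sequence[float]],
    native: Sequence[Sequence[bool]],
    divisor: int = 1,
    min_separation: int = 24,
) -> float:
    """Fraction of the top L/divisor predicted long-range pairs that are native contacts.

    probs[i][j]  : predicted contact probability for residues i and j (i < j is used)
    native[i][j] : True if residues i and j are in contact in the native structure
    divisor      : 1 for top L, 10 for top L/10, and so on
    min_separation : assumed long-range threshold (24 is the usual CASP convention)
    """
    length = len(probs)  # L, the sequence length

    # Collect every long-range pair (i, j) with j - i >= min_separation.
    candidates = [
        (probs[i][j], i, j)
        for i in range(length)
        for j in range(i + min_separation, length)
    ]
    if not candidates:
        return 0.0  # protein too short to have any long-range pair

    # Keep the L/divisor highest-scoring pairs (at least one).
    k = max(1, length // divisor)
    candidates.sort(key=lambda item: item[0], reverse=True)
    selected = candidates[:k]

    # Accuracy is the share of selected pairs that are native contacts.
    # Assumption: if fewer than k long-range pairs exist, divide by the
    # number actually selected rather than by k.
    hits = sum(1 for _, i, j in selected if native[i][j])
    return hits / len(selected)
```

Calling the function with `divisor=1` gives a top L score and with `divisor=10` a top L/10 score for a single protein; the averages reported above would then correspond to the mean of such per-protein scores over the test set.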