Bioinformatics, Computational and Systems Biology
Shruthi Garimella, BS (she/her/hers)
Graduate Student Researcher
University of California, Davis
Davis, California, United States
Priya Shah, PhD
Assistant Professor
University of California, Davis
Davis, California, United States
As the world continues to urbanize and humans push into historically undisturbed areas, increased contact with wild animal species raises the chance of infection with a novel disease. In the last 20 years, three epidemics or pandemics have been caused by coronaviruses, cementing them as a public health concern whose risk of emergence must be assessed. These viruses infected humans after spilling over from an animal reservoir and then continued to transmit among humans. They can infect humans because surface proteins on the virus bind to a human receptor. This initial interaction between the viral surface protein and the human host receptor initiates infection, and it can be used to assess the likelihood that other coronaviruses will infect humans.
Coronaviruses bind to their targets via the Spike protein. This project focuses on sarbecoviruses because some, like SARS-CoV-2, are known to bind the ACE2 receptor. Studying this interaction across different sarbecoviruses can identify additional ACE2-binding sarbecoviruses and coronaviruses with spillover potential. The interaction with ACE2 occurs at the receptor binding domain (RBD) of the S1 region of Spike. Predicting and analyzing the interaction beyond the RBD through computational approaches can reveal new binding modalities and components that may improve binding to ACE2 and, in turn, the chances of spillover. The goal of this project is to study the interaction between the S1 regions of different sarbecoviruses and ACE2 through computational methods, in order to predict which sarbecoviruses have the greatest potential to bind ACE2 and spill over.
Sarbecovirus S1 sequences were selected from the literature, based on the predicted phylogeny of coronaviruses from sequence alignments. The sequences were retrieved from the NCBI and CoV3D databases. Known pandemic-causing coronaviruses were used as controls. The sarbecoviruses SARS-CoV and SARS-CoV-2 were used as positive controls since they bind ACE2. MERS-CoV was used as the negative control since it targets another receptor, DPP4.
AlphaFold2 was used to predict structures from the input sequences. It uses machine learning, drawing on experimentally resolved structures and sequence alignments, to produce a protein structure from each S1 sequence. To model the interaction with ACE2, the predicted structures were submitted to ClusPro, a docking program that predicts the structure and binding of protein complexes. The program tests conformations of the proteins based on energy minimization, van der Waals interactions, electrostatic forces, and hydrogen bonding to evaluate the strength and stability of each predicted complex, and reveals potential bonds and contacting residues.
Contact maps were used to visualize the distribution of predicted contacting residues and hydrogen bonds across the binding interface of the two proteins. These contacts were also visualized in ChimeraX, which highlights the locations of the residues and bonds based on proximity.
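As an illustration of how a contact map can be derived from a docked complex, the following minimal sketch marks two residues as contacting when any pair of their atoms lies within a distance cutoff. The function names, the 4.0 angstrom cutoff, the residue labels, and the toy coordinates are all assumptions made for this example; they are not taken from the project's pipeline or from ChimeraX.

```python
from math import dist

def residue_contacts(chain_a, chain_b, cutoff=4.0):
    """Return sorted (res_a, res_b) pairs with any atom pair within `cutoff` angstroms.

    chain_a and chain_b map a residue label to a list of (x, y, z) atom coordinates.
    The 4.0 angstrom default is an illustrative assumption.
    """
    contacts = []
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            if any(dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                contacts.append((res_a, res_b))
    return sorted(contacts)

def contact_map(chain_a, chain_b, cutoff=4.0):
    """Return a binary matrix: rows follow chain_a residues, columns follow chain_b."""
    hits = set(residue_contacts(chain_a, chain_b, cutoff))
    return [[1 if (a, b) in hits else 0 for b in chain_b] for a in chain_a]

# Toy coordinates invented for illustration (not real Spike or ACE2 atoms).
spike = {"s1": [(0.0, 0.0, 0.0)], "s2": [(10.0, 0.0, 0.0)]}
ace2 = {"a1": [(3.0, 0.0, 0.0)], "a2": [(10.0, 3.0, 0.0)], "a3": [(50.0, 50.0, 50.0)]}

for row in contact_map(spike, ace2):
    print(row)
```

In practice the coordinates would be read from the docked model file, and the resulting matrix could be plotted as a heat map.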
Sequence and structural alignments were used to identify which of the predicted contacts are conserved across the set of sarbecoviruses in comparison to the positive and negative controls. A contact at a conserved residue was taken as support for the predicted interaction. Alignments were also used to locate where S1 binds ACE2, in case the position varies across viruses.
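The conservation check can be sketched as follows, assuming the sequences have already been aligned to equal length with "-" marking gaps and that contacting positions are given as alignment column indices. The function name and the toy sequences are hypothetical and serve only to show the idea.

```python
def conserved_columns(alignment, columns):
    """Return the alignment columns in `columns` where every sequence has the same residue.

    alignment maps a sequence name to its aligned string (equal lengths, '-' for gaps).
    A column containing a gap is not counted as conserved.
    """
    conserved = []
    for col in sorted(columns):
        residues = {seq[col] for seq in alignment.values()}
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col)
    return conserved

# Toy aligned sequences invented for illustration.
alignment = {
    "control": "ACDEF",
    "virus_1": "ACDQF",
    "virus_2": "AC-EF",
}
contact_cols = {0, 2, 3, 4}  # hypothetical contacting positions in the control

print(conserved_columns(alignment, contact_cols))
```

A stricter or looser definition of conservation (for example, allowing chemically similar substitutions) would change which contacts are retained.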
Preliminary work focused on the RBD region of the controls. Initial predictions for the positive controls showed 8 conserved hydrogen bonds across the RBD, while the negative control was predicted to interact with ACE2 at a non-binding interface. Across the RBD, the positive controls share ~80% sequence similarity with each other and ~20% sequence similarity with the negative control. SARS-CoV-2 is known to have a higher affinity for ACE2, so studying the entire S1 region may explain what other mechanisms are involved in the interaction.
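For reference, a pairwise percentage of this kind can be computed from two aligned sequences as sketched below. This sketch computes percent identity (exact residue matches over gap-free columns), which is a simpler stand-in for sequence similarity; a similarity score would additionally need a substitution matrix. The function name and the toy sequences are assumptions for illustration only.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of gap-free aligned columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared

# Toy aligned sequences invented for illustration.
print(percent_identity("ACDEF", "ACDQF"))  # 4 of 5 columns match
```

How gapped columns are treated is a design choice; counting them as mismatches instead of skipping them would lower the reported percentage.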
These controls are used as thresholds to define a predicted high-affinity interaction. Taken together, the sequences, protein structures, and docking interactions can be used to predict the potential of a virus to bind the human receptor. The likelihood that a virus binds ACE2 is estimated by summing binding-strength prediction parameters and similarity metrics, such as the number of bonds, the number of predicted contacts, and percent sequence similarity. Expanding the scope to the full S1 region ensures that all binding mechanisms and structural components are considered, giving a better understanding of what is responsible for high-affinity interactions in sarbecoviruses.
The summation can be refined iteratively into a linear combination of weighted parameters, in which the weights indicate which parameters most influence the predicted binding affinity. The distribution of weights can provide some insight into binding mechanisms and stability. The next step of this work is to use the results to determine a sample set of predicted high-affinity candidates to be tested experimentally alongside the controls.
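A weighted linear combination of this kind can be written as score = w_1 x_1 + w_2 x_2 + ... + w_n x_n, where each x_i is a parameter such as the number of bonds and each w_i is its weight. The sketch below is one possible form, not the project's implementation: the feature names, weights, threshold, and candidate values are all invented for illustration, and real features would likely need to be scaled to comparable ranges first.

```python
def binding_score(features, weights):
    """Weighted linear combination: sum of weight * feature value over the weighted features."""
    return sum(weights[name] * features[name] for name in weights)

def rank_candidates(candidates, weights, threshold):
    """Return (name, score) pairs scoring at or above `threshold`, highest score first."""
    scored = [(name, binding_score(feats, weights)) for name, feats in candidates.items()]
    passing = [item for item in scored if item[1] >= threshold]
    return sorted(passing, key=lambda item: item[1], reverse=True)

# Hypothetical weights and feature values, invented for illustration only.
weights = {"h_bonds": 1.0, "contacts": 0.5, "identity": 0.25}
candidates = {
    "virus_A": {"h_bonds": 6, "contacts": 20, "identity": 80},
    "virus_B": {"h_bonds": 2, "contacts": 6, "identity": 20},
}

# In this sketch the threshold stands in for a score derived from the controls.
print(rank_candidates(candidates, weights, threshold=30.0))
```

Setting all weights to 1 recovers the plain summation, so the same function covers both the initial and the weighted scoring.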
The objective of this project is to identify coronavirus Spike proteins with high affinity for the human ACE2 receptor as a method of pandemic-risk assessment. The interaction screening pipeline allows virus-host protein interaction affinity to be analyzed across a sample set of viruses and generates data that use parameters of binding modality to assess binding. Overall, this process produces a group of coronaviruses with predicted high affinity for the ACE2 receptor, which will be put forward as candidates for further experimental research into their potential to cause an epidemic or pandemic.