Assistant Professor, Iowa State University, Ames, Iowa, United States
Introduction:: Short (fewer than 15 amino acids) protein domains (peptides) can serve as therapeutic peptides in several ways. They can mimic the activity of the full-length protein, act as antagonists that block it, or act as modulators that alter it. Compared with full-length proteins, short peptides offer improved stability, reduced immunogenicity, and easier production and storage. Peptides have thus emerged as drug candidates for the surveillance of and intervention in a variety of diseases. The global peptide therapeutics market was valued at USD 39.3 billion in 2021 and is expected to grow at a compound annual growth rate (CAGR) of 6.4% from 2022 to 2030. The rising incidence of flu-type viral diseases (SARS-CoV-2, avian flu), cancer, and metabolic disorders such as osteoporosis, obesity, and diabetes is expected to drive the prominence of peptide therapeutics over this forecast period. In addition, the sharp increase in pediatric patients in low-income countries creates a pressing demand for efficient, low-cost drugs. We built SUPERLEGO (Swift Utility Program to Engage Receptor Ligand via Experiments and Geometric Optimization), an experimentally informed and validated technology, for (a) rationally excising functional fragments (domains) from larger proteins, (b) optimally adding the minimum number of selected amino acids to the N-terminal and C-terminal flanks such that the domain's secondary structure remains intact, and (c) affinity-maturing the resulting sequence by predicting a few point mutations, in order to engineer highly human, multivalent, specific binders for one or more target proteins.
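Step (b) above can be read as a small integer optimization: choose non-negative flank lengths n_N and n_C that minimize n_N + n_C while the domain's secondary structure stays intact. The sketch below, offered only as an illustration, solves that by exhaustive search in order of increasing total flank length. It assumes a hypothetical predicate `is_structure_intact(n_n, n_c)` standing in for a structure-prediction check; neither that predicate nor the toy rule in the usage example comes from SUPERLEGO, and the source does not specify how the authors formulate or solve this step.

```python
from typing import Callable, Optional, Tuple


def min_flanks(
    is_structure_intact: Callable[[int, int], bool],  # hypothetical stability check
    max_total: int,
) -> Optional[Tuple[int, int]]:
    """Return (n_N, n_C) with the smallest total n_N + n_C for which the
    predicate holds, or None if no split up to max_total works.

    Totals are scanned in increasing order, so the first feasible pair found
    minimizes the number of added residues; ties are broken in favour of
    fewer N-terminal residues.
    """
    for total in range(max_total + 1):
        for n_n in range(total + 1):
            n_c = total - n_n
            if is_structure_intact(n_n, n_c):
                return n_n, n_c
    return None


# Toy predicate (purely illustrative, not a real stability model): pretend the
# domain needs at least 2 extra residues at the N-terminus and 1 at the C-terminus.
def toy_predicate(n_n: int, n_c: int) -> bool:
    return n_n >= 2 and n_c >= 1


print(min_flanks(toy_predicate, max_total=10))  # (2, 1)
```

Scanning by total length keeps the search trivially correct for short flanks; for large search spaces an integer-programming solver would be the natural replacement.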
Materials and Methods:: The release of the AlphaFold2 deep-learning protein structure prediction tool two years ago prompted an avalanche of similar machine-learning tools, and diffusion models have since been released for associated problems, such as understanding protein-protein interactions and designing from scratch novel (non-natural) synthetic peptides that bind a target protein with extremely high affinity. Although these models and studies are being published in high-impact journals, leading biotech and pharmaceutical companies have shown little to no interest in using these ML-predicted novel peptides. To this end, Chowdhury introduced a new language-model-based geometric deep-learning model (RGN2 –) that predicts protein (and mutant) structures accurately from single sequences using less than 2% of the computing resources and with a million-fold speed gain over the state-of-the-art tool AlphaFold2. This equips his group to computationally confirm whether different peptide fragments remain structurally stable in isolation after being excised from the larger leucine-rich protein (LRP) decorin. RGN2 will also be instrumental in assessing the integrity and stability of mutant peptide sequences. The pipeline further involves an integer optimization algorithm for identifying the right fragment size and sequence for binding the intended target. Chowdhury has already developed and implemented similar integer optimization algorithms (albeit without RGN2 integration) for engineering natural protein pores for separation applications (PoreDesigner) and enzymes with altered biocatalytic activity (IPRO+/-). His previous work has also used PyRosetta-based protein-protein docking and binding-strength estimation between proteins (and peptides) to understand the disease-entry mechanism of SARS-CoV-2 and to design therapeutic peptides for its intervention.
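Choosing "the right fragment size" amounts to picking two integers, a start position and a length. As a minimal sketch (not the authors' integer optimization algorithm, whose formulation is not given here), the code below enumerates every contiguous fragment within a length window and keeps the one with the highest summed per-residue score. The per-residue scores are a hypothetical input, for example binding contributions estimated from docking, and `best_fragment` is an illustrative name.

```python
from itertools import accumulate
from typing import List, Tuple


def best_fragment(scores: List[float], min_len: int, max_len: int) -> Tuple[int, int, float]:
    """Exhaustively pick the contiguous fragment [start, end) whose summed
    per-residue score is highest, subject to min_len <= end - start <= max_len.

    `scores` are hypothetical per-residue contributions (higher is better).
    Returns (start, end, total) with 0-based, end-exclusive indices.
    """
    if not 1 <= min_len <= max_len:
        raise ValueError("require 1 <= min_len <= max_len")
    prefix = [0.0] + list(accumulate(scores))  # prefix[i] == sum(scores[:i])
    best = None
    for start in range(len(scores)):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > len(scores):
                break  # longer fragments from this start would also overrun
            total = prefix[end] - prefix[start]
            if best is None or total > best[2]:
                best = (start, end, total)
    if best is None:
        raise ValueError("sequence is shorter than min_len")
    return best


# Illustrative scores only, not data from this study.
toy_scores = [-1.0, 0.5, 2.0, 1.5, -0.5, 3.0, -2.0]
print(best_fragment(toy_scores, min_len=2, max_len=4))  # (2, 6, 6.0)
```

Prefix sums make each candidate O(1) to score, so the whole search costs O(L · max_len); setting `max_len=14` would enforce the fewer-than-15-residue limit mentioned in the Introduction.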
Results, Conclusions, and Discussions:: Owing to the high similarity of human leucine-rich proteins (LRPs) to other mammalian LRPs and, surprisingly, to fungal LRPs, we created an expanded structure-function database of 30k unique leucine-rich fragments (not full proteins) spanning different branches of life. We use this in-house LRP database (iLDB) to drive the exploration of different sequences (i.e., it defines the combinatorially permissive sequence space) while designing better therapeutic peptides without losing the humanness of the peptide.
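One simple way to read "combinatorially permissive sequence space" is as the set of sequences whose residue at each position has been observed at that position in a database of aligned fragments. The sketch below builds that per-position allowed set and its size from a few toy, ungapped, equal-length fragments; the fragments are invented for illustration, and the iLDB schema and the authors' actual definition are not described in the source.

```python
from math import prod
from typing import List, Set


def permissive_space(aligned: List[str]) -> List[Set[str]]:
    """Per-position set of residues observed across ungapped, equal-length
    aligned fragments (a stand-in for querying a fragment database)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("fragments must be aligned to equal length")
    return [{s[i] for s in aligned} for i in range(length)]


def space_size(space: List[Set[str]]) -> int:
    """Number of distinct sequences the per-position sets allow."""
    return prod(len(allowed) for allowed in space)


def in_space(seq: str, space: List[Set[str]]) -> bool:
    """True if every residue of `seq` is permitted at its position."""
    return len(seq) == len(space) and all(aa in allowed for aa, allowed in zip(seq, space))


# Toy leucine-rich-like fragments (hypothetical, not from iLDB).
toy_fragments = ["LKELNL", "LRELDL", "IKELNL", "LKDLNL"]
space = permissive_space(toy_fragments)
print(space_size(space))  # 16
```

Restricting candidate mutations to such observed residues is one way a design loop could stay close to natural (e.g., human) sequences.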
Decorin has 13 leucine-rich domains that participate in different binding interactions with different target proteins in the human body. Here we focus on excising domain VI (six) from human decorin, stabilizing it by adding minimal flanks to either end, and demonstrating that the isolated domain binds its native binding partner EGFR (epidermal growth factor receptor) with higher affinity than it does within the full-length decorin protein.
To test the contribution of adjacent strands around the domain of interest, domain VI, we created all possible mono-domain, bi-domain, and tri-domain peptides and repeated the docking with EGFR to verify whether the adjacent strands indeed have a stabilizing effect that enhances binding affinity. Our preliminary data are very promising (albeit under the assumption that each domain remains perfectly stable without any sequence optimization) and recapitulate the known experimental finding that domain VI has the highest affinity toward EGFR.
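Assuming "all possible mono-, bi-, and tri-domain peptides" means every run of one to three consecutive domains along decorin (with the 13 domains numbered 1 to 13, so domain VI is index 6), the candidate set can be enumerated as below. This is an illustrative sketch; the helper name is hypothetical and the source does not say whether non-contiguous combinations were also considered.

```python
from typing import List, Sequence, Tuple


def contiguous_windows(n_domains: int, sizes: Sequence[int] = (1, 2, 3)) -> List[Tuple[int, ...]]:
    """All runs of consecutive 1-based domain indices with the given sizes,
    ordered by size and then by starting domain."""
    return [
        tuple(range(start, start + k))
        for k in sizes
        for start in range(1, n_domains - k + 2)
    ]


windows = contiguous_windows(13)
print(len(windows))  # 36 (13 mono + 12 bi + 11 tri)

# Fragments that contain domain VI, i.e., VI alone or VI plus adjacent strands.
with_vi = [w for w in windows if 6 in w]
print(with_vi)  # [(6,), (5, 6), (6, 7), (4, 5, 6), (5, 6, 7), (6, 7, 8)]
```

Each tuple would then map to a peptide sequence to be docked against EGFR; comparing the six domain-VI-containing windows isolates the effect of the neighbouring strands.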
We constructed a library of candidate flanks that, when appended to the tri-domain decorin fragment, are likely to stabilize the domain itself and enhance binding to EGFR. The top EGFR-bound designs were subjected to explicit-solvent molecular dynamics simulations (Nosé-Hoover-Langevin dynamics) to evaluate the kinetic stability of binding, i.e., the temporal fluctuation in affinity while bound to EGFR and the temperature and pH at which the complex destabilizes and dissociates. The most stable designs were then sent for experimental synthesis, incorporation into nanoparticles, and EGFR binding-affinity assays to corroborate the computational predictions.
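The selection of "most stable designs" from the simulations can be sketched as a two-step filter: discard designs whose bound-state interaction energy fluctuates too much over the trajectory, then rank the rest by mean energy. The sketch assumes per-frame interaction energies have already been extracted from each trajectory (lower meaning tighter binding); the design names, energies, and threshold are hypothetical, and this is not the authors' scoring scheme.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def rank_designs(
    energies: Dict[str, List[float]],  # design -> per-frame interaction energy (hypothetical units)
    max_fluctuation: float,
) -> List[Tuple[str, float, float]]:
    """Keep designs whose per-frame energy standard deviation is at most
    max_fluctuation, then sort by mean energy (most negative first).
    Returns (name, mean_energy, std_energy) tuples."""
    stats = [(name, mean(e), pstdev(e)) for name, e in energies.items()]
    stable = [s for s in stats if s[2] <= max_fluctuation]
    return sorted(stable, key=lambda s: s[1])


# Toy trajectories (invented numbers, not simulation results).
toy_energies = {
    "design_A": [-50.0, -52.0, -48.0, -50.0],  # moderate mean, small fluctuation
    "design_B": [-60.0, -40.0, -65.0, -35.0],  # same mean as A, large fluctuation
    "design_C": [-55.0, -54.0, -56.0, -55.0],  # tightest and steadiest
}
ranked = rank_designs(toy_energies, max_fluctuation=3.0)
print([name for name, _, _ in ranked])  # ['design_C', 'design_A']
```

Filtering on fluctuation before ranking keeps a design like `design_B`, whose average looks acceptable but whose complex repeatedly loosens, out of the shortlist sent for synthesis.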
We thus glean unified rules for designing minimal domains from existing human proteins and for enhancing binding avidity.