Assistant Professor, University of Michigan, Ann Arbor, Michigan, United States
Introduction:: Genome-scale metabolic models predict how the genetic state of a bacterium determines its phenotype in a particular environment. Ideally, these models allow investigators to predict phenotypes from a bacterial genome sequence alone. In practice, however, models must be reconciled against experimental data to improve their predictions and account for biological processes like transcriptional regulation that cannot be predicted de novo from the genome.
Most reconciliation algorithms for genome-scale models use an optimization framework where runtimes increase with the size of the experimental data. Unfortunately, such algorithms cannot cope with the massive datasets produced by recent advances in laboratory automation. Here we present a scalable evolutionary framework for reconciling genome-scale models against high-throughput phenotypic data. Our approach “evolves” a model by changing the presence and absence of enzymes and scoring the resulting models against a compendium of growth data. Our algorithm shows excellent scaling with respect to the size of the metabolic model and the experimental data, allowing researchers to use massive phenotypic datasets to improve models of increasing complexity.
Materials and Methods:: We tested our evolutionary approach on a genome-scale metabolic model of the oral bacterium Streptococcus sanguinis. The model’s 584 enzymes catalyze 805 metabolic reactions that transform 606 metabolites. The S. sanguinis model was based on a previously published model for another oral bacterium, S. mutans (Jijakli & Jensen, 2019).
Our algorithm searched for a binary state representing each of the 584 enzymes in the model. Enzymes were mapped to reactions using a flux coupling approach described in Pradhan et al. (2019). Enzymes set to “off” during evolution prevented the associated reactions from carrying flux, while reactions with “on” enzymes could carry any physiologically feasible flux. Populations of binary enzyme vectors were mutated and subjected to uniform crossover using a genetic algorithm. The fitness of each vector was quantified by comparing flux balance analysis simulations to growth measured in 7,534 media by an automated phenotyping system. Evolution continued until the population or its fitness values stabilized.
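The search loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the three-enzyme network, the two-medium dataset, and the names `predicts_growth`, `fitness`, `evolve`, and `patience` are all hypothetical, and a simple reachability check stands in for flux balance analysis so that the example needs no solver.

```python
import random

# Hypothetical toy network: (enzyme index, substrates, product).
# This is NOT the S. sanguinis model; it only illustrates the search loop.
REACTIONS = [
    (0, {"glucose"}, "pyruvate"),
    (1, {"pyruvate"}, "alanine"),              # de novo alanine synthesis
    (2, {"pyruvate", "alanine"}, "biomass"),
]
N_ENZYMES = 3

# Hypothetical growth data for an alanine auxotroph: (medium, measured growth).
OBSERVATIONS = [
    (frozenset({"glucose"}), False),
    (frozenset({"glucose", "alanine"}), True),
]


def predicts_growth(enzymes, medium):
    """Assumed stand-in for FBA: can biomass be reached from the medium?"""
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for enzyme, substrates, product in REACTIONS:
            # A reaction can fire only if its enzyme is "on".
            if enzymes[enzyme] and substrates <= available and product not in available:
                available.add(product)
                changed = True
    return "biomass" in available


def fitness(enzymes, observations):
    """Fraction of media where the prediction matches the measurement."""
    hits = sum(predicts_growth(enzymes, medium) == grew for medium, grew in observations)
    return hits / len(observations)


def uniform_crossover(a, b, rng):
    """Take each bit from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(vector, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in vector]


def evolve(observations, pop_size=20, max_generations=500, rate=0.1, patience=25, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(N_ENZYMES)] for _ in range(pop_size)]
    best_score, stale = -1.0, 0
    for _ in range(max_generations):
        population.sort(key=lambda v: fitness(v, observations), reverse=True)
        top = fitness(population[0], observations)
        if top > best_score:
            best_score, stale = top, 0
        else:
            stale += 1
        if stale >= patience:  # best fitness has stabilized
            break
        parents = population[: pop_size // 2]  # keep the fitter half unchanged
        children = [
            mutate(uniform_crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=lambda v: fitness(v, observations))


best = evolve(OBSERVATIONS)
print(best, fitness(best, OBSERVATIONS))
```

In this toy case the full model `[1, 1, 1]` overpredicts growth on glucose alone, and switching off the alanine-synthesis enzyme reconciles both observations. Each fitness evaluation costs one growth simulation per medium, so the work per generation grows linearly with the number of media in this sketch.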
Results, Conclusions, and Discussions:: The S. sanguinis model vastly overpredicted growth before reconciliation with experimental data. For example, the model predicted that S. sanguinis can synthesize all amino acids de novo due to a complete set of biosynthetic pathways in its genome. However, our experimental data reveal that S. sanguinis has multiple amino acid auxotrophies.
Starting with random enzyme vectors, our evolutionary algorithm required fewer than 500 generations to improve the model’s predictions. All of the improvements changed “grow” predictions to “no-grow” by inactivating enzymes. Fixing inaccurate no-growth predictions by the model would require adding reactions to the model, which is a future direction for this work.
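The two error types distinguished above can be tallied separately, which makes clear which errors enzyme inactivation can address. The following is a small sketch with made-up predictions and measurements; the function name `error_breakdown` and the data are hypothetical.

```python
def error_breakdown(predicted, measured):
    """Count prediction errors by type for paired grow/no-grow calls."""
    pairs = list(zip(predicted, measured))
    return {
        # Model says "grow", experiment says "no-grow": fixable by inactivating enzymes.
        "false_growth": sum(p and not m for p, m in pairs),
        # Model says "no-grow", experiment says "grow": needs reactions added.
        "false_no_growth": sum(m and not p for p, m in pairs),
        "correct": sum(p == m for p, m in pairs),
    }


# Hypothetical calls for four media (True = growth).
predicted = [True, True, False, True]
measured = [True, False, True, False]
print(error_breakdown(predicted, measured))
```

Because switching an enzyme off can only remove reactions from the network, a search restricted to inactivation can reduce the first count but not the second.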
Our most significant observation is that the quality and depth of the experimental data have the largest effect on our algorithm’s performance. This finding has two implications. First, future algorithms should scale well with increasing experimental data, as algorithms that are limited to small datasets may underperform their big-data analogues. Second, algorithm developers should embrace automation and new technologies for data collection to maximize the performance of their algorithms. Computational biologists can help design experiments that meet the needs of their algorithms and are maximally informative for reconciling a model.