Cellular and Molecular Bioengineering
Ronan O'Connell
Graduate Student
Rice University
Houston, Texas, United States
Caleb Bashor
Assistant Professor in Departments of Bioengineering & BioSciences
Rice University, United States
Synthetic genetic circuits hold promise as a means of precisely controlling cellular behavior for a diverse array of medical and biotechnological applications. However, regulatory circuit engineering efforts are plagued by context-dependent effects that can influence circuit performance in ways that are currently difficult to predict or design for. Several factors determine, to varying degrees, the performance of regulatory programs: 1) interactions between the individual genetic components that comprise the circuit, 2) transcriptional crosstalk between adjacent expression units, 3) the proximate upstream and downstream chromatin landscape, and 4) chromatin folding and spatial compartmentalization. The challenges imposed by context dependence are particularly acute in mammalian cells, where gene expression profiles are controlled by a complex combination of overlapping regulatory forces from these multiple scales of genomic organization, making it exceptionally difficult to parse the contribution of each. To explore how combinations of genetic features yield emergent regulatory function, we previously developed CLASSIC: an ultra-high-throughput cell engineering platform that enables en masse construction and quantitative assessment of multidimensional design spaces in mammalian cells, and that yields deeper biological understanding through machine learning (ML) (1). Building upon the original tamoxifen-inducible synthetic circuit design space we assayed using CLASSIC, we introduce genomic context as an important design consideration. Experimental evaluation of this design space provides the opportunity to identify robust design principles that describe how genetic composition and genomic context together determine circuit performance: an important step toward the goal of context-aware, AI-driven synthetic control of mammalian cell function.
CLASSIC uses combinatorial plasmid assembly to generate large libraries of individually barcoded, full-length genetic circuits from a smaller set of defined genetic components. Long-read nanopore sequencing then maps each circuit composition to a short barcode sequence. Genomic integration, phenotypic selection, and short-read sequencing of the barcode region yield a phenotype-to-barcode map. Together, these two datasets provide a composition-to-function map for the assayed design space. ML models trained on these large datasets can then predict the behavior of design-space members that could not be measured, refine existing measurements, and identify complex design principles. Adding features related to genomic location, such as chromatin compartment, DNase hypersensitivity, and chromatin regulator binding, substantially increases the dimensionality of the design space. To comprehensively assay this expanded design space, we constructed a library of ~8,000 tamoxifen-inducible genetic circuits with GFP expression as the output, as described previously (1). We integrated the library randomly into the genome of HEK293T cells using PiggyBac transposase. We then used FACS to sort the integrated cell library into 8 bins based on GFP reporter expression, both in the presence and in the absence of tamoxifen. Illumina sequencing of the barcode region from cells in each bin was used to assign a GFP expression value to each barcode. The circuit composition and genomic context associated with each barcode were identified by nanopore sequencing of the plasmid library and by genomic fragmentation and sequencing of the integrated cell library.
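The barcode-based composition-to-function mapping described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the CLASSIC analysis pipeline: the barcodes, part names, bin intensities, sorted-cell fractions, and read counts are all hypothetical placeholders. The expression estimate used here (each barcode's reads in a bin normalized to that bin's total reads, weighted by the fraction of cells sorted into the bin, then averaged in log space over assumed bin centers) is one common sort-seq style choice that we assume for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy inputs; none of these values come from the study.
# Barcode -> circuit composition, as would be read out by nanopore sequencing of the plasmid library.
composition = {
    "ACGTAC": ("P1", "TF1", "T1"),
    "TTGACA": ("P2", "TF2", "T2"),
}

# Assumed representative GFP intensity (arbitrary units) for each of the 8 sort bins, low to high.
bin_centers = [10 ** (1 + 0.5 * i) for i in range(8)]

# Assumed fraction of all sorted cells that landed in each bin.
bin_cell_fraction = [0.125] * 8

# Barcode -> Illumina read counts per bin (hypothetical).
reads = {
    "ACGTAC": [0, 0, 5, 40, 50, 5, 0, 0],
    "TTGACA": [0, 0, 0, 0, 0, 10, 80, 10],
    "GGGGGG": [3, 3, 0, 0, 0, 0, 0, 0],  # no mapped composition
}


def bin_read_totals(read_table):
    """Total reads observed in each bin across all barcodes."""
    totals = [0] * len(bin_centers)
    for counts in read_table.values():
        for i, c in enumerate(counts):
            totals[i] += c
    return totals


def expression_per_barcode(read_table):
    """Estimate one GFP value per barcode from its distribution across bins."""
    totals = bin_read_totals(read_table)
    expression = {}
    for bc, counts in read_table.items():
        # Relative abundance of this barcode's cells in each bin (up to a constant):
        # (barcode reads / all reads in bin) * (fraction of sorted cells in bin).
        weights = [
            (c / totals[i]) * bin_cell_fraction[i] if totals[i] else 0.0
            for i, c in enumerate(counts)
        ]
        total_w = sum(weights)
        if total_w == 0:
            continue  # barcode never observed in any bin
        # Weighted mean of log10 bin centers, converted back to linear units.
        log_mean = sum(w * math.log10(x) for w, x in zip(weights, bin_centers)) / total_w
        expression[bc] = 10 ** log_mean
    return expression


def composition_to_function(read_table, comp):
    """Join barcode expression values to circuit compositions via the shared barcode."""
    grouped = defaultdict(list)
    for bc, value in expression_per_barcode(read_table).items():
        if bc in comp:  # drop barcodes absent from the nanopore map
            grouped[comp[bc]].append(value)
    return dict(grouped)


expr = expression_per_barcode(reads)
mapping = composition_to_function(reads, composition)
```

Grouping values by composition keeps every barcode that maps to the same circuit, so replicate barcodes can later be averaged or used to estimate measurement noise. Running the same estimate separately on the tamoxifen-treated and untreated sorts would give the two expression states per barcode.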
En route to a platform that enables data-driven identification of how the different features of genetic context impact synthetic circuit behavior, we devised two experimental systems to separately assess the regulatory contributions of 1) genetic composition and 2) genomic position. Understanding how these factors influence gene expression profiles in relative isolation is important for later assessing their combined influence. To this end, we first assessed the degree of transcriptional crosstalk between adjacent expression cassettes in a synthetic locus composed of genetic parts commonly used in mammalian synthetic biology. We constructed a large library (27,000 members) of three tandem expression units controlled by different combinations of five promoters and six terminators, and integrated the library at a common genomic location in HEK293T cells. We observe both part-specific and part-combination-specific regulatory interactions within the synthetic locus. Interestingly, certain genetic elements and positions in the expression array appear more susceptible to activation by neighboring genetic elements. To assess whether the genomic location of a synthetic gene circuit significantly impacts its expression characteristics, we integrated a single tamoxifen-inducible circuit randomly across the genome of HEK293T cells using PiggyBac transposase. Clonal isolation and expansion of cells from this library (n = 45) revealed a broad distribution of average GFP expression, both in the presence and in the absence of tamoxifen, and fold-change values that differed by almost 20-fold across clones. Taken together, these isolated experiments suggest that genomic position and locus architecture are powerful modulators of gene expression, and that understanding their combined effect requires diversifying both simultaneously.
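The 27,000-member library size is consistent with each of the three expression units independently taking one of the 5 × 6 = 30 promoter-terminator pairings, giving 30^3 = 27,000 designs. The sketch below enumerates such a library using hypothetical part names, and adds a toy calculation of per-clone fold change (mean GFP with tamoxifen divided by mean GFP without) and of its spread, which we assume here to mean the ratio of the largest to the smallest fold change across clones. The clone values are placeholders, not measurements from this work.

```python
from itertools import product

# Hypothetical part names; the text specifies only the counts (5 promoters, 6 terminators, 3 units).
promoters = [f"P{i}" for i in range(1, 6)]
terminators = [f"T{i}" for i in range(1, 7)]
N_UNITS = 3

# Each expression unit is one (promoter, terminator) pair: 5 * 6 = 30 options per unit.
unit_options = list(product(promoters, terminators))

# Assumption: every position in the tandem array independently takes any of the 30 options.
library = list(product(unit_options, repeat=N_UNITS))


def fold_changes(clones):
    """clones: clone id -> (mean GFP without tamoxifen, mean GFP with tamoxifen)."""
    return {cid: on / off for cid, (off, on) in clones.items()}


def fold_change_spread(fc):
    """Assumed definition of spread: largest fold change divided by smallest."""
    return max(fc.values()) / min(fc.values())


# Placeholder values for three hypothetical clones, not data from the study.
example_clones = {"c1": (100.0, 2000.0), "c2": (400.0, 1200.0), "c3": (50.0, 250.0)}
spread = fold_change_spread(fold_changes(example_clones))
```

Enumerating designs explicitly is practical at this scale and makes it easy to attach a barcode or position-specific features to each member; for much larger combinatorial spaces, designs would more likely be indexed on demand than materialized as a list.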