Cellular and Molecular Bioengineering
Yan Yan
Postdoctoral Scholar
Rice University
Houston, Texas, United States
Mingjie Dai, PhD
Assistant Professor
Rice University
Houston, Texas, United States
Recent advances in high-throughput DNA sequencing have broadly transformed biological research and biomedicine, leading to single-cell sequencing and precision medicine. However, the cost and throughput of sequencing readout still significantly limit the depth and scalability of single-cell sequencing studies. Current single-cell studies typically measure 5,000-10,000 cells and allocate 50,000-100,000 reads per cell, which is ~100x short of what is required for full coverage of the mRNA expression profile (6 logs of dynamic range) and, at the same time, ~100x short of what is required to reliably detect rare, drug-resistance-conferring cells (down to the 1/200,000 level). Moreover, the human proteome exhibits an even wider dynamic range (7-9 logs), which makes sequencing-based deep proteomics even more challenging. Here we developed a new method capable of cost-effective sequencing of complex nucleic acid samples (RNA, DNA-barcoded protein, etc.) over a wide dynamic range and with high quantitative accuracy. Our method works by effectively compressing the dynamic range of gene expression in an untargeted and unbiased fashion, allowing more effective allocation of sequencing reads to low-expression genes. Our method can eventually enable deep and scalable single-cell transcriptomic and proteomic profiling.
A single-cell multiplexed compression sequencing (cPCR) test was performed on human peripheral blood mononuclear cell (hPBMC, Lonza) samples, on a panel of 60 gene targets. hPBMC samples were processed using the standard 10X Genomics 3’ mRNA profiling workflow (MD Anderson Advanced Technology Genomics Core) and analyzed using the cellranger pipeline (10X). cPCR was performed using custom-designed primers against 121 gene targets. These targets were chosen from the 10X sequencing results to span a dynamic range of 3×10^4 (or greater, but limited by sequencing depth). We performed PCR tests to select 60 of the designed primers that generated well-amplified sequences aligning to the human genome reference database at the expected loci. The multiplexed cPCR reaction was performed with gene-specific limiting primers and a common primer, overlapping with the Illumina Read 1 primer, for all targets. Sample preparation was performed by supplementing reference amplicons into the sample, converting the single-stranded product into double-stranded amplicons, and following an indexed library preparation workflow adapted from the manufacturer’s recommendation (NEBNext Multiplex Oligos for Illumina). Reads were locally aligned with bowtie2 to a library that includes all target amplicon sequences; full-length matches with low sequencing error (edit distance ≤ 2) were selected, counted, and normalized against the reference amplicons.
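The read filtering and normalization step can be sketched as follows. This is a minimal illustration, not the pipeline used in this work: the function name `count_and_normalize`, the tuple layout of the input, and the choice to divide by the summed reference amplicon count are all assumptions, since the abstract does not specify how the normalization is computed. In practice the tuples would be derived from the aligner output (for example, the edit distance reported for each alignment).

```python
from collections import Counter


def count_and_normalize(alignments, reference_names, max_edit=2):
    """Count filtered reads per target and scale by reference amplicon reads.

    alignments: iterable of (target_name, edit_distance, is_full_length)
                tuples (hypothetical layout, one tuple per aligned read).
    reference_names: set of names of the spiked-in reference amplicons.
    """
    # Keep only full-length matches with edit distance <= max_edit.
    counts = Counter(
        target
        for target, edit_distance, is_full_length in alignments
        if is_full_length and edit_distance <= max_edit
    )

    # Assumption: normalize by the total number of reference amplicon reads.
    ref_total = sum(counts[name] for name in reference_names)
    if ref_total == 0:
        raise ValueError("no reference amplicon reads passed the filter")

    # Report normalized counts for gene targets only.
    return {
        target: n / ref_total
        for target, n in counts.items()
        if target not in reference_names
    }
```

Dividing by the reference count makes target abundances comparable across libraries of different depth; other schemes (for example, a per-amplicon calibration) would fit the same interface.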
Bulk sequencing analysis (i.e., without cell barcode and UMI) showed that, within the set of 60 well-designed targets, cPCR produced close-to-uniform target read depth across the entire dynamic range of genes tested (3×10^4). When normalized to the same total number of sequencing reads (50,000), cPCR allocated ~5x fewer sequencing reads per gene to the most abundant gene group, 10-100x more reads to the median-to-low abundance genes, and up to ~5,000x more reads to the lowest abundance genes tested (Fig. 1a), achieving a ~100x reduction in overall dynamic range (Fig. 1b). The lower peak in Fig. 1b and 1e is likely due to non-ideal primer design in our preliminary test.
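The comparison above rests on two simple quantities: the per-gene fold change in reads after both libraries are scaled to the same total depth, and the ratio of the highest to the lowest per-gene read count (the dynamic range). A minimal sketch, with hypothetical function names and with dynamic range defined as max/min over detected genes (an assumption; the abstract does not state the exact definition):

```python
def normalize_to_depth(counts, depth=50_000):
    """Scale per-gene read counts so that they sum to `depth`."""
    total = sum(counts.values())
    return {gene: c * depth / total for gene, c in counts.items()}


def fold_change(standard, compressed, depth=50_000):
    """Per-gene ratio of compressed to standard reads at equal total depth."""
    s = normalize_to_depth(standard, depth)
    c = normalize_to_depth(compressed, depth)
    return {gene: c[gene] / s[gene] for gene in s if gene in c and s[gene] > 0}


def dynamic_range(counts):
    """Ratio of the highest to the lowest non-zero per-gene read count."""
    positive = [c for c in counts.values() if c > 0]
    return max(positive) / min(positive)


# Toy numbers for illustration only, not data from this study.
standard = {"high": 9000, "mid": 900, "low": 100}
compressed = {"high": 400, "mid": 300, "low": 300}
ratios = fold_change(standard, compressed)
compression = dynamic_range(standard) / dynamic_range(compressed)
```

A fold change below 1 for abundant genes and above 1 for rare genes, together with a compression factor above 1, is the pattern described in the text.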
After cell barcode and UMI demultiplexing, cPCR-seq (74,000 total reads) detected >99% (4,058) of the cell barcodes identified from a 10X single-cell dataset (randomly sub-sampled to 10,000,000 reads; 4,074 total barcodes), and 92% (3,757) when counting only barcodes with at least 5 distinct UMIs (Fig. 1c). Within the mutually detected single cells, and out of the 46 mid- and low-abundance genes, cPCR detected more than 20 genes in some individual cells (mean 7.21, stdev 4.13), roughly 4x more on average than the 10X analysis (≤7, mean 1.83, stdev 1.45; Fig. 1d). Furthermore, for each mid- or low-abundance gene, cPCR-seq detected 10-100x more single cells expressing the gene (Fig. 1e), significantly reducing dropout and improving quantitation.
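The per-cell statistics above can be computed from demultiplexed reads roughly as follows. This is a sketch under stated assumptions: the record layout `(cell_barcode, umi, gene)` and the function name `genes_per_cell` are hypothetical, and a gene is counted as detected in a cell if at least one read supports it, which may differ from the criterion used in this work.

```python
from collections import defaultdict


def genes_per_cell(records, min_umis=5):
    """Number of distinct genes detected in each sufficiently covered cell.

    records: iterable of (cell_barcode, umi, gene) tuples, one per read
             (hypothetical layout).
    min_umis: keep only barcodes with at least this many distinct UMIs.
    """
    umis = defaultdict(set)
    genes = defaultdict(set)
    for barcode, umi, gene in records:
        umis[barcode].add(umi)    # distinct UMIs seen for this barcode
        genes[barcode].add(gene)  # distinct genes seen for this barcode

    return {
        barcode: len(genes[barcode])
        for barcode in umis
        if len(umis[barcode]) >= min_umis
    }
```

The mean and standard deviation of the returned values correspond to the per-cell gene counts reported in the text; restricting `records` to the mid- and low-abundance gene panel beforehand would match the comparison described there.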
Our results suggest that cPCR-seq achieves >100x dynamic range compression, >100x higher effective sequencing depth, and up to 5,000x read enrichment for individual low-abundance genes compared with standard 10X Genomics single-cell 3’ mRNA analysis, thus allowing for much deeper (more genes and molecules), more complete (fewer dropouts), and more scalable single-cell transcriptomics.