Associate Professor University of Connecticut Health Center, United States
Introduction: As the world becomes increasingly digital, archiving the millions of terabytes of data produced every day in DNA offers a superior alternative to traditional electromagnetic and optical data storage, owing to its higher density, longer preservation time and the falling cost of large-scale oligo pool synthesis. Random access to data stored in DNA through polymerase chain reaction (PCR) [1] has enabled reliable retrieval of a specific DNA file from a complex oligo pool. However, random access has typically been based on unique file identifiers (IDs) rather than on the actual content, which prevents users from accessing files of interest without prior knowledge of the file names. In this work, we propose Search Enabled by Enzymatic Keyword Recognition (SEEKER), a search tool that uses the trans-cleavage activity of CRISPR-Cas12a to determine whether keywords are present in selectively PCR-amplified text files encoded in DNA. The system returns easily interpretable results by generating visible fluorescence when a keyword is identified. SEEKER can also be implemented on a 3D-printed microfluidic disc for miniaturization and more convenient operation.
Materials and Methods: SEEKER involves a PCR amplification step for file retrieval and a CRISPR-based detection step to determine whether keywords are present. Reliable retrieval of a specific file was achieved by algorithmic PCR primer design, which involved 1) sequence screening to keep a balanced GC content and avoid homopolymers, which may hinder sequence recovery from sequencing; 2) thermodynamic screening to prevent primers from forming homodimers, heterodimers and hairpin structures while maintaining an ideal melting temperature; and 3) orthogonality screening to minimize crosstalk between primer pairs. Experimentally, PCR was carried out for 35 cycles using Kapa HiFi enzyme mix and 4 µM of each primer. PCR amplicons (1.5 µL) were then transferred into a mixture of 1× NEBuffer™ 3.1, 5 µM of ssDNA-FQ, 300 nM of Alt-R® A.s. Cas12a (Cpf1) V3 (IDT) and 12 nM of dual crRNA queries. The reactions were incubated at 37 °C for 2 h, with fluorescence monitored every 15 s. For on-chip SEEKER, 350 µL of reaction mixture excluding crRNAs was lyophilized on a 3D-printed microfluidic chip and preserved at 4 °C. For use, 100 nM of crRNA queries were introduced from the central chamber to rehydrate the chip, and 1.5 µL of PCR amplicons were loaded into the side chambers. The chips were then sealed with PCR tape and incubated at 37 °C for 20 min. Fluorescence readouts were obtained with a ChemiDoc MP Imaging System (Bio-Rad Laboratories).
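The sequence and orthogonality screens described above can be illustrated with a minimal Python sketch. It is not the primer design code used in this work: the thresholds (`gc_min`, `gc_max`, `max_run`, `min_distance`) and all function names are hypothetical, Hamming distance between equal-length primers is assumed here as a simple stand-in for orthogonality, and the thermodynamic screen (dimers, hairpins, melting temperature) is omitted because it would require a nearest-neighbor model.

```python
import re


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    runs = re.finditer(r"A+|C+|G+|T+", seq.upper())
    return max((len(m.group(0)) for m in runs), default=0)


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def passes_sequence_screen(seq, gc_min=0.4, gc_max=0.6, max_run=3):
    """Hypothetical thresholds: balanced GC content and no long homopolymers."""
    return gc_min <= gc_content(seq) <= gc_max and longest_homopolymer(seq) <= max_run


def select_orthogonal(candidates, min_distance=6):
    """Greedily keep candidates that pass the sequence screen and differ
    from every previously kept primer in at least min_distance positions."""
    chosen = []
    for seq in candidates:
        if not passes_sequence_screen(seq):
            continue
        if all(hamming(seq, other) >= min_distance for other in chosen):
            chosen.append(seq)
    return chosen
```

A greedy pass like this is order-dependent; it is shown only to make the three-stage filtering idea concrete.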
Results, Conclusions, and Discussions: In-tube SEEKER identified four keywords in 40 selectively amplified files stored in synthetic DNA with an accuracy of 98.125% (Fig. 1a). The accuracy was not compromised when SEEKER was implemented on-chip (Fig. 1b). Moreover, the fluorescence intensity was proportional to the number of times a keyword appeared in the text, with R² = 0.88 for in-tube SEEKER and R² = 0.71 for on-chip SEEKER (Fig. 1c). These results suggest that SEEKER can perform keyword search accurately and quantitatively.
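As a quick arithmetic check, the reported accuracy can be related to a count of keyword queries. The sketch below assumes that each of the four keywords was tested against each of the 40 files (160 queries in total); this is not stated explicitly in the abstract. Under that assumption, 98.125% corresponds to a whole number of correct calls.

```python
from fractions import Fraction

keywords = 4
files = 40
# Assumption: every keyword was queried against every file.
queries = keywords * files

# Reported accuracy, kept as an exact fraction to avoid rounding error.
accuracy = Fraction("98.125") / 100

correct = accuracy * queries
incorrect = queries - correct
print(f"{correct} correct and {incorrect} incorrect calls out of {queries}")
```

Under the stated assumption, this works out to 157 correct calls and 3 incorrect calls.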
This work demonstrates that the trans-cleavage activity of CRISPR-Cas12a, already harnessed in molecular diagnostics, can also be applied to DNA data storage to achieve quantitative searching of information stored in complex oligo pools. Combined with rational design of PCR primers, payload oligo sequences and error-correcting nucleotides, the full content of files containing a keyword of interest can be accessed through next-generation sequencing at a sequencing depth of 20×–40×. By accessing data of interest in DNA memory by content instead of by IDs, SEEKER may bring commercial DNA data storage closer to reality.
References: [1] Organick, L. et al. Nature Biotechnology 36, 242–248 (2018).