Bioinformatics, Computational and Systems Biology
Cevi Bainton
Bioinformatics Research Student
Harvey Mudd College
Mill Valley, California, United States
Marissa Esteban
Research Student
University of San Diego and the Karolinska Institutet
Bothell, Washington, United States
Rasha Aljelaify
Graduate Researcher in Biomedical Sciences
Karolinska Institute, Sweden
Carsten Daub, PhD, MS
Group Leader in Clinical Transcriptomics
Karolinska Institute, Sweden
Invasive ductal carcinoma (IDC) of the breast requires more intensive and prolonged treatment than ductal carcinoma in situ (DCIS). However, DCIS can be difficult to distinguish from IDC, and DCIS can progress to IDC, which makes a DCIS diagnosis challenging for both patients and care providers. Until recently, this progression was studied with bulk genomic and RNA-seq methods applied to large tumor sections. The recent development of spatial transcriptomic (ST) sequencing now allows heterogeneous cancer samples to be analyzed at high spatial resolution. Previous studies have combined whole genome sequencing (WGS) with ST RNA sequencing to identify subpopulations within a tumor sample based on their distinct inferred copy number variation (CNV) and single nucleotide polymorphism (SNP) profiles. This project seeks to extend these ST-based inference efforts to a selected group of Saudi Arabian breast cancer patients and to compare key identifying mutations against the Saudi Arabian population reference genome.
This study includes thirteen breast cancer patients from Saudi Arabia. For eleven patients, spatial transcriptomic profiles were generated from two tumor and two control samples each, using the 10x Genomics Visium Fresh Frozen Tissue kit. An expert pathologist annotated tissue as DCIS, IDC, invasive lobular carcinoma, or normal tissue on hematoxylin and eosin (H&E)-stained sections. Whole genome sequencing of paired tumor and non-tumor samples from seven patients was performed on the Illumina HiSeq platform at the Broad Institute; five of these patients also underwent spatial transcriptomic profiling. ST data were processed with CellRanger, and WGS data with the nf-core/sarek pipeline following GATK best practices. Ongoing analysis uses cb_sniffer and inferCNV to identify mutated regions in the ST data. Principal component analysis and clustering were performed on the ST data to define subpopulation clusters based on gene expression differences.
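The abstract does not state which normalization, number of principal components, or clustering algorithm was used. As a rough illustration only, the sketch below assumes library-size normalization with log1p, PCA computed by power iteration, and k-means with farthest-first initialization, written in plain Python on a hypothetical toy count matrix (6 spots by 4 genes). The function names and parameters (`log_normalize`, `pca_scores`, `kmeans`, `scale`, `n_components`) are illustrative assumptions, not the study's pipeline; a real analysis would normally use an established single-cell or spatial toolkit on the processed ST count matrix.

```python
import math
import random


def log_normalize(counts, scale=10_000):
    """Scale each spot (row) to `scale` total counts, then apply log1p."""
    normalized = []
    for spot in counts:
        total = sum(spot) or 1  # avoid division by zero for empty spots
        normalized.append([math.log1p(c * scale / total) for c in spot])
    return normalized


def pca_scores(X, n_components=2, n_iter=200, seed=0):
    """Project mean-centered rows of X onto the top principal axes.

    Axes come from power iteration on the gene-by-gene covariance matrix,
    with deflation after each component.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)] for a in range(p)]
    rng = random.Random(seed)
    axes = []
    for _ in range(n_components):
        # Random unit starting vector (unlikely to be orthogonal to the target axis)
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left along any direction
            v = [x / norm for x in w]
        eigval = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
        # Remove this component's variance so the next pass finds the next axis
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(p)] for a in range(p)]
        axes.append(v)
    return [[sum(r[j] * ax[j] for j in range(p)) for ax in axes] for r in Xc]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda pt: min(sq_dist(pt, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center, then move centers to cluster means
        labels = [min(range(k), key=lambda c: sq_dist(pt, centers[c])) for pt in points]
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy matrix: 6 spots (rows) x 4 genes (columns)
toy_counts = [
    [100, 5, 50, 2],
    [120, 4, 60, 3],
    [90, 6, 45, 1],
    [5, 100, 2, 50],
    [4, 120, 3, 60],
    [6, 90, 1, 45],
]
scores = pca_scores(log_normalize(toy_counts), n_components=2)
labels = kmeans(scores, k=2)
print(labels)
```

In the toy matrix the first three spots and the last three spots have mirrored expression patterns, so they should land in separate clusters. Clustering on a few PC scores rather than on all genes is a common choice because it suppresses noise from low-variance genes.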
Differential expression analysis of the ST data yielded between two and seven clusters per sample, in both control and tumor samples, with an average of 225.5 significantly differentially expressed genes per cluster. Preliminary analysis of the WGS data from the seven sequenced patients indicates that half of the most highly significant differentially expressed genes have DNA mutations at or near the gene. Future work aims to use the ST data to investigate the spatial distribution of the mutations identified by whole genome sequencing, and to infer mutations in patients who have ST data only.
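Checking whether a differentially expressed gene has a mutation "at or near the gene" can be framed as an interval query against the WGS variant calls. The abstract does not define "near", so the sketch below assumes an arbitrary ±10 kb window, variants already parsed into (chromosome, position) pairs, and gene bodies given as (chromosome, start, end). The gene names, coordinates, window size, and helper functions are hypothetical and are not the study's parameters or results.

```python
from bisect import bisect_left
from collections import defaultdict


def genes_with_nearby_variants(genes, variants, window=10_000):
    """Return names of genes with at least one variant within `window` bp.

    genes:    dict of name -> (chrom, start, end), inclusive coordinates
    variants: iterable of (chrom, pos) pairs
    """
    positions_by_chrom = defaultdict(list)
    for chrom, pos in variants:
        positions_by_chrom[chrom].append(pos)
    for positions in positions_by_chrom.values():
        positions.sort()

    hits = set()
    for name, (chrom, start, end) in genes.items():
        positions = positions_by_chrom.get(chrom, [])
        # Index of the first variant at or after the padded gene start
        i = bisect_left(positions, start - window)
        if i < len(positions) and positions[i] <= end + window:
            hits.add(name)
    return hits


def fraction_de_genes_mutated(de_genes, genes, variants, window=10_000):
    """Fraction of annotated DE genes that have a variant at or near the gene."""
    annotated = {g: genes[g] for g in de_genes if g in genes}
    if not annotated:
        return 0.0
    hits = genes_with_nearby_variants(annotated, variants, window)
    return len(hits) / len(annotated)


# Hypothetical toy annotation and variant calls
toy_genes = {
    "GENE_A": ("chr1", 100_000, 120_000),
    "GENE_B": ("chr1", 500_000, 510_000),
    "GENE_C": ("chr2", 50_000, 60_000),
    "GENE_D": ("chr3", 1_000, 5_000),
}
toy_variants = [("chr1", 125_000), ("chr2", 300_000), ("chr3", 4_000)]
print(sorted(genes_with_nearby_variants(toy_genes, toy_variants)))
print(fraction_de_genes_mutated(["GENE_A", "GENE_B", "GENE_C"], toy_genes, toy_variants))
```

With these toy inputs, GENE_A (variant 5 kb past its end) and GENE_D (variant inside the gene) are flagged, so one of the three listed DE genes counts as mutated. Sorting positions per chromosome and using binary search keeps each gene lookup logarithmic in the number of variants, which matters for whole-genome call sets.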
Spatial transcriptomics enables high-resolution analysis of DCIS and IDC tissue to investigate which genes are dysregulated as DCIS progresses to IDC. Identifying single nucleotide polymorphism and copy number variation sites in the dysregulated genes of these clusters could shed light on the mutations that lead DCIS to develop into IDC. These mutation sites could also be examined among the germline variants of our Saudi Arabian patients and in the Saudi Arabian reference genome. Mutations that are more common in Saudi Arabian patients could help tailor treatment decisions for Saudi patients diagnosed with DCIS.