Bioinformatics, Computational and Systems Biology
Mutation and spatial transcriptomic profiling reveal the tumor heterogeneity of breast cancer.
Marissa Esteban
Research Student
University of San Diego and the Karolinska Institutet
Bothell, Washington, United States
Cedric (Cevi) Bainton
Research Intern
Karolinska Institutet, United States
Rasha Aljelaify
PhD Student
Karolinska Institutet, United States
Breast cancer (BC) is a heterogeneous disease whose development and histological type are influenced by various genetic factors. Because of this heterogeneity, patients may respond differently to identical treatments, highlighting the need for more individualized treatment strategies. Studying inter- and intra-tumor heterogeneity would clarify how genetic differences between BC tumors shape treatment response across patients, and would support more accurate and precise classification of breast tumors to better guide therapy decisions.
Invasive ductal carcinoma (IDC) is the most common advanced BC type. Somatic mutations in IDC, especially driver mutations in cancer genes, are among the main targets of cancer genome studies.
In this study, we combine spatial transcriptomic (ST) and whole genome sequencing (WGS) data from BC patients to investigate breast cancer heterogeneity. The ST data show the spatial heterogeneity of tumors within a tissue section from a single patient, as well as between tissues from multiple patients. Using the WGS data, we will profile the single nucleotide polymorphisms (SNPs) of each patient to capture the heterogeneity of mutations between patients. Using the 3′ RNA sequences from the ST data, we will also investigate the distribution of 3′ SNPs across the tissue sections. Integrating ST and WGS data will allow us to begin exploring the mutation profiles of different BC types, which may provide unique insights into the mechanisms that differentiate cancer types and contribute to more personalized treatment strategies.
WGS and ST data from tumor and non-tumor tissues were collected from five breast cancer patients from Saudi Arabia. For each patient, certified pathologists annotated the tissue and identified the BC type. ST analysis was performed using Seurat, an R package, to identify the most differentially expressed genes and to cluster the tissue section based on gene expression. Somatic and germline variant detection was performed using Sarek, a comprehensive nf-core workflow, which processed FASTQ files from both the tumor and the normal control tissue of each individual. The reads passed through preprocessing (FastQC, mapping, MarkDuplicates, etc.), variant calling (Strelka, Manta, TIDDIT), and annotation (snpEff and VEP) steps to produce a set of annotated germline and somatic variants in VCF files. From these files, we extracted the SNPs and their functional annotations, which include their location and any associated or implicated genes. Downstream analysis will focus on the most highly mutated and most commonly mutated genes within and across patients. We will identify which SNPs are located at the 3′ end of genes and then analyze the distribution of these SNPs among the capture spots of the ST tissue.
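The SNP extraction and per-gene tally described above can be sketched in Python as follows. This is only an illustrative sketch, not the code used in the study: the function names (`iter_snv_annotations`, `summarize`) and the example records (`GENE_A`, `GENE_B`, positions) are hypothetical, and it assumes the VCF carries a snpEff-style `ANN` field in the INFO column, with the consequence in the second and the gene name in the fourth `|`-separated subfield. It keeps only biallelic single-base substitutions and reads only the first annotation of each record.

```python
from collections import Counter


def iter_snv_annotations(vcf_lines):
    """Yield (chrom, pos, ref, alt, gene, consequence) for annotated single-base variants."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
        # Keep biallelic single-base substitutions only; indels and
        # multi-allelic records are skipped in this sketch.
        if len(ref) != 1 or len(alt) != 1 or alt not in "ACGT":
            continue
        ann = next((kv[4:] for kv in info.split(";") if kv.startswith("ANN=")), None)
        if ann is None:
            continue  # record has no ANN annotation
        # Assumption: only the first ANN entry is used.
        parts = ann.split(",")[0].split("|")
        if len(parts) < 4:
            continue
        consequence, gene = parts[1], parts[3]
        yield chrom, int(pos), ref, alt, gene, consequence


def summarize(vcf_lines):
    """Count variants per gene and collect those annotated as 3' UTR variants."""
    per_gene = Counter()
    three_prime = []
    for chrom, pos, ref, alt, gene, consequence in iter_snv_annotations(vcf_lines):
        if gene:
            per_gene[gene] += 1
        # Combined consequences are joined with '&' in the ANN field.
        if "3_prime_UTR_variant" in consequence.split("&"):
            three_prime.append((chrom, pos, ref, alt, gene))
    return per_gene, three_prime


# Hypothetical records for illustration only (not data from the study).
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr17\t100\t.\tC\tT\t.\tPASS\tANN=T|missense_variant|MODERATE|GENE_A|id1|transcript|tx1",
    "chr17\t250\t.\tG\tA\t.\tPASS\tANN=A|3_prime_UTR_variant|MODIFIER|GENE_A|id1|transcript|tx1",
    "chr3\t500\t.\tA\tAT\t.\tPASS\tANN=AT|frameshift_variant|HIGH|GENE_B|id2|transcript|tx2",
    "chr3\t900\t.\tT\tC\t.\tPASS\tANN=C|synonymous_variant|LOW|GENE_B|id2|transcript|tx2",
]

per_gene, three_prime = summarize(example)
print(per_gene.most_common())  # genes ranked by number of single-base variants
print(three_prime)             # variants annotated as 3' UTR
```

Running `summarize` once per patient and comparing the resulting counters would give a first view of which genes are most highly mutated within a patient and which recur across patients. The list of 3′ variants is the natural starting point for looking up the same positions in the ST reads.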
Pairing ST data with WGS provides a more specialized mutation profile of different BC types when localized to the morphology of the tissue. With a clearer picture of the heterogeneity within BC, this research offers insights into the molecular mechanisms underlying genetic variation between BC subtypes (i.e., invasive and non-invasive ductal carcinomas). Spatially located SNPs may serve as additional markers for traits corresponding to the BC types and their responses to certain treatments.
Overall, the combination of ST and WGS may enhance our understanding of BC heterogeneity, and may have implications for improving personalized treatment strategies for BC patients.