Assistant Professor, University of Nebraska-Lincoln, Nebraska, United States
Introduction:: Pancreatic ductal adenocarcinoma (PDAC) has a poor prognosis, and early, accurate diagnosis is crucial for patient survival, which makes early identification of its biomarkers important. Because the disease varies widely in nature and symptoms, personalized treatment is gaining traction for PDAC. Genome-scale metabolic (GSM) modeling of PDAC, combined with patient-specific transcriptomics data, is one way to characterize personalized disease conditions. GSMs integrated with transcriptomic data can predict differentially regulated metabolic pathways and the resulting metabolic vulnerabilities that underlie PDAC, which can then be leveraged to treat the disease. Furthermore, combining GSMs with machine learning (ML) has been shown to be an effective tool for identifying specific biomarkers. However, this combination is not well explored for PDAC and is the focus of this work. We first generated input datasets for ML classifiers by sampling the feasible solution space of our previously reconstructed GSMs of PDAC and a healthy pancreas [1]. We then used ML classifiers to distinguish cancerous from healthy samples. Finally, we identified the factors that discriminate between normal and PDAC cells. A simple schematic of this work is shown in Figure 1.
Materials and Methods:: The cancerous and healthy pancreas models were reconstructed using the Constraint-Based Reconstruction and Analysis (COBRA) Toolbox in MATLAB [2], together with transcriptomics data from 16 patients and 2 healthy individuals obtained from The Cancer Genome Atlas (TCGA) database [1]. Using the Coordinate Hit-and-Run with Rounding (CHRR) algorithm [3] as the sampling method, we generated a balanced dataset of 1000 samples for each state, where each sample is a vector of metabolic fluxes through the reactions of the corresponding model. We then trained different ML classifiers on this dataset. Applying k-nearest neighbors and logistic regression showed that the two states are linearly separable, and a random forest classifier with feature selection allowed us to identify the metabolic pathways that differ between the two states as biomarkers of PDAC.
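To illustrate the sampling step, the following is a minimal sketch of plain coordinate hit-and-run on a toy polytope. It is not the CHRR implementation from the COBRA Toolbox: the rounding preprocessing is omitted, the steady-state equality constraints are assumed to have been eliminated already so that the feasible region is written as A x <= b, and the two-dimensional constraints, the function name, and the `thin` parameter are hypothetical choices made only for illustration.

```python
import random


def coordinate_hit_and_run(A, b, x0, n_samples, thin=10, seed=0):
    """Sample points from the bounded polytope {x : A x <= b}.

    x0 must be a strictly interior starting point. Every `thin`-th
    point of the random walk is kept as a sample.
    """
    rng = random.Random(seed)
    x = list(x0)
    n = len(x)
    samples = []
    for step in range(n_samples * thin):
        # Slack of every constraint at the current point: b_i - A_i . x
        slack = [b_i - sum(a * xi for a, xi in zip(row, x))
                 for row, b_i in zip(A, b)]
        if min(slack) < 0:
            raise ValueError("current point is outside the polytope")
        j = rng.randrange(n)  # pick one coordinate direction at random
        # Feasible step range [lo, hi] along coordinate j
        lo, hi = float("-inf"), float("inf")
        for row, s in zip(A, slack):
            a = row[j]
            if a > 0:
                hi = min(hi, s / a)
            elif a < 0:
                lo = max(lo, s / a)
        # Move to a uniformly random point on the feasible segment
        x[j] += rng.uniform(lo, hi)
        if (step + 1) % thin == 0:
            samples.append(list(x))
    return samples


# Toy "flux space": 0 <= v1, v2 <= 10 and v1 + v2 <= 12 (hypothetical bounds)
A = [[1, 0], [-1, 0], [0, 1], [0, -1], [1, 1]]
b = [10, 0, 10, 0, 12]
samples = coordinate_hit_and_run(A, b, x0=[1.0, 1.0], n_samples=200)
```

In a setting like the one described above, one such set of samples would be drawn per model (cancerous and healthy), and each sample labeled with its state before training the classifiers.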
Results, Conclusions, and Discussions::
Result: The random forest model can be used to perform feature selection, which lets us determine the relative importance of each feature in discriminating the target classes, here cancerous versus healthy. For PDAC, over 100 pathways were identified; a selection, chosen by degree of importance, is shown in Table 1. The degree of importance measures how strongly a pathway contributes to discriminating between the classes of data points. In other words, the more important a pathway is, the more it differs between the cancerous and healthy states. To further verify each identified pathway, we looked for literature evidence demonstrating a known role in some type of cancer.
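The idea of feature importance can be illustrated with impurity decrease, a measure commonly used by random forests: a feature is important if splitting on it separates the classes well. The sketch below is a simplified stand-in, not the analysis performed in this work: it scores each feature by the Gini impurity decrease of its single best threshold split rather than averaging over a full forest, and the pathway names, flux values, and labels are made-up toy data.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 = healthy, 1 = cancerous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split_gain(values, labels):
    """Largest Gini impurity decrease achievable by one threshold on a feature."""
    parent = gini(labels)
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - weighted)
    return best


# Hypothetical toy data: four healthy (0) and four cancerous (1) flux samples
labels = [0, 0, 0, 0, 1, 1, 1, 1]
flux = {
    "pathway_A": [0.10, 0.20, 0.15, 0.30, 1.10, 1.30, 0.90, 1.20],
    "pathway_B": [0.50, 0.90, 0.40, 0.80, 0.60, 0.70, 0.45, 0.85],
}

# Rank features from most to least discriminating
ranking = sorted(
    ((name, best_split_gain(column, labels)) for name, column in flux.items()),
    key=lambda item: item[1],
    reverse=True,
)
```

Here the hypothetical `pathway_A` separates the two states perfectly and therefore ranks first, while `pathway_B` overlaps between states and receives a lower score.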
Conclusion: This work addresses how gene expression data can be used effectively to identify PDAC biomarkers by combining genome-scale metabolic modeling with machine learning. Although each method can be used separately toward the same goal, combining them gives more insight into the detection of PDAC. The framework we developed extracted the pathways that best discriminate between healthy and cancerous states, such as thiamin metabolism, serotonin and melatonin biosynthesis, and pyrimidine metabolism. These biomarkers can be explored as potential drug targets in future work.
Discussion: Going forward, we will generate patient-specific GSMs to investigate the effect of age, gender, and race on PDAC using ML. For this purpose, ML techniques will be used to increase the number of samples from TCGA, and new techniques will be used to generate synthetic (‘fake’) samples from existing ones so that the dataset is balanced. Afterwards, we will apply the ML methods described above to identify the metabolic shifts of PDAC.
References:: [1] Islam M. M., Goertzen A., Singh P. K., and Saha R., "Exploring the metabolic landscape of pancreatic ductal adenocarcinoma cells using genome-scale metabolic modeling," iScience, p. 104483, 2022.
[2] Becker S. A., Feist A. M., Mo M. L., Hannum G., Palsson B. Ø., and Herrgard M. J., "Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox," Nature protocols, vol. 2, no. 3, pp. 727-738, 2007.
[3] Haraldsdóttir H. S., Cousins B., Thiele I., Fleming R. M. T., and Vempala S., "CHRR: coordinate hit-and-run with rounding for uniform sampling of constraint-based models," Bioinformatics, vol. 33, no. 11, pp. 1741-1743, 2017.
[4] Lương K. V. Q. and Nguyễn L. T. H., "The role of thiamine in cancer: possible genetic and cellular signaling mechanisms," Cancer genomics & proteomics, vol. 10, no. 4, pp. 169-185, 2013.
[5] Danilovich M. E., Alberto M. R., and Juárez Tomás M. S., "Microbial production of beneficial indoleamines (serotonin and melatonin) with potential application to biotechnological products for human health," Journal of Applied Microbiology, vol. 131, no. 4, pp. 1668-1682, 2021.
[6] Piano V. et al., "Discovery of inhibitors for the ether lipid-generating enzyme AGPS as anti-cancer agents," ACS chemical biology, vol. 10, no. 11, pp. 2589-2597, 2015.
[7] Yang Z. et al., "LncRNA and Gene expression profiling of human bladder cancer," Cancer Plus, vol. 1, pp. 43-49, 2019.
[8] Sohoni S. et al., "Elevated Heme Synthesis and Uptake Underpin Intensified Oxidative Metabolism and Tumorigenic Functions in Non–Small Cell Lung Cancer Cells," Cancer research, vol. 79, no. 10, pp. 2511-2525, 2019.
[9] Madsen C. T. et al., "Biotin starvation causes mitochondrial protein hyperacetylation and partial rescue by the SIRT3-like deacetylase Hst4p," Nature communications, vol. 6, no. 1, pp. 1-12, 2015.
[10] Li J.-P., "Heparin, heparan sulfate and heparanase in cancer: remedy for metastasis?," Anti-Cancer Agents in Medicinal Chemistry (Formerly Current Medicinal Chemistry-Anti-Cancer Agents), vol. 8, no. 1, pp. 64-76, 2008.
[11] Bhardwaj A., Embury M. D., Ju Z., Wang J., and Bedrosian I., "Gene signature associated with resistance to fluvastatin chemoprevention for breast cancer," BMC cancer, vol. 22, no. 1, pp. 1-9, 2022.
[12] Tronstad K. J., Berge K., Bjerkvig R., Flatmark T., and Berge R. K., "Metabolic effects of 3-thia fatty acid in cancer cells," in Current Views of Fatty Acid Oxidation and Ketogenesis: Springer, 2002, pp. 201-204.
[13] Bansal A. and Simon M. C., "Glutathione metabolism in cancer progression and treatment resistance," Journal of Cell Biology, vol. 217, no. 7, pp. 2291-2298, 2018.