Technologies for Emerging Infectious Diseases
A Stacking Machine Learning Model-based Early Mortality Prediction Technique for Sepsis Patients
Khandaker Reajul Islam
Graduate Researcher
UKM, United States
Johayra Prithula
Researcher
QU, United States
Jaya Kumar
Associate Professor
UKM, United States
Muhammad E. H. Chowdhury
Assistant Professor
Qatar University
Doha, Ad Dawhah, Qatar
Mamun Bin Ibne Reaz
Professor
UKM, United States
Toh Leong Tan
Professor
UKM, United States
Anwarul Hasan
Associate Professor
Qatar University
Doha, Ad Dawhah, Qatar
Sepsis, a life-threatening dysregulated systemic inflammatory response to infection, remains a major challenge in critical care medicine. Machine learning has improved medical outcome prediction in recent years, but its use for predicting mortality among septic patients in the intensive care unit (ICU) is understudied. We use machine learning techniques and a comprehensive set of clinical and demographic variables to improve mortality prediction in this high-risk population.
This study aims to predict early ICU mortality among septic patients and to assess the predictive accuracy of machine learning algorithms for this task. The study used a retrospective cohort derived from ICU records. The initial sample size was 63.77K; of these, the 6511 septic ICU patients for whom mortality information was available were used in the final analysis. Out of 150 available parameters, 50 variables covering demographics, vital signs, and laboratory biomarkers were included. Among the septic patients, 91.2% survived and 8.8% died. This class imbalance skewed the model's predictions toward the majority class, so a combination of under-sampling and over-sampling was used to address it.
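The rebalancing step can be illustrated with a minimal sketch. The abstract does not say which under-sampling and over-sampling methods were used, so the code below assumes the simplest variants (random under-sampling of the majority class and random duplication of the minority class); the function name `rebalance` and the `target_per_class` parameter are hypothetical, and the data are synthetic.

```python
import random
from collections import Counter


def rebalance(rows, labels, target_per_class, seed=0):
    """Bring every class to target_per_class samples.

    Assumption: plain random resampling. Classes larger than the target
    are under-sampled without replacement; smaller classes keep all their
    samples and are topped up with random duplicates.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target_per_class:
            chosen = rng.sample(members, target_per_class)
        else:
            extra = rng.choices(members, k=target_per_class - len(members))
            chosen = members + extra
        out_rows.extend(chosen)
        out_labels.extend([label] * target_per_class)
    return out_rows, out_labels


# Synthetic toy data mimicking the 91.2% / 8.8% split (not patient data)
labels = [0] * 912 + [1] * 88
rows = [[i] for i in range(1000)]

bal_rows, bal_labels = rebalance(rows, labels, target_per_class=300)
counts = Counter(bal_labels)
print(counts)
```

In a cross-validated workflow, resampling of this kind is normally applied to the training folds only, so that validation and test data keep the original class distribution.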
Data preprocessing, missing data imputation, feature ranking, and model training, validation, and testing were done using Python 3.9 and the Scikit-learn package. Three feature ranking algorithms (XGBoost, Random Forest, and Extra Trees) were used to rank feature importance. Nine classical machine learning models (Support Vector Machine, Random Forest, Multilayer Perceptron, XGBoost, AdaBoost, Logistic Regression, Extra Trees, Gradient Boosting, and K-Nearest Neighbors) were trained, validated, and tested on the top-ranked features using five-fold cross-validation. The three best-performing models were identified, and a meta-classifier was trained on the probabilities output by these first-stage classifiers to provide the final predictions.
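The ranking-then-stacking pipeline described above might look roughly like the following Scikit-learn sketch. It is not the authors' code: the data are synthetic, only Random Forest is used for ranking, the hyperparameters and `TOP_K = 19` are illustrative, and using Logistic Regression as the meta-classifier is an assumption based on the results paragraph.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 50 clinical variables (not real patient data)
X, y = make_classification(
    n_samples=600, n_features=50, n_informative=10,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 1: rank features by Random Forest importance and keep the top k
TOP_K = 19  # illustrative choice
ranker = RandomForestClassifier(n_estimators=100, random_state=0)
ranker.fit(X_train, y_train)
top_idx = np.argsort(ranker.feature_importances_)[::-1][:TOP_K]
X_train_top, X_test_top = X_train[:, top_idx], X_test[:, top_idx]

# Step 2: first-stage classifiers (the three reported as top performers)
base_learners = [
    ("mlp", make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    )),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
]

# Step 3: meta-classifier trained on out-of-fold probabilities (5-fold CV)
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_train_top, y_train)

proba = stack.predict_proba(X_test_top)[:, 1]  # predicted mortality risk
test_accuracy = stack.score(X_test_top, y_test)
print(f"held-out accuracy on synthetic data: {test_accuracy:.3f}")
```

`StackingClassifier` builds the meta-classifier's training inputs from cross-validated predictions of the base learners, which keeps the second stage from being fit on probabilities the first stage produced for samples it had already seen.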
Three algorithms (Multilayer Perceptron, Random Forest, and Logistic Regression) were the top performers in the first stage. The Multilayer Perceptron and Random Forest models achieved their highest precision with 13 and 23 features, respectively. The stacking machine learning model with the Logistic Regression algorithm showed the best overall performance. It predicted ICU mortality with a sensitivity of 80.81%, specificity of 97.04%, precision of 91.84%, accuracy of 91.78%, F1-score of 91.61%, and an area under the curve (AUC) of 88% (Table 1).
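For reference, the threshold-based metrics listed above follow from the confusion matrix as in the sketch below. The counts are hypothetical and chosen only to make the arithmetic easy to check; they are not the study's results. The abstract does not state whether its precision and F1-score are positive-class or class-weighted values, so this sketch assumes the positive-class (death) definitions.

```python
def classification_metrics(tp, fp, tn, fn):
    """Positive-class metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall for the positive class
    specificity = tn / (tn + fp)          # recall for the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for 100 patients (10 deaths, 90 survivors)
metrics = classification_metrics(tp=8, fp=1, tn=89, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```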
The proposed stacking machine learning technique can predict the risk of death among septic patients using only 19 clinical parameters (Figure 1), which can significantly help in the clinical management of these patients.
Pollard, Tom J., et al. "The eICU Collaborative Research Database, a freely available multi-center database for critical care research." Scientific Data 5.1 (2018): 1-13.