Machine Learning as Intensive Care Tools: Predicting Mortality in the Emergency Department ICU

Rapid and accurate assessment of patient condition is critical in the emergency department intensive care unit (ED-ICU). To enhance these capabilities, advanced analytical methods, particularly machine learning, are emerging as powerful Intensive Care Tools. This article delves into a retrospective study conducted to evaluate the effectiveness of different machine learning models in predicting 7-day mortality for patients admitted to the ED-ICU. By leveraging routinely collected clinical data, these models offer a potential avenue for improving risk stratification and informing timely interventions in this high-stakes environment.

Study Setting and Data Collection Methodology

This study was carried out at Peking University Third Hospital, a major medical facility, and examined patient data from its emergency system, which includes an 18-bed resuscitation unit and a dedicated 15-bed ED intensive care unit. The research focused on a retrospective cohort of patients admitted to either unit between February and December 2015. Inclusion required that patients were alive on arrival by emergency medical services (EMS). Exclusion criteria, detailed in Fig. 1, were applied to refine the study population and ensure data integrity.

Fig. 1. Patient selection flowchart: inclusion and exclusion steps from EMS arrival to the final study cohort.

The study received approval from the Peking University Third Hospital Medical Science Research Ethics Committee. A waiver of informed consent was granted because the research was retrospective and utilized data collected as part of standard patient care, ensuring no alteration to treatment protocols. To maintain rigor and consistency, an expert panel of emergency physicians and epidemiologists established standardized data extraction procedures. Physicians trained in resuscitation extracted data from electronic medical records, encompassing demographics, comorbidities, physiological measurements, laboratory results, diagnoses, and length of hospital stay. Data collection was limited to variables recorded within the first 6 hours of medical contact to focus on initial patient presentation. The primary outcome measured was death within seven days of admission, identified from medical records by the research team.

Methodological Rigor: Data Imputation and Feature Selection

Addressing the challenge of missing data, a common issue in retrospective studies, the researchers employed a pragmatic approach. Given that the proportion of missing data for collected features was less than 5%, missing values were replaced with the average value for that specific variable, calculated from complete patient records. This imputation method minimized data loss while maintaining the overall data distribution.
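As a rough illustration (not the authors' actual code), mean imputation of this kind takes only a few lines with pandas; the DataFrame and column names below are hypothetical.

```python
import pandas as pd

# Hypothetical extract of ED-ICU data: rows are patients, columns are clinical features.
df = pd.DataFrame({
    "heart_rate": [88, 112, None, 97],
    "lactate":    [1.9, None, 4.2, 2.5],
})

# Column means are computed from the observed (non-missing) values only,
# then used to fill the gaps, matching the "replace with the variable's average" approach.
df_imputed = df.fillna(df.mean(numeric_only=True))
```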

A critical step in developing effective machine learning models is feature selection. The Peking University Third Hospital health records database contains a vast array of clinical variables, many of which are redundant or poorly structured for machine learning applications. To construct optimal models, a rigorous feature screening process was implemented. Candidate variables were carefully evaluated and discussed in team meetings, drawing upon established risk adjustment algorithms, published ICU admission criteria (specifically DAVROS and SAPS 3), and insights from the Delphi method and literature reviews. Each variable was assessed based on its clinical significance, representativeness of patient condition, and accessibility within the dataset. This meticulous screening process ultimately resulted in the selection of 75 features deemed most relevant for predicting mortality.

Model Development: Employing Machine Learning as Predictive Intensive Care Tools

The models were developed and compared using Python 2.7 (Anaconda) with the scikit-learn 0.19.1 framework. Python's flexibility and mature scientific libraries make it well suited to machine learning experimentation and debugging. The four models chosen represent a range of algorithms commonly used in predictive analytics:

  • Logistic Regression (LR): This widely used statistical model excels in binary classification tasks. In this study, univariate analysis was first performed on all 75 features to identify those significantly associated with 7-day mortality (p < 0.05); multivariate logistic regression was then applied to this subset to explore the independent contribution of each feature in predicting death (a minimal sketch of this two-stage screening appears after the model list). LR models the probability of an event (here, death) as a function of a set of predictor variables, providing insight into risk factors and their relative importance.

  • Support Vector Machine (SVM): Based on statistical learning theory, SVM seeks to define an optimal hyperplane that effectively separates different classes of data. SVM is particularly effective in high-dimensional spaces and aims to maximize the margin between classes, enhancing classification accuracy. In the context of mortality prediction, SVM aims to distinguish between patients who will survive and those who will not, based on their clinical features.

  • Gradient Boosting Decision Tree (GBDT): GBDT is a powerful ensemble learning method that combines boosting techniques with decision tree algorithms. Decision trees offer rapid classification and model visualization, while boosting iteratively improves model performance by focusing on misclassified instances. GBDT builds multiple decision trees sequentially, with each tree correcting the errors of its predecessors, leading to a robust and accurate predictive model.

  • XGBoost (Extreme Gradient Boosting): XGBoost is an advanced and highly efficient implementation of gradient boosting. It builds upon GBDT by incorporating regularization terms to prevent overfitting and using second-order Taylor expansion for optimization, leading to faster convergence and improved accuracy. XGBoost also supports column sampling, similar to random forests, further enhancing robustness and reducing computational cost. Its performance and efficiency have made XGBoost a leading algorithm in machine learning competitions and real-world applications.
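
As noted in the logistic regression item above, the two-stage univariate/multivariate screening could be sketched roughly as follows. The synthetic data, variable names, and the use of statsmodels are illustrative assumptions; the study does not report its exact analysis code.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder data: X has one column per candidate feature, y marks 7-day death.
# The real study data are not public, so a synthetic example is used here.
rng = np.random.RandomState(0)
X = rng.normal(size=(500, 75))
log_odds = -2.2 + X[:, 0] + 0.8 * X[:, 1]          # two "true" risk factors
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-log_odds)))

# Stage 1: univariate logistic regression for each feature; keep those with p < 0.05.
selected = [
    j for j in range(X.shape[1])
    if sm.Logit(y, sm.add_constant(X[:, [j]])).fit(disp=0).pvalues[1] < 0.05
]

# Stage 2: multivariate logistic regression on the retained features to estimate
# each one's independent contribution to 7-day mortality.
multivariate = sm.Logit(y, sm.add_constant(X[:, selected])).fit(disp=0)
print(multivariate.summary())
```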

The implementation of LR, SVM, and GBDT models utilized the scikit-learn 0.19.1 package directly. XGBoost was implemented using the XGBoost 0.82 framework integrated with scikit-learn 0.19.1. Model parameters were set as follows: L1 regularization for LR, linear kernel for SVM, and default parameters for GBDT and XGBoost to ensure a standardized comparison.
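
Under these stated settings, the four classifiers might be instantiated roughly as shown below; any argument beyond those named in the text (L1 penalty for LR, linear kernel for SVM, library defaults for GBDT and XGBoost) is an assumption added for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier  # scikit-learn-compatible wrapper

models = {
    # L1 regularization as described in the text; liblinear is a solver that supports it.
    "LR": LogisticRegression(penalty="l1", solver="liblinear"),
    # Linear kernel; probability=True enables probability estimates for ROC analysis.
    "SVM": SVC(kernel="linear", probability=True),
    # GBDT and XGBoost with library defaults, per the standardized comparison.
    "GBDT": GradientBoostingClassifier(),
    "XGBoost": XGBClassifier(),
}
```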

Model Comparison and Performance Evaluation

To rigorously assess the performance of each model as an intensive care tool for mortality prediction, the study employed ten-fold cross-validation. This method randomly divides the patient dataset into ten equal folds; in each iteration, one fold serves as the test set and the remaining nine are used to train the model, so that every fold is used as the test set exactly once. The results from the ten iterations are then averaged to provide a robust estimate of the model's performance.
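
A minimal sketch of such a ten-fold loop with scikit-learn, assuming placeholder data and using stratified splitting (a common choice when deaths are rare, though not stated by the study), could look like this:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 75))        # placeholder feature matrix
y = rng.binomial(1, 0.1, size=500)    # placeholder 7-day mortality labels

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_probs, fold_truth = [], []
for train_idx, test_idx in cv.split(X, y):
    model = LogisticRegression(penalty="l1", solver="liblinear")
    model.fit(X[train_idx], y[train_idx])
    # Keep each fold's predicted death probabilities and true labels for later scoring.
    fold_probs.append(model.predict_proba(X[test_idx])[:, 1])
    fold_truth.append(y[test_idx])
```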

The performance metrics used to compare the models included:

  • Accuracy (ACC): The overall correctness of the model’s predictions.
  • Sensitivity (Se): The model’s ability to correctly identify patients who died (true positive rate).
  • Specificity (Sp): The model’s ability to correctly identify patients who survived (true negative rate).
  • Youden Index: A summary measure of diagnostic effectiveness, calculated as Sensitivity + Specificity – 1.
  • Area Under the Curve (AUC): The area under the receiver operating characteristic (ROC) curve, representing the model’s ability to discriminate between patients who died and those who survived across different threshold settings.

By calculating these metrics across the ten folds of cross-validation, the study provided a comprehensive and reliable comparison of the predictive capabilities of LR, SVM, GBDT, and XGBoost as intensive care tools in the ED-ICU setting. The averaged results from this rigorous evaluation are crucial for determining the most effective machine learning approach for mortality prediction in this critical care environment.
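
The sketch below shows how these per-fold metrics might be computed; the 0.5 decision threshold and the toy numbers are purely illustrative, and in the study design the resulting values would be averaged over the ten folds.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

def fold_metrics(y_true, y_prob, threshold=0.5):
    """Compute ACC, Se, Sp, Youden index, and AUC for one cross-validation fold."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    se = tp / float(tp + fn)   # sensitivity: deaths correctly identified
    sp = tn / float(tn + fp)   # specificity: survivors correctly identified
    return {
        "ACC": accuracy_score(y_true, y_pred),
        "Se": se,
        "Sp": sp,
        "Youden": se + sp - 1.0,
        "AUC": roc_auc_score(y_true, y_prob),
    }

# Toy example: true outcomes and predicted death probabilities for one fold.
y_true = [0, 0, 1, 0, 1, 0, 1, 0]
y_prob = [0.1, 0.4, 0.8, 0.2, 0.6, 0.3, 0.35, 0.05]
print(fold_metrics(y_true, y_prob))
```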

Ethical Considerations

The study rigorously adhered to ethical guidelines. The Institutional Ethics Committee provided formal approval, ensuring that all research activities were conducted in accordance with relevant regulations. The retrospective nature of the study, utilizing anonymized patient data collected during routine clinical care, justified a waiver of informed consent, as no additional interventions or risks were imposed on patients. This ethical oversight underscores the responsible use of patient data for improving healthcare through machine learning applications.

This research highlights the potential of machine learning algorithms, particularly XGBoost, as valuable intensive care tools for predicting mortality in the ED-ICU. The findings contribute to the growing body of evidence supporting the use of artificial intelligence to enhance clinical decision-making and improve patient outcomes in critical care settings. Further research should focus on prospective validation and clinical implementation of these predictive models to realize their full potential in improving intensive care delivery.
