Machine learning approach to predict venous thromboembolism among patients undergoing multi-level spinal posterior instrumented fusion

Kevin Y. Heo; Prashant V. Rajan; Sameer Khawaja; Lauren A. Barber; Sangwook Tim Yoon

doi:10.21037/jss-24-8

Original Article

Department of Orthopaedic Surgery, Emory University School of Medicine, Atlanta, GA, USA

Contributions: (I) Conception and design: KY Heo, LA Barber, ST Yoon; (II) Administrative support: KY Heo, PV Rajan, ST Yoon; (III) Provision of study materials or patients: KY Heo, S Khawaja; (IV) Collection and assembly of data: KY Heo, S Khawaja; (V) Data analysis and interpretation: KY Heo, LA Barber, ST Yoon; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Kevin Y. Heo, BS. Department of Orthopaedic Surgery, Emory University School of Medicine, 21 Ortho Ln, Atlanta, GA 30329, USA. Email: kevin.heo@emory.edu.

Background: The absence of consensus for prophylaxis of venous thromboembolism (VTE) in spine surgery underscores the importance of identifying patients at risk. This study incorporated machine learning (ML) models to assess key risk factors of VTE in patients who underwent posterior spinal instrumented fusion.

Methods: Data was collected from the IBM MarketScan Database [2009–2021] for patients ≥18 years old who underwent spinal posterior instrumentation (3–6 levels), excluding traumas, malignancies, and infections. VTE incidence (deep vein thrombosis and pulmonary embolism) was recorded within 90 days after surgery. Risk factors for VTE were investigated and compared through several ML models including logistic regression, linear support vector machine (LSVM), random forest, XGBoost, and neural networks.

Results: Among the 141,697 patients who underwent spinal fusion with posterior instrumentation (3–6 levels), the overall 90-day VTE rate was 3.81%. The LSVM model demonstrated the best prediction with an area under the curve (AUC) of 0.68. The most important features for prediction of VTE included remote history of VTE, diagnosis of chronic hypercoagulability, metastatic cancer, hemiplegia, and chronic renal disease. Patients who did not have these five key risk factors had a 90-day VTE rate of 2.95%. Patients with an increasing number of key risk factors had progressively higher risks of postoperative VTE.

Conclusions: The analysis of the data with different ML models identified 5 key variables that are most closely associated with VTE. Using these variables, we developed a simple additive risk model with odds ratios ranging from 2.80 (1 risk factor) to 46.92 (4 risk factors) for VTE within 90 days after posterior spinal fusion surgery. These findings can help surgeons risk-stratify their patients for VTE, and potentially guide subsequent chemoprophylaxis.

Keywords: Venous thromboembolism (VTE); risk calculation; risk stratification; multi-level spinal posterior instrumented fusion; machine learning (ML)

Submitted Jan 20, 2024. Accepted for publication Apr 14, 2024. Published online Jun 17, 2024.

Highlight box

Key findings

• A total of 141,697 patients who underwent multi-level instrumented spinal fusion were analyzed with machine learning models to create a 90-day venous thromboembolism (VTE) risk model.

• Five key variables most closely associated with VTE after surgery included: remote history of VTE, diagnosis of chronic hypercoagulability, metastatic cancer, hemiplegia, and chronic renal disease.

What is known and what is new?

• The absence of consensus for prophylaxis of VTE in spine surgery underscores the importance of identifying patients at risk.

• While several risk factors for VTE have been identified, there is a lack of predictive models to help guide clinicians with patient risk stratification.

What is the implication, and what should change now?

• Surgeons can utilize these findings to identify patients who may benefit from more aggressive mechanical and chemoprophylactic agents.

Introduction

Venous thromboembolism (VTE) is a significant and potentially life-threatening complication that can occur after spine surgery. VTE encompasses deep vein thrombosis (DVT) and pulmonary embolism (PE), both of which can lead to severe morbidity and mortality if not promptly detected and treated (1). Despite several committee attempts to develop standardized guidelines for VTE mechanical and chemoprophylaxis (2-4), our understanding of the risk and subsequent prophylactic mitigation of VTE after spine surgery has been impeded by heterogeneous spinal procedures, inconsistent VTE prophylaxis policies and follow-ups across healthcare institutions, and differential anesthetic techniques (1,5,6).

Several recent studies have explored contributory risk factors for developing VTE following spine surgery, finding increased operative time, older age, body mass index, smoking, fusion and instrumentation procedures, cancer history, heart failure history, and chronic kidney disease to predispose patients to VTE following spine surgeries (5,7-9). One study utilized multivariable regression to develop a 13-point risk scoring system with a receiver-operating-characteristic curve area of 0.756 (10).

Relatively few studies have utilized machine learning (ML) methodologies to more comprehensively predict VTE risk after spinal surgery. ML provides the advantage of utilizing non-linear methods and complex decision trees that can adapt with repeated iterations in order to identify patterns for predicting outcomes. As a result, ML can identify trends that may have gone unnoticed in traditional linear analyses, especially when working with large, complex datasets. One study used a small cohort of 63 patients to identify 113 attributes, generating predictive models that were 81–89% accurate (11). A larger institutional study utilized deep neural networks (DNN) on over 108 variables to predict VTE and found that a history of cardiac disease and presence of VTE within 12 months of surgery were the highest contributors (12). One national study incorporated a larger sample of 13,500 patients, identifying age >65 years, obesity, coronary artery disease (CAD), functional status, and prolonged operative time to be significant multivariable predictors with an area under curve of 0.716 (13). We present the largest study to date on this topic, incorporating ML models to stratify risk factors for VTE in a cohort of >140,000 patients after spinal fusion with posterior instrumentation. We also present a simple tiered predictive model to help guide clinicians with patient risk stratification. We present this article in accordance with the STROBE reporting checklist (available at https://jss.amegroups.com/article/view/10.21037/jss-24-8/rc).

Methods

Data source

As this was a retrospective cohort review of a national de-identified database, institutional review board (IRB) approval was not necessary. Patients were identified from the IBM MarketScan® Commercial Claims and Encounters and Medicare Supplemental and Coordination of Benefit databases (Ann Arbor, Michigan). The database is a collection of medical insurance claims databases from over 300 employer-sponsored and Medicare supplemental plans, containing more than 240 million de-identified patient records. The database provides information on inpatient admissions, outpatient visits, and pharmaceutical encounters. The database was selected as it is one of the largest administrative claims databases and allows for longitudinal follow-up of continuously enrolled patients. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

Patient selection

The database was first queried for patients aged ≥18 years who underwent posterior spinal instrumentation (3–6 vertebral levels) between January 1, 2009, and December 31, 2021, as defined by the current procedural terminology (CPT) code ‘22842’ (Posterior segmental instrumentation (e.g., pedicle fixation, dual rods with multiple hooks and sublaminar wires); 3 to 6 vertebral segments). Any procedures with associated traumas, malignancies, or infections were excluded via associated diagnosis codes. Additionally, to ensure proper follow-up of the patient population, patients who were not continuously enrolled in the database for at least 6 months before surgery and 3 months after surgery were excluded. Finally, to limit confounder effects, patients with any episodes of VTE within 6 months prior to surgery were excluded.

Study variables and outcomes

Patient demographic information was collected from the database including age and sex. Ages were grouped into 5 categories as defined by the database: 18–34, 35–44, 45–54, 55–64, and 65+ years old. Comorbidity status was obtained using the Charlson Comorbidity Index (CCI). The CCI is a comorbidity measurement tool that is widely utilized to measure patients’ burden of diseases, which includes cardiovascular, neurologic, pulmonary, renal, and other chronic diseases (14). Additional comorbidities collected included obesity, smoking history, CAD, hypertension (HTN), hyperlipidemia, alcohol use disorder, depression, anxiety, atrial fibrillation, iron deficiency anemia, osteoporosis, valvular heart disease, a remote history of DVT or PE (>6 months prior to surgery), and a chronic hypercoagulable state. A chronic hypercoagulable state was defined as patients with a diagnosis of protein C or S deficiency, Factor V Leiden, antiphospholipid antibody, lupus anticoagulant, or other thrombophilia.

The primary outcome for this study was the diagnosis of VTE (including DVT and PE) within 90 days after surgery. Longitudinal tracking within the database allowed us to identify patients who had a 90-day VTE; as a result, patients were grouped as either having a VTE after surgery or no VTE after surgery. Comorbidities and complications were queried utilizing the ninth and tenth edition International Classification of Diseases (ICD) diagnostic codes (Table S1 and Table S2, respectively) (15).

Statistical analyses and predictive model construction

Descriptive statistics were generated based on demographics and CCI score between the two cohorts. Chi-squared tests were used to determine differences in categorical variables, and Student’s t-tests were used to analyze differences in continuous variables. To evaluate differences in each comorbidity collected, multivariable logistic regressions were performed, controlling for sex and age. Patients who had no VTE served as the reference group. All statistical analyses were conducted using RStudio (PBC, Boston, MA, USA). Statistical significance was defined as P<0.05 for all tests.
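As a reading aid, the age- and sex-adjusted model for a single comorbidity can be written as below. This is a generic sketch of a logistic regression of this kind, not the exact specification used in the analysis: the coding of age (continuous or by age group) and the Wald-type confidence interval are assumptions.

```latex
\[
\log\frac{P(\mathrm{VTE}=1)}{1-P(\mathrm{VTE}=1)}
  = \beta_0 + \beta_1\,x_{\text{comorbidity}} + \beta_2\,x_{\text{age}} + \beta_3\,x_{\text{sex}}
\]
\[
\mathrm{OR}_{\text{comorbidity}} = e^{\hat\beta_1},
\qquad
95\%\ \mathrm{CI} = \exp\!\bigl(\hat\beta_1 \pm 1.96\,\mathrm{SE}(\hat\beta_1)\bigr)
\]
```

Under this form, an OR above 1 indicates higher adjusted odds of a 90-day VTE in patients with the comorbidity than in those without it.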

Five ML models were utilized to predict patient risk factors for VTE within 90 days after surgery: XGBoost Tree, logistic regression, random forest, linear support vector machine (LSVM), and neural networks. XGBoost is an advanced implementation of a gradient boosting algorithm with a tree model as the base model (16). Multiple decision trees are trained to make predictions and identify feature importance. Logistic regression is a well-known method for building clinical prediction models utilizing general linear models (17). Random forest is a popular ML algorithm also utilizing decision tree models to construct classification tasks. LSVM is a robust classification technique that maps data to a high-dimensional feature space and incorporates a linear separator to classify data into separate categories. It is particularly suited for use with wide datasets (18). Neural networks are also a popular ML model that relies on interconnected nodes and hidden layers to accurately classify data.

Prior to ML analysis, the data was randomly down-sampled to half of the patients in the no VTE group. This was done in order to balance the data, as prediction models with heavily weighted sample sizes in one cohort can create skewed results (19). For instance, since 96.19% of patients in our study population had no VTE within 90 days after surgery, an ML model that predicts “no VTE” every time will still be correct 96.19% of the time; thus, the predictive values may not be represented accurately. Furthermore, in order to minimize overfitting of our models, due to the large number of potential variables, we performed feature selection utilizing Pearson’s correlation to remove variables that were not categorized as highly predictive of VTE within the dataset.
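The class-imbalance argument above can be illustrated with a minimal Python sketch. It uses only the cohort sizes reported in Table 1 and assumes that "down-sampled to half" means keeping a random half of the no VTE patients. The function names, the `keep_fraction` parameter, and the seed are hypothetical, and this is not the SPSS Modeler procedure used in the study.

```python
import random


def majority_baseline_accuracy(labels):
    """Accuracy of a model that always predicts the most common class."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return max(positives, negatives) / len(labels)


def downsample_negatives(labels, keep_fraction=0.5, seed=0):
    """Keep every positive and a random fraction of negatives; return kept indices."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    n_keep = int(len(neg_idx) * keep_fraction)
    return sorted(pos_idx + rng.sample(neg_idx, n_keep))


# Cohort sizes from Table 1: 5,400 VTE and 136,297 no-VTE patients.
labels = [1] * 5400 + [0] * 136297

baseline = majority_baseline_accuracy(labels)
kept = downsample_negatives(labels)  # assumed interpretation: keep half of the negatives
vte_share_after = sum(labels[i] for i in kept) / len(kept)

print(f"always-'no VTE' accuracy: {baseline:.2%}")
print(f"VTE share after down-sampling: {vte_share_after:.2%}")
```

Under this interpretation the VTE class remains a small minority after down-sampling, which is consistent with the low sensitivity discussed later.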

The data was then randomly partitioned in an 80:20 ratio into training and testing groups, where the testing data was evaluated only after completion of the ML training process. Five-fold cross-validation was used for hyperparameter optimization. For each ML algorithm, four-fifths of the encounters within the 80% training split were randomly selected to train the corresponding model, and the remaining one-fifth was used as a validation set to determine model performance. This process was repeated 5 times in total, each time with a new training and validation set. The combination of hyperparameters that performed best across all 5 iterations was selected for the final model, which was trained on the entire 80% training split before being tested on the 20% testing set. In order to assess each ML prediction model, we computed the area under the receiver operating characteristic curve (AUROC) and derived sensitivity, specificity, positive predictive value, negative predictive value, diagnostic odds ratio (OR), positive likelihood ratio, and negative likelihood ratio from each confusion matrix. ML models were built using SPSS Modeler version 18.4 (IBM, Chicago, IL, USA).
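The partitioning and cross-validation scheme described above can be sketched in plain Python as follows. This is an illustration under assumptions, not the SPSS Modeler workflow: it treats the five iterations as a standard k-fold partition, and `score_fn`, the candidate grid, and all function names are hypothetical placeholders.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, test)."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def k_fold_indices(train_idx, k=5):
    """Yield (fit_idx, val_idx) pairs; each sample is used for validation exactly once."""
    folds = [train_idx[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, folds[i]


def select_hyperparameters(candidates, train_idx, score_fn, k=5):
    """Return the candidate with the best mean validation score across k folds.

    score_fn(params, fit_idx, val_idx) is a hypothetical callable that trains on
    fit_idx and returns a score (for example AUROC) measured on val_idx.
    """
    best, best_score = None, float("-inf")
    for params in candidates:
        scores = [score_fn(params, fit, val) for fit, val in k_fold_indices(train_idx, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best, best_score = params, mean_score
    return best


# Toy usage with a stand-in scoring function (no real model is trained here).
train_idx, test_idx = train_test_split_indices(1000)
folds = list(k_fold_indices(train_idx, k=5))
grid = [{"C": 0.1}, {"C": 1.0}, {"C": 10.0}]
best = select_hyperparameters(grid, train_idx, lambda p, fit, val: -abs(p["C"] - 1.0))
```

The held-out `test_idx` plays no part in the selection, which mirrors the statement that the testing data was evaluated only after training was complete.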

Finally, the model with the highest AUROC and diagnostic OR was then utilized to quantify the risks for developing 90-day VTE based on the top five feature selection derived from the respective model. VTE rates were incrementally calculated for patients with none of the top five risk factors to patients with all of the top five risk factors. Multivariable logistic regressions, controlling for age, sex, and all collected comorbidities were then performed to identify the ORs for developing a 90-day VTE in patients with increasing numbers of top five risk factors. Patients that did not have any of the top five risk factors served as the reference group.
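The tiering step described above amounts to counting how many of the five selected comorbidities a patient has and tabulating the observed VTE rate per count. The following minimal Python sketch shows that bookkeeping. The field names and the toy patient records are invented for illustration and are not drawn from the study data.

```python
# Hypothetical field names for the five risk factors selected by the LSVM model.
TOP_FIVE = (
    "remote_vte_history",
    "chronic_hypercoagulability",
    "metastatic_cancer",
    "hemiplegia",
    "chronic_renal_disease",
)


def count_risk_factors(patient):
    """Number of top-five risk factors present for one patient record."""
    return sum(1 for name in TOP_FIVE if patient.get(name, False))


def vte_rate_by_tier(patients):
    """Map number of risk factors -> (number of patients, observed 90-day VTE rate)."""
    totals, events = {}, {}
    for p in patients:
        tier = count_risk_factors(p)
        totals[tier] = totals.get(tier, 0) + 1
        events[tier] = events.get(tier, 0) + (1 if p.get("vte_90d") else 0)
    return {t: (totals[t], events[t] / totals[t]) for t in sorted(totals)}


# Made-up records, for illustration only.
patients = [
    {"remote_vte_history": True, "hemiplegia": True, "vte_90d": True},
    {"remote_vte_history": True, "vte_90d": False},
    {"chronic_renal_disease": True, "vte_90d": True},
    {"vte_90d": False},
    {"vte_90d": False},
]
tiers = vte_rate_by_tier(patients)
```

Because the tiers weight every risk factor equally, the count is a simplification of the underlying model rather than a replacement for it.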

Results

Population demographics

A total of 141,697 adult patients who underwent posterior spinal fusion with segmental instrumentation (3–6 levels) were identified in the database from 2009 to 2021. Of the 141,697 patients, 5,400 patients (3.81%) were found to have a VTE within 90 days after surgery (Table 1). Patients who had a 90-day VTE were older (61.75 vs. 58.31 years; P<0.001), less likely to be female (49.74% vs. 53.43%; P<0.001), and had a higher CCI score (2.77 vs. 1.83; P<0.001) compared to patients who did not have a 90-day VTE.

Table 1

Baseline demographic data by 90-day VTE cohort

Characteristics	No VTE	VTE	P value
Total patients, n (%)	136,297 (96.19)	5,400 (3.81)	–
Age (years), mean (SD)	58.31 (11.87)	61.75 (11.70)	<0.001
Age groups, n (%)			<0.001
18–34 years	4,717 (3.46)	109 (2.02)
35–44 years	11,865 (8.71)	295 (5.46)
45–54 years	29,645 (21.75)	887 (16.43)
55–64 years	53,955 (39.59)	2,060 (38.15)
65+ years	36,115 (26.50)	2,049 (37.94)
Female patients, n (%)	72,827 (53.43)	2,686 (49.74)	<0.001
CCI score, mean (SD)	1.83 (2.25)	2.77 (2.87)	<0.001

VTE, venous thromboembolism; SD, standard deviation; CCI, Charlson Comorbidity Index.

Multivariable analyses of comorbidities

Comparisons of patient comorbidities between the VTE and no VTE groups are shown in Table 2. In multivariable logistic regressions controlling for age and sex, patients who experienced a 90-day VTE were more likely to have a remote history of DVT or PE (OR 11.53; P<0.001), chronic hypercoagulability (OR 4.80; P<0.001), metastatic cancer (OR 2.81; P<0.001), hemiplegia (OR 2.33; P<0.001), atrial fibrillation (OR 1.73; P<0.001), congestive heart failure (OR 1.61; P<0.001), dementia (OR 1.58; P<0.001), chronic renal disease (OR 1.53; P<0.001), and several other comorbidities. Interestingly, patients with a smoking history (OR 0.83; P<0.001) had lower odds of developing a VTE after surgery. In descending order, the five highest ORs for a 90-day VTE were for a remote history of DVT/PE, a chronic hypercoagulable state, metastatic cancer, hemiplegia, and atrial fibrillation.

Table 2

Ninety-day venous thromboembolism rates with multivariable odds ratios by patient comorbidities

Comorbidity	No VTE (%)	VTE (%)	Odds ratio	95% CI	P value
Remote history of DVT/PE	1.20	12.67	11.53	10.49–12.67	<0.001
Chronic hypercoagulable state	1.54	6.78	4.80	4.28–5.39	<0.001
Metastatic cancer	1.25	3.85	2.81	2.43–3.26	<0.001
Hemiplegia	2.73	6.54	2.33	2.08–2.61	<0.001
Atrial fibrillation	5.15	10.54	1.73	1.58–1.90	<0.001
Congestive heart failure	6.77	12.35	1.61	1.48–1.76	<0.001
Dementia	0.54	1.13	1.58	1.21–2.06	<0.001
Chronic renal disease	7.31	12.69	1.53	1.41–1.68	<0.001
Valvular heart disease	12.98	20.35	1.48	1.38–1.58	<0.001
Cerebrovascular disease	13.52	21.22	1.44	1.34–1.54	<0.001
Myocardial infarction	5.01	8.29	1.42	1.28–1.57	<0.001
Iron deficiency anemia	6.18	8.93	1.42	1.29–1.56	<0.001
Peripheral vascular disease	13.52	21.09	1.41	1.32–1.51	<0.001
Solid malignancy	9.04	14.28	1.41	1.31–1.53	<0.001
Obesity	27.77	33.83	1.39	1.32–1.48	<0.001
Chronic lung disease	27.53	35.20	1.38	1.29–1.46	<0.001
Osteoporosis	8.78	12.55	1.34	1.23–1.46	<0.001
Coronary artery disease	20.52	28.91	1.29	1.21–1.38	<0.001
Diabetes w/ complication	8.05	11.31	1.25	1.15–1.36	<0.001
Mild liver disease	7.12	8.72	1.23	1.11–1.35	<0.001
Peptic ulcer disease	2.31	2.93	1.18	1.00–1.39	0.04
Diabetes w/o complication	22.69	28.17	1.18	1.11–1.26	<0.001
Hypertension	67.52	74.94	1.18	1.01–1.26	<0.001
Rheumatic disease	20.59	23.44	1.16	1.09–1.24	<0.001
HIV/AIDS	0.18	0.18	1.08	0.57–2.04	0.81
Depression	27.83	27.65	1.08	1.01–1.15	0.02
Hyperlipidemia	59.85	64.09	1.01	0.97–1.07	0.63
Anxiety	13.56	12.65	1.01	0.93–1.09	0.89
Alcohol use disorder	3.43	3.33	1.00	0.85–1.15	0.90
Moderate/severe liver	0.29	0.30	0.94	0.56–1.54	0.79
Smoking history	17.97	13.87	0.83	0.77–0.90	<0.001

Multivariable logistic regression controlling for age and sex (reference is the No VTE group). VTE, venous thromboembolism; CI, confidence interval; DVT, deep vein thrombosis; PE, pulmonary embolism; HIV/AIDS, human immunodeficiency virus/acquired immunodeficiency syndrome.

Predictive model parameters and assessment

Prior to running the ML models, feature selection utilizing Pearson’s correlation was performed, which removed human immunodeficiency virus or acquired immunodeficiency syndrome (HIV/AIDS), depression, anxiety, alcohol use disorder, moderate/severe liver disease, and peptic ulcer disease as inputs for the ML models. Variable rankings from the five ML models can be seen in Table 3. In all models, a remote history of DVT or PE was selected as the most important variable for predicting a 90-day VTE after posterior fusion with spinal instrumentation. Furthermore, chronic hypercoagulability and hemiplegia were each featured by at least three of the models as top predictors of 90-day VTE. The model with the highest AUROC was the LSVM, which ranked a remote history of DVT or PE, chronic hypercoagulability, metastatic cancer, hemiplegia, and chronic renal disease as the most important variables for predicting a VTE within 90 days after surgery.

Table 3

Top five important variables for risk of 90-day venous thromboembolism by model

Model	AUROC	Variable 1	Variable 2	Variable 3	Variable 4	Variable 5
XGBoost tree	0.53	Remote history of DVT/PE	Chronic hypercoagulability	Male	Osteoporosis	Myocardial infarction
Logistic regression	0.66	Remote history of DVT/PE	Chronic hypercoagulability	Hemiplegia	Metastatic cancer	Atrial fibrillation
Random forest	0.58	Remote history of DVT/PE	Male	Rheumatic disease	Chronic pulmonary disease	Hyperlipidemia
Linear support vector machine	0.68	Remote history of DVT/PE	Chronic hypercoagulability	Metastatic cancer	Hemiplegia	Chronic renal disease
Neural networks	0.65	Remote history of DVT/PE	Chronic hypercoagulability	Hemiplegia	Atrial fibrillation	Chronic renal disease

AUROC, area under receiver operating characteristic curve; DVT, deep vein thrombosis; PE, pulmonary embolism.
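The Pearson-based feature screen that preceded model fitting could look roughly like the sketch below. The exact criterion applied in SPSS Modeler is not stated in the text, so the cutoff `min_abs_r`, the function names, and the toy data are all hypothetical. For two binary variables, Pearson's r is the phi coefficient.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def screen_features(features, outcome, min_abs_r):
    """Keep feature names whose |r| with the outcome reaches the (assumed) cutoff."""
    return [name for name, col in features.items()
            if abs(pearson_r(col, outcome)) >= min_abs_r]


# Toy binary data, for illustration only.
outcome = [1, 1, 1, 0, 0, 0, 0, 0]
features = {
    "strong": [1, 1, 0, 0, 0, 0, 0, 0],
    "weak":   [1, 0, 0, 1, 0, 0, 1, 0],
}
kept_features = screen_features(features, outcome, min_abs_r=0.1)
```

A univariate screen of this kind can discard variables that matter only in combination with others, which is one reason the cutoff choice deserves reporting.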

Assessment of the testing sets from each ML model is shown in Table 4. Three of the five models had an AUROC of ≥0.65. All of the models demonstrated strong specificity for predicting patients with a 90-day VTE but demonstrated weak sensitivity. The LSVM was selected as the best-performing model because it had the largest AUROC (0.68) and diagnostic OR (31.40).

Table 4

Confusion matrices by machine learning model

Method	XGBoost tree	Logistic regression	Random forest	Linear support vector machine	Neural network
AUROC	0.53	0.66	0.58	0.68	0.65
Sensitivity (%)	5.28	4.00	3.66	1.85	0.77
Specificity (%)	98.67	99.72	98.08	99.94	99.96
Positive predictive value (%)	22.38	51.09	22.40	56.66	60.00
Negative predictive value (%)	93.47	93.45	93.39	93.27	93.26
Diagnostic odds ratio	4.14	14.84	1.94	31.40	19.39
Positive likelihood ratio	3.96	14.35	3.97	33.10	20.61
Negative likelihood ratio	0.96	0.96	0.97	0.98	0.99

AUROC, area under receiver operating characteristic curve.
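The threshold-based measures reported in Table 4 are all functions of the four cells of a confusion matrix. The Python sketch below shows the standard definitions. The counts in the usage example are invented, since the underlying confusion matrices are not reported, and the function assumes that no denominator is zero.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic measures from confusion-matrix counts.

    Assumes every denominator is non-zero (for example, at least one false positive).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        # Diagnostic odds ratio; algebraically equal to (tp * tn) / (fp * fn).
        "dor": lr_pos / lr_neg,
    }


# Invented counts, for illustration only.
metrics = diagnostic_metrics(tp=20, fp=10, fn=80, tn=890)
```

With a rare outcome, a model can reach a high specificity and negative predictive value while detecting few true cases, which is the pattern seen across all five models.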

VTE risk stratification

In order to understand the VTE risks associated with the ML model rankings, patients with any of the top five risk factors for VTE from the LSVM model were compared to patients who had none of these comorbidities (Table 5). Patients with none of the top five risk factors (n=122,651) had a 90-day VTE rate of 2.95%. Patients with any one of the top five risk factors (n=16,880) had a 90-day VTE rate of 7.86%, corresponding to an OR of 2.80 for developing a VTE compared to patients with no risk factors (P<0.001). Patients with any two of the top five risk factors (n=1,968) had a 90-day VTE rate of 19.82% (OR 8.12; P<0.001), those with any three (n=181) had a rate of 27.62% (OR 12.54; P<0.001), and those with four (n=17) had a rate of 58.82% (OR 46.92; P<0.001), each relative to patients with no risk factors after multivariable logistic regression. No patients in the study population had all five risk factors.

Table 5

Risk calculator for 90-day VTE based on number of comorbidities within the top 5 per the LSVM model

No. of risk factors	No. of patients	90-day VTE rate (%)	Odds ratio	95% CI	P value
0	122,651	2.95	Reference
1	16,880	7.86	2.80	2.62–2.98	<0.001
2	1,968	19.82	8.12	7.23–9.11	<0.001
3	181	27.62	12.54	9.04–17.39	<0.001
4	17	58.82	46.92	17.85–123.33	<0.001
5	0	N/A	N/A	N/A	N/A

Top 5 risk factors included remote history of DVT/PE, chronic hypercoagulable state, metastatic cancer, hemiplegia, and chronic renal disease. VTE, venous thromboembolism; LSVM, linear support vector machine; CI, confidence interval; N/A, not available; DVT, deep vein thrombosis; PE, pulmonary embolism.
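For readers who want to relate the rates in Table 5 to odds ratios, the sketch below computes crude (unadjusted) ORs directly from the reported 90-day VTE rates. The ORs in Table 5 come from multivariable logistic regression, so the crude figures are only a consistency check and are not expected to match exactly. The function names are hypothetical.

```python
def odds(rate):
    """Convert a proportion to odds."""
    return rate / (1.0 - rate)


def crude_odds_ratio(rate, reference_rate):
    """Unadjusted odds ratio of one group's event rate against a reference rate."""
    return odds(rate) / odds(reference_rate)


# 90-day VTE rates (as proportions) by number of risk factors, from Table 5.
table5_rates = {0: 0.0295, 1: 0.0786, 2: 0.1982, 3: 0.2762, 4: 0.5882}

crude_ors = {
    k: crude_odds_ratio(rate, table5_rates[0])
    for k, rate in table5_rates.items()
    if k > 0
}
for k, value in crude_ors.items():
    print(f"{k} risk factor(s): crude OR {value:.2f}")
```

The crude values land close to the adjusted ORs in Table 5, which suggests that adjustment for age, sex, and the other comorbidities changed the tier estimates very little.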

Discussion

Our study represents the largest cohort of 141,697 patients who underwent spinal fusion with posterior instrumentation (3–6 levels) analyzed with ML methodologies to create predictive models that identify and stratify key risk factors for developing VTE. Our study identified the LSVM model as having the best prediction of VTE risk with an area under the curve (AUC) value of 0.68. Within this model, five key variables were identified that constituted a simple VTE risk model with additive ORs ranging from 2.80 to 46.92 (Table 5): remote history of VTE, diagnosis of chronic hypercoagulability, metastatic cancer, hemiplegia, and chronic renal disease. It is important to note that while these variables were identified as being most closely associated with VTE in our models, other pertinent variables such as estimated blood loss, hospital length-of-stay, or operative time, were not available within the dataset.

In comparing our results to other studies, chronic renal disease has been found to be a significant risk factor for VTE after spine surgery, given the elevated inflammatory and hypercoagulable systemic state (8). The 13-point VTE risk score developed by Piper et al. via multivariable regression analysis similarly found and incorporated paraplegia/quadriplegia into their risk calculations (10). Their AUC value of 0.756 is higher than that of our LSVM model (0.68), although our model incorporates predictive ML methodologies and streamlines the risk calculator with a simpler 5-tiered system (Table 5). Furthermore, our models incorporate complex networks and non-linear relationships that are advantageous with large, complex datasets.

The studies by Katiyar et al., Wang et al., and Hopkins et al. represent the only other attempts to incorporate ML methodologies into VTE risk prediction after spine surgery (11-13). Katiyar et al. studied a much smaller cohort of 63 patients and utilized several models. Their Simple Logistic model had an accuracy of 84% and, like our study, incorporated prior VTE risk, amongst other variables (11). The study by Wang et al. incorporated a much larger cohort of 13,500 patients from the National Surgical Quality Improvement Program (NSQIP) database focused on 1-level lumbar fusions. They utilized a multivariable logistic regression model in addition to a similar XGBoost tree-based algorithm. They similarly identified 5 significant risk factors: age >65 years, obesity grade II or above, CAD, functional status, and prolonged operative time (13). Although they similarly found an increase in risk with each added risk factor, the differences in the nature of the risk factors between our two studies likely relate to both the difference in patient sample size and inherent differences in the source databases. Differences in the procedures and patient populations studied likely also contribute to the differences between their findings and ours. Our study is also the first to formally incorporate a hypercoagulable condition into a VTE risk prediction model for spine surgery. Hopkins et al. utilized DNN with synthetic minority oversampling technique (SMOTE) to predict VTE after spine surgery at a single institution. They similarly identified a history of DVT or PE within 12 months of surgery as the top risk factor for VTE after surgery. Their best model had an AUC of 0.90; however, this was a single-center study of 6,869 patients that also utilized an oversampling technique to balance the dataset, thus potentially introducing data that can skew the model results (12).

Current rates for VTE after spine surgery published in the literature range from 0.2% to 31% (6). The retrospective cohort study by Ngan et al. studied a similarly large cohort of 121,000 patients undergoing elective lumbar surgery, finding a 30-day VTE rate of 1.1% overall (7). Our study identified an overall rate of 90-day VTE of 3.81% across our entire patient cohort of >140,000 patients. The true overall VTE rate potentially lies closer to the lower end of published ranges in a posterior spinal instrumented fusion population, although further study would certainly have to support this.

There are several limitations to this study. First, we are limited to the data available to us through the source IBM MarketScan Database. We, for example, did not have access to specific variables such as pre- or post-operative medication use or VTE mechanical or chemoprophylaxis that could have confounded the results of this study. We also were unable to cross-reference CPT code 22842 with formal fusion codes to stratify our dataset across cervical, thoracic, and lumbar procedures. However, given that we removed all trauma, infection, and malignancy diagnoses, we assumed posterior spinal instrumentation would refer to a degenerative fusion. We also were unable to cross-reference with staged anterior fusion codes. Although we assumed this was too small a subpopulation to specifically control for, this remains a limitation. Utilizing 22842 necessarily focused our study population on patients with 3–6 levels of fusion. Although not including single level non-segmental fusions may introduce some level of selection bias, this subpopulation has been previously studied and allowed us to increase our VTE capture rate and to limit variability and potential bias. Likewise, the incidence of VTE in our study is based on assigned ICD codes, which only reflects symptomatic VTE in patients with consistent follow-up. While patients were required to be continuously enrolled in the database for the full 90-day period postoperatively, it is still possible that patients did not have documented follow-up or had follow-up at another institution not included within the database. Although our dataset incorporated proxy variables for functional status (e.g., hemiplegia), it does not collect formal functional outcomes scores or variables that could be used to accurately measure functional status, which can influence post-operative VTE risk. 

We acknowledge that the individual risk factors associated with the 5-tiered model may not hold equal risk: for instance, remote history of VTE may have an inherently higher risk. We have provided individual ORs in Table 2 through our multivariable analysis that can act as an additional risk stratifying tool. Another limitation of our study pertains to the nature of ML. While our models are able to perform complex analyses within a large dataset, the incorporation of a large number of variables potentially introduces covariance, which may skew predictions. In order to mitigate this, our study utilizes a down-sampling technique, though due to the low rates of VTE, it is difficult to mitigate any innate bias in the training data without compromising the integrity of the data. This limitation has been consistent across several other studies, and Hopkins et al. incorporated oversampling techniques to correct for innate model training bias (12). Similarly, the overall low VTE rate, combined with the limited variables available through the source database, likely contributed to the overall low sensitivity and positive predictive value of our models. An up-sampling technique may be beneficial in future studies to better capture patients with VTE, although this too contains risks of overfitting and data noise. Finally, we limited our model to a single dataset, which could compromise its generalizability. Further study to externally validate our model and its results is warranted.

Conclusions

Our study represents the largest cohort of >140,000 patients who underwent spinal fusion with posterior instrumentation (3–6 levels) analyzed with ML methodologies to create a simple VTE risk model that incorporates 5 key variables with additive ORs: remote history of VTE, diagnosis of chronic hypercoagulability, metastatic cancer, hemiplegia, and chronic renal disease. With current variability in timing and usage of anticoagulation after surgery, clinicians and surgeons can utilize these findings to identify patients who may benefit from more aggressive mechanical and chemoprophylactic agents.

Acknowledgments

Funding: None.

Footnote

Reporting Checklist: The authors have completed the STROBE checklist. Available at https://jss.amegroups.com/article/view/10.21037/jss-24-8/rc

Data Sharing Statement: Available at https://jss.amegroups.com/article/view/10.21037/jss-24-8/dss

Peer Review File: Available at https://jss.amegroups.com/article/view/10.21037/jss-24-8/prf

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jss.amegroups.com/article/view/10.21037/jss-24-8/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The present study utilized a publicly available database with de-identified data; therefore, institutional review board approval was not required. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Alvarado AM, Porto GBF, Wessell J, et al. Venous Thromboprophylaxis in Spine Surgery. Global Spine J 2020;10:65S-70S.
2. NASS Evidence-Based Guideline Development Committee. Evidence-Based Clinical Guidelines for Multidisciplinary Spine Care. Antithrombotic Therapies in Spine Surgery. 2009.
3. Zuckerman SL, Berven S, Streiff MB, et al. Management of Anticoagulation/Antiplatelet Medication and Venous Thromboembolism Prophylaxis in Elective Spine Surgery: Concise Clinical Recommendations Based on a Modified Delphi Process. Spine (Phila Pa 1976) 2023;48:301-9.
4. Recommendations from the ICM-VTE: Spine. J Bone Joint Surg Am 2022;104:309-28.
5. Tran KS, Issa TZ, Lee Y, et al. Impact of Prolonged Operative Duration on Postoperative Symptomatic Venous Thromboembolic Events After Thoracolumbar Spine Surgery. World Neurosurg 2023;169:e214-20.
6. Solaru S, Alluri RK, Wang JC, et al. Venous Thromboembolism Prophylaxis in Elective Spine Surgery. Global Spine J 2021;11:1148-55.
7. Ngan A, Song J, Katz AD, et al. Venous Thromboembolism Rates Have Not Decreased in Elective Lumbar Fusion Surgery from 2011 to 2020. Global Spine J 2023; Epub ahead of print.
8. Chen HW, Wu WT, Wang JH, et al. The Risk of Venous Thromboembolism after Thoracolumbar Spine Surgery: A Population-Based Cohort Study. J Clin Med 2023;12:613.
9. Massaro AM, Frier S, Strot SM, et al. Revisiting Anticoagulation in Spine Surgery: Balancing Venous Thromboembolic Events and Epidural Hematoma. Global Spine J 2023; Epub ahead of print.
10. Piper K, Algattas H, DeAndrea-Lazarus IA, et al. Risk factors associated with venous thromboembolism in patients undergoing spine surgery. J Neurosurg Spine 2017;26:90-6.
11. Katiyar P, Chase H, Lenke LG, et al. Using Machine Learning (ML) Models to Predict Risk of Venous Thromboembolism (VTE) Following Spine Surgery. Clin Spine Surg 2023;36:E453-6.
12. Hopkins BS, Cloney MB, Dhillon ES, et al. Using machine learning and big data for the prediction of venous thromboembolic events after spine surgery: A single-center retrospective analysis of multiple models on a cohort of 6869 patients. J Craniovertebr Junction Spine 2023;14:221-9.
13. Wang KY, Ikwuezunma I, Puvanesarajah V, et al. Using Predictive Modeling and Supervised Machine Learning to Identify Patients at Risk for Venous Thromboembolism Following Posterior Lumbar Fusion. Global Spine J 2023;13:1097-103.
14. Charlson ME, Pompei P, Ales KL, et al. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis 1987;40:373-83.
15. Heo KY, Bonsu JM, Muffly BT, et al. Complications Rates Among Revision Total Knee Arthroplasty Patients Diagnosed With COVID-19 Postoperatively. J Arthroplasty 2024;39:766-771.e2.
16. Inoue T, Ichikawa D, Ueno T, et al. XGBoost, a Machine Learning Method, Predicts Neurological Recovery in Patients with Cervical Spinal Cord Injury. Neurotrauma Rep 2020;1:8-16.
17. Deo RC. Machine Learning in Medicine. Circulation 2015;132:1920-30.
18. IBM. IBM SPSS Modeler 18.3 User’s Guide. Available online: https://www.ibm.com/docs/it/SS3RA7_18.3.0/pdf/ModelerUsersGuide.pdf
19. Pittman B, Buta E, Krishnan-Sarin S, et al. Models for analyzing zero-inflated and overdispersed count data: an application to cigarette and marijuana use. Nicotine Tob Res 2018; Epub ahead of print.

Cite this article as: Heo KY, Rajan PV, Khawaja S, Barber LA, Yoon ST. Machine learning approach to predict venous thromboembolism among patients undergoing multi-level spinal posterior instrumented fusion. J Spine Surg 2024;10(2):214-223. doi: 10.21037/jss-24-8
