Can machine learning improve patient selection for cardiac resynchronization therapy?

Authors: Szu-Yeu Hu aff001;  Enrico Santus aff002;  Alexander W. Forsyth aff002;  Devvrat Malhotra aff003;  Josh Haimson aff002;  Neal A. Chatterjee aff004;  Daniel B. Kramer aff005;  Regina Barzilay aff002;  James A. Tulsky aff006;  Charlotta Lindvall aff006
Author affiliations: Department of Radiology, Massachusetts General Hospital, Boston, Massachusetts, United States of America aff001;  Department of Electrical Engineering and Computer Science, CSAIL, MIT, Cambridge, Massachusetts, United States of America aff002;  Department of Health Policy and Management, Harvard School of Public Health, Boston, Massachusetts, United States of America aff003;  Division of Cardiology, Department of Medicine, University of Washington, Seattle, Washington, United States of America aff004;  Richard A. and Susan F. Smith Center for Outcomes Research, Division of Cardiology, Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America aff005;  Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America aff006;  Division of Palliative Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America aff007
Published in: PLoS ONE 14(10)
Category: Research Article
doi: 10.1371/journal.pone.0222397



Background

Multiple clinical trials support the effectiveness of cardiac resynchronization therapy (CRT); however, optimal patient selection remains challenging due to substantial treatment heterogeneity among patients who meet the clinical practice guidelines.


Objective

To apply machine learning to create an algorithm that predicts CRT outcome using electronic health record (EHR) data available before the procedure.

Methods and results

We applied machine learning and natural language processing to the EHRs of 990 patients who received CRT at two academic hospitals between 2004 and 2015. The primary outcome was reduced CRT benefit, defined as <0% improvement in left ventricular ejection fraction (LVEF) 6–18 months post-procedure or death by 18 months. Data on demographics, laboratory values, medications, clinical characteristics, and past health services utilization were extracted from the EHR available before the CRT procedure. Bigrams (i.e., two-word sequences) were also extracted from the clinical notes using natural language processing. Patients accrued on average 75 clinical notes (SD, 29) before the procedure, including data not captured anywhere else in the EHR. A machine learning model was built using 80% of the patient sample (training and validation dataset) and tested on a held-out 20% patient sample (test dataset). Among the 990 patients receiving CRT, the mean age was 71.6 years (SD, 11.8), 78.1% were male, 87.2% were non-Hispanic white, and the mean baseline LVEF was 24.8% (SD, 7.69). Of the 990 patients, 403 (40.7%) were identified as having a reduced benefit from the CRT device (<0% LVEF improvement in 25.2%, death by 18 months in 15.6%). The final model identified 26% of these patients at a positive predictive value of 79% (model performance: Fβ (β = 0.1): 77%; recall 0.26; precision 0.79; accuracy 0.65).
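Two steps named in this paragraph, bigram extraction and the Fβ score, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `bigrams` and `f_beta`, the tokenization rule, and the sample note text are assumptions made for illustration, and only the precision, recall, and β values come from the abstract.

```python
import re
from collections import Counter


def bigrams(note: str) -> Counter:
    """Count adjacent two-word sequences in one clinical note.

    Tokenization (lowercasing, alphanumeric runs only) is an assumption;
    the abstract does not say how the notes were tokenized.
    """
    tokens = re.findall(r"[a-z0-9]+", note.lower())
    return Counter(zip(tokens, tokens[1:]))


def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score; beta < 1 weights precision above recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical note text, for illustration only.
    sample = "Severe heart failure, heart failure symptoms"
    print(bigrams(sample).most_common(1))
    # Figures reported in the abstract: precision 0.79, recall 0.26, beta 0.1.
    print(round(f_beta(0.79, 0.26, 0.1), 2))
```

With β = 0.1 the score is dominated by precision, which is consistent with the abstract's emphasis on a high positive predictive value at modest recall. Plugging in the reported precision and recall gives roughly 0.77, in line with the reported Fβ of 77%.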


Conclusions

A machine learning model that leveraged readily available EHR data and clinical notes identified, before the procedure, a subset of CRT patients who may not benefit from the therapy.

Keywords:

Cardiology – Coronary heart disease – Heart failure – Machine learning – Medical devices and equipment – Medical implants – Natural language processing – Semantics


Article published in the journal


2019, Issue 10

