Predicting atrial fibrillation in primary care using machine learning

Authors: Nathan R. Hill aff001; Daniel Ayoubkhani aff002; Phil McEwan aff002; Daniel M. Sugrue aff002; Usman Farooqui aff001; Steven Lister aff001; Matthew Lumley aff003; Ameet Bakhai aff004; Alexander T. Cohen aff005; Mark O’Neill aff006; David Clifton aff007; Jason Gordon aff002
Author affiliations: Bristol-Myers Squibb Pharmaceutical Ltd, Uxbridge, United Kingdom aff001; Health Economics and Outcomes Research Ltd, Cardiff, United Kingdom aff002; Pfizer Ltd, Surrey, United Kingdom aff003; Department of Cardiology, Royal Free Hospital, London, United Kingdom aff004; Department of Haematological Medicine, Guys and St Thomas' NHS Foundation Trust, King's College London, London, United Kingdom aff005; Division of Cardiovascular Medicine, Guys and St Thomas' NHS Foundation Trust, King's College London, London, United Kingdom aff006; Department of Engineering Science, University of Oxford, Oxford, United Kingdom aff007
Published in: PLoS ONE 14(11)
Category: Research Article
doi: 10.1371/journal.pone.0224582



Atrial fibrillation (AF) is the most common sustained heart arrhythmia. However, as many cases are asymptomatic, a large proportion of patients remain undiagnosed until serious complications arise. Efficient, cost-effective detection of the undiagnosed may be supported by risk-prediction models relating patient factors to AF risk. However, there exists a need for an implementable risk model that is contemporaneous and informed by routinely collected patient data, reflecting the real-world pathology of AF.


This study sought to develop and evaluate novel and conventional statistical and machine learning models for risk prediction of AF. This was a retrospective cohort study of adults (aged ≥30 years) without a history of AF, listed on the Clinical Practice Research Datalink, from January 2006 to December 2016. The models evaluated were published risk models (Framingham, ARIC, CHARGE-AF), Cox regression, and machine learning models (neural network, LASSO, random forests, support vector machines) that used both baseline and time-updated information.


Analysis of 2,994,837 individuals (3.2% AF) identified a time-varying neural network as the optimal model. It achieved an AUROC of 0.827, compared with 0.725 for CHARGE-AF, the best existing model, and a number needed to screen of 9 patients, compared with 13, at 75% sensitivity. The optimal model confirmed known baseline risk factors (age, previous cardiovascular disease, antihypertensive medication usage) and identified additional time-varying predictors (proximity of cardiovascular events, body mass index (both levels and changes), pulse pressure, and the frequency of blood pressure measurements).
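To make the two reported metrics concrete, the sketch below computes an AUROC and a number needed to screen (NNS) at a target sensitivity from a list of risk scores and outcome labels. It is an illustration only, not the authors' code: the function names are hypothetical, the data are synthetic, and NNS is assumed here to mean the number of flagged patients per true AF case found (the reciprocal of positive predictive value), which the abstract does not define.

```python
import random


def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U); tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one non-case")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def nns_at_sensitivity(scores, labels, target=0.75):
    """Flagged patients per true case found at the target sensitivity.

    Assumption: NNS = 1 / PPV among patients flagged for screening.
    Patients are flagged from the highest score downwards until the
    target share of all cases has been captured.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one case")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    found = 0
    for flagged, (_, y) in enumerate(ranked, start=1):
        found += y
        if found / n_pos >= target:
            return flagged / found
    raise ValueError("target sensitivity not reachable")


# Synthetic cohort with roughly 3.2% prevalence; cases score higher on average.
random.seed(0)
demo_labels = [1 if random.random() < 0.032 else 0 for _ in range(2000)]
demo_scores = [random.gauss(1.3 if y else 0.0, 1.0) for y in demo_labels]
print("AUROC:", round(auroc(demo_scores, demo_labels), 3))
print("NNS at 75% sensitivity:",
      round(nns_at_sensitivity(demo_scores, demo_labels), 1))
```

Under this definition, a lower NNS at the same sensitivity means fewer patients must be invited for screening to find each case, which is how the comparison of 9 versus 13 can be read.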


The optimal time-varying machine learning model exhibited greater predictive performance than existing AF risk models and reflected known and new patient risk factors for AF.

Keywords:

Atrial fibrillation – Blood pressure – Heart failure – Hypertension – Machine learning – Medical risk factors – Neural networks – Primary care



Article published in the journal: 2019, Issue 11