Discovering novel disease comorbidities using electronic medical records

Authors: Shikha Chaganti aff001; Valerie F. Welty aff002; Warren Taylor aff003; Kimberly Albert aff003; Michelle D. Failla aff003; Carissa Cascio aff003; Seth Smith aff004; Louise Mawn aff005; Susan M. Resnick aff006; Lori L. Beason-Held aff006; Francesca Bagnato aff007; Thomas Lasko aff008; Jeffrey D. Blume aff002; Bennett A. Landman aff001
Author affiliations: Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, United States of America aff001; Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, United States of America aff002; Department of Psychiatry & Behavioral Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America aff003; Department of Radiology and Radiological Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America aff004; Department of Ophthalmology and Visual Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America aff005; Laboratory of Behavioral Neuroscience, National Institute on Aging, Baltimore, Maryland, United States of America aff006; Department of Neurology, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America aff007; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America aff008
Published in: PLoS ONE 14(11)
Category: Research Article
doi: 10.1371/journal.pone.0225495


Increasing reliance on electronic medical records (EMRs) at large medical centers provides unique opportunities to perform population-level analyses exploring disease progression and etiology. The massive accumulation of diagnostic, procedure, and laboratory codes in one place has enabled the exploration of co-occurring conditions, their risk factors, and potential prognostic factors. While most of the readily identifiable associations in medical records are now well known to the scientific community, many more relationships doubtless remain to be uncovered in EMR data. In this paper, we introduce a novel finding index to help with that task. This new index uses data mined from PubMed abstracts in real time to indicate the extent to which empirically discovered associations are already known (i.e., present in the scientific literature). Our methods leverage second-generation p-values, which better identify associations that are truly clinically meaningful. We illustrate our new method with three examples: Autism Spectrum Disorder, Alzheimer’s Disease, and Optic Neuritis. Our results demonstrate the method's wide utility for identifying the highest-priority new associations in EMR data among the complex web of correlations and causalities. With this index, data scientists and clinicians can work together more effectively to discover novel associations that are both empirically reliable and clinically understudied.
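
For readers unfamiliar with second-generation p-values, the following is a minimal sketch of the standard definition: the fraction of an interval estimate that overlaps an interval null hypothesis, with a correction for very wide (imprecise) intervals. It is not the authors' code; the function name `sgpv` and the numeric bounds in the usage lines are hypothetical, and the sketch assumes finite intervals and a null interval of positive width.

```python
def sgpv(est_lo: float, est_hi: float, null_lo: float, null_hi: float) -> float:
    """Second-generation p-value for an interval estimate [est_lo, est_hi]
    against an interval null [null_lo, null_hi] (illustrative sketch)."""
    if est_lo > est_hi or null_lo >= null_hi:
        raise ValueError("intervals must be ordered and the null must have positive width")

    est_width = est_hi - est_lo
    null_width = null_hi - null_lo

    # Degenerate case: a point estimate is either inside the null or not.
    if est_width == 0:
        return 1.0 if null_lo <= est_lo <= null_hi else 0.0

    # Length of the overlap between the two intervals (0 if disjoint).
    overlap = max(0.0, min(est_hi, null_hi) - max(est_lo, null_lo))

    # Small-sample correction: when the estimate interval is more than
    # twice as wide as the null, the value shrinks toward 1/2.
    correction = max(est_width / (2.0 * null_width), 1.0)
    return (overlap / est_width) * correction


# Hypothetical log odds ratio intervals against a null zone of [-0.2, 0.2].
print(sgpv(0.5, 1.5, -0.2, 0.2))   # interval entirely outside the null zone
print(sgpv(-0.1, 0.1, -0.2, 0.2))  # interval entirely inside the null zone
print(sgpv(-1.0, 1.0, -0.1, 0.1))  # very wide interval: inconclusive
```

A value of 0 indicates that the data support only clinically meaningful effects, a value of 1 that they support only null effects, and values in between that the data are inconclusive.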

Keywords:

Alzheimer's disease – Alzheimer's disease diagnosis and management – Autism spectrum disorder – Database searching – Electronic medical records – Multiple sclerosis – Probability distribution – Vision


Article published in: 2019, Issue 11