Knowledge-based best of breed approach for automated detection of clinical events based on German free text digital hospital discharge letters

Authors: Maximilian König aff001;  André Sander aff003;  Ilja Demuth aff001;  Daniel Diekmann aff003;  Elisabeth Steinhagen-Thiessen aff001
Author affiliations: Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Lipid Clinic at Interdisciplinary Metabolism Center, Berlin, Germany aff001;  Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Department of Nephrology and Internal Intensive Care Medicine Berlin, Germany aff002;  ID Information und Dokumentation im Gesundheitswesen GmbH, Berlin, Germany aff003;  Charité - Universitätsmedizin Berlin, BCRT—Berlin Institute of Health Center for Regenerative Therapies, Berlin, Germany aff004
Published in: PLoS ONE 14(11)
Category: Research Article
doi: 10.1371/journal.pone.0224916



The secondary use of medical data contained in electronic medical records, such as hospital discharge letters, is a valuable resource for improving clinical care (e.g. in terms of medication safety) and for research. However, the automated processing and analysis of medical free text still poses a major challenge to available natural language processing (NLP) systems. The aim of this study was to implement a knowledge-based best-of-breed approach that combines a terminology server with an integrated ontology, an NLP pipeline, and a rules engine.


We tested the performance of this approach in a use case. The clinical event of interest was the specific drug-disease interaction “proton-pump inhibitor [PPI] use and osteoporosis”. Cases were to be identified using free-text digital discharge letters as the source of information. Automated detection was validated against a gold standard.
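To make the use case concrete, the following is a deliberately simplified sketch of how a co-occurrence rule for "PPI use and osteoporosis" could be expressed. It is not the authors' system: the two small term sets are hypothetical stand-ins for a terminology server, the tokenizer stands in for the NLP pipeline, and the function names are invented for illustration. It also ignores negation, abbreviations, brand names, and spelling variants, all of which a real pipeline would have to handle.

```python
import re

# Hypothetical mini-lexicons standing in for a terminology server.
# These lists are illustrative assumptions, not the vocabulary used in the study.
PPI_TERMS = {"pantoprazol", "omeprazol", "esomeprazol", "lansoprazol", "rabeprazol"}
OSTEOPOROSIS_TERMS = {"osteoporose"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (incl. German umlauts)."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def find_concepts(text):
    """Return the set of concept labels whose terms occur in the text."""
    tokens = set(tokenize(text))
    concepts = set()
    if tokens & PPI_TERMS:
        concepts.add("PPI")
    if tokens & OSTEOPOROSIS_TERMS:
        concepts.add("OSTEOPOROSIS")
    return concepts


def flags_interaction(text):
    """Toy rule: flag a letter only if both concepts are mentioned."""
    return {"PPI", "OSTEOPOROSIS"} <= find_concepts(text)


letter_both = "Diagnosen: Osteoporose. Medikation: Pantoprazol 40 mg 1-0-0."
letter_drug_only = "Medikation: Pantoprazol 40 mg 1-0-0."
print(flags_interaction(letter_both))       # both concepts present
print(flags_interaction(letter_drug_only))  # osteoporosis missing
```

Separating concept recognition (`find_concepts`) from the decision rule (`flags_interaction`) mirrors the division of labour described above, where terminology lookup and rule evaluation are distinct components.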


Precision of recognition of osteoporosis was 94.19%, and recall was 97.45%. PPIs were detected with 100% precision and 97.97% recall. The F-score for the detection of the given drug-disease interaction was 96.13%.
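For readers unfamiliar with the metric, the sketch below computes an F-score as the weighted harmonic mean of precision and recall. The text does not say which variant was used; the balanced F1 (beta = 1) is assumed here, and the function name is illustrative. Note that the reported 96.13% refers to the detection of the interaction as a whole, whose own precision and recall are not given above, so it cannot be reproduced from the per-concept figures.

```python
def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1).

    Both inputs are fractions in [0, 1]. Returns 0.0 if both are zero.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Per-concept figures quoted in the results, expressed as fractions.
f1_osteoporosis = f_score(0.9419, 0.9745)
f1_ppi = f_score(1.0, 0.9797)
print(f"F1 osteoporosis: {f1_osteoporosis:.4f}")
print(f"F1 PPI:          {f1_ppi:.4f}")
```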


We showed that combining an NLP pipeline, a terminology server, and a rules engine was effective for the automated detection of clinical events, such as drug-disease interactions, from free-text digital hospital discharge letters. The approach has considerable potential for use in clinical and research contexts, because it allows very large numbers of medical free-text documents to be analysed within a short time.

Keywords:

Engines – Language – Medicine and health sciences – Natural language processing – Osteoporosis – Syntax – Osteopenia and osteoporosis



This article appeared in the journal PLoS ONE, 2019, Issue 11