Extracting lung function measurements to enhance phenotyping of chronic obstructive pulmonary disease (COPD) in an electronic health record using automated tools

Authors: Kathleen M. Akgün (1); Keith Sigel (3); Kei-Hoi Cheung (1); Farah Kidwai-Khan (1); Alex K. Bryant (5); Cynthia Brandt (1); Amy Justice (1); Kristina Crothers (6)
Affiliations:
(1) Department of Medicine, VA Connecticut Healthcare System, West Haven, Connecticut, United States of America
(2) Department of Medicine, Yale University School of Medicine, New Haven, Connecticut, United States of America
(3) Division of General Internal Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
(4) Department of Emergency Medicine, Yale University School of Medicine, New Haven, Connecticut, United States of America
(5) Department of Radiation Oncology, University of Michigan, Ann Arbor, Michigan, United States of America
(6) Department of Medicine, VA Puget Sound Health Care System and University of Washington, Seattle, Washington, United States of America
(7) Department of Medicine, University of Washington School of Medicine, Seattle, Washington, United States of America
Published in: PLoS ONE 15(1), 2020
Category: Research Article
doi: https://doi.org/10.1371/journal.pone.0227730

Background

Chronic obstructive pulmonary disease (COPD) is associated with poor quality of life, hospitalization and mortality. Characterizing the COPD phenotype includes using pulmonary function tests to determine airflow obstruction from the ratio of the forced expiratory volume in one second (FEV1) to the forced vital capacity (FVC). FEV1 is commonly used to grade disease severity but is difficult to identify in structured electronic health record (EHR) data.
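
As a minimal sketch of how spirometry values are typically turned into a COPD phenotype, the Python below assumes the GOLD criteria, which the abstract does not name: airflow obstruction when the post-bronchodilator FEV1/FVC ratio is below 0.70, and severity grades 1 to 4 at FEV1 of at least 80%, 50–79%, 30–49% and below 30% of predicted. The function names `is_obstructed` and `gold_grade` are hypothetical.

```python
# Assumption: GOLD fixed-ratio and FEV1 % predicted cut-offs; the study's own
# definitions are not stated in the abstract. Function names are hypothetical.

def is_obstructed(fev1_l: float, fvc_l: float, cutoff: float = 0.70) -> bool:
    """Return True when the FEV1/FVC ratio falls below the fixed-ratio cutoff."""
    if fvc_l <= 0:
        raise ValueError("FVC must be positive")
    return fev1_l / fvc_l < cutoff


def gold_grade(fev1_pct_predicted: float) -> int:
    """Map FEV1 % predicted to an assumed GOLD grade (1 = mild ... 4 = very severe)."""
    if fev1_pct_predicted >= 80:
        return 1
    if fev1_pct_predicted >= 50:
        return 2
    if fev1_pct_predicted >= 30:
        return 3
    return 4


if __name__ == "__main__":
    # FEV1 1.6 L with FVC 3.2 L gives a ratio of 0.50, below the 0.70 cutoff.
    print(is_obstructed(1.6, 3.2))
    print(gold_grade(45))
```

The grading only makes sense once obstruction is established, which is why a numeric FEV1, not just a mention of spirometry, matters for phenotyping.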

Data source and methods

Using Microsoft SQL Server's full-text search feature and string functions that support regular-expression-like operations, we developed an automated tool that extracts FEV1 values from progress notes, to improve ascertainment of FEV1 from EHR data in the Veterans Aging Cohort Study (VACS).
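
The abstract does not give the search terms or string functions the tool used. As a rough illustration of the matching step only, the Python sketch below swaps SQL Server full-text search for a regular expression; the pattern, the `extract_fev1` function and the plausibility bounds are hypothetical choices, not the study's implementation.

```python
import re
from typing import List, Tuple

# Hypothetical pattern; the study's actual SQL search terms are not given in the abstract.
FEV1_PATTERN = re.compile(
    r"\bFEV[\s-]?1\b(?!\s*/)"        # "FEV1", "FEV 1", "FEV-1", but not the "FEV1/FVC" ratio
    r"[\s:=]*(?:was|of|is)?[\s:=]*"  # optional separators and linking words
    r"(\d+(?:\.\d+)?)(?!\d|/|\.\d)"  # the number itself, rejecting dates such as 12/3/2019
    r"\s*(liters?\b|L\b|%)?",        # optional unit (liters or percent predicted)
    re.IGNORECASE,
)

# Hypothetical plausibility bounds used to discard obvious mis-parses.
PLAUSIBLE_RANGE = {"L": (0.2, 7.0), "%": (5.0, 150.0)}


def extract_fev1(note: str) -> List[Tuple[float, str]]:
    """Return (value, unit) pairs for FEV1 mentions; unit is 'L', '%' or '' if absent."""
    results: List[Tuple[float, str]] = []
    for match in FEV1_PATTERN.finditer(note):
        value = float(match.group(1))
        raw_unit = (match.group(2) or "").lower()
        unit = "%" if raw_unit == "%" else ("L" if raw_unit.startswith("l") else "")
        bounds = PLAUSIBLE_RANGE.get(unit)
        if bounds and not bounds[0] <= value <= bounds[1]:
            continue  # value outside the assumed physiological range
        results.append((value, unit))
    return results


if __name__ == "__main__":
    note = "PFTs 3/2015: FEV1 1.45 L (52% predicted), FEV1/FVC 0.61. Prior FEV1 was 1.6L."
    print(extract_fev1(note))  # [(1.45, 'L'), (1.6, 'L')]
```

Rejecting the FEV1/FVC ratio and date-like numbers, and dropping values outside an assumed range, are the kinds of rules that separate a quantifiable FEV1 from a bare mention; real progress notes would need many more variants.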

Results

The automated tool increased the number of quantifiable FEV1 values from 12,425 to 16,274, adding 3,849 numeric FEV1 values (24% of the final total, or a 31% increase over the baseline count). Using chart review as the reference standard, the tool had a positive predictive value of 99% (95% confidence interval: 98.2–100.0%) for identifying quantifiable FEV1 values and a recall of 100%, yielding an F-measure of 0.99. The tool correctly identified FEV1 measurements in 95% of cases.
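
The reported accuracy figures follow standard information-retrieval definitions: PPV (precision) is TP/(TP + FP), recall is TP/(TP + FN), and the F-measure is their harmonic mean. The sketch below recomputes the F-measure from the rounded PPV and recall and checks the gain in FEV1 values from the two reported counts. The underlying true-positive, false-positive and false-negative counts are not given in the abstract, so none are assumed.

```python
def precision(tp: int, fp: int) -> float:
    """Positive predictive value: share of extracted values confirmed by chart review."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Sensitivity: share of chart-confirmed values that the tool found."""
    return tp / (tp + fn)


def f_measure(p: float, r: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


if __name__ == "__main__":
    # Rounded values reported in the abstract: PPV 99%, recall 100%.
    print(round(f_measure(0.99, 1.00), 2))  # 0.99

    baseline, with_tool = 12_425, 16_274
    added = with_tool - baseline
    print(added)                        # 3849
    print(round(added / with_tool, 2))  # 0.24, share of the final total
    print(round(added / baseline, 2))   # 0.31, increase relative to baseline
```

With the rounded inputs the F-measure is about 0.995, which is 0.99 to two decimals.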

Conclusions

A SQL-based full-text search of clinical notes for quantifiable FEV1 values is efficient and increases the number of FEV1 values available in VA data. Future work will examine how these methods can improve phenotyping of patients with COPD in the VA.

Keywords:

Asthma – Electronic medical records – Equipment – Charts – Chronic obstructive pulmonary disease – Phenotypes – Pulmonary function – Disease informatics


