Gene expression microarray public dataset reanalysis in chronic obstructive pulmonary disease

Authors: Lavida R. K. Rogers aff001; Madison Verlinde aff002; George I. Mias aff002
Author affiliations: Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI, United States of America aff001; Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, United States of America aff002; Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, United States of America aff003
Published in: PLoS ONE 14(11)
Category: Research Article
doi: 10.1371/journal.pone.0224750


Chronic obstructive pulmonary disease (COPD) was classified by the Centers for Disease Control and Prevention in 2014 as the third leading cause of death in the United States (US). The main cause of COPD is exposure to tobacco smoke and air pollutants. Problems associated with COPD include under-diagnosis of the disease and an increase in the number of smokers worldwide. The goal of our study was to identify disease-associated variability in the gene expression profiles of COPD subjects compared to controls by reanalyzing pre-existing, publicly available microarray expression datasets. Our inclusion criteria required that datasets report the smoking status, age, and sex of blood donors. The selected datasets used Affymetrix and Agilent microarray platforms (7 datasets, 1,262 samples). We reanalyzed the curated raw microarray expression data using R packages and used Box-Cox power transformations to normalize the datasets. To identify significantly differentially expressed genes, we used generalized least squares models with disease state, age, sex, smoking status, and study as effects, including binary interactions, followed by likelihood ratio tests (LRT). We found 3,315 statistically significant (Storey-adjusted q-value <0.05) differentially expressed genes with respect to disease state (COPD or control). We further filtered these genes for biological effect (LRT q-value <0.05 and model estimates in the 10% two-tailed quantiles of mean differences between COPD and control), identifying 679 genes. Through analysis of disease, sex, age, and smoking status, as well as interactions between smoking status and disease, we identified differentially expressed genes involved in a variety of immune responses and cell processes in COPD. We also trained a logistic regression model using the genes common across arrays as features, which enabled prediction of disease status with 81.7% accuracy. Our results suggest potential for improving the diagnosis of COPD through blood and highlight novel disease gene expression signatures.
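
The two-step gene filter described above (a significance cutoff followed by an effect-size quantile cutoff) can be sketched as follows. This is a minimal illustration in Python rather than the authors' R pipeline. The function name `filter_by_effect`, the input layout, and the reading of "10% two-tailed quantiles" as the lowest and highest 10% of mean differences among the significant genes are all assumptions made for this example, and the data are synthetic.

```python
from statistics import quantiles


def filter_by_effect(results, q_cutoff=0.05, tail_percent=10):
    """Keep significant genes whose mean difference lies in either tail.

    results: dict mapping gene name -> (q_value, mean_difference),
             where mean_difference is the COPD-minus-control estimate.
    tail_percent: size of each tail in percent (assumed interpretation).
    """
    # Step 1: keep genes that pass the q-value threshold.
    significant = {g: d for g, (q, d) in results.items() if q < q_cutoff}
    if len(significant) < 2:
        return significant

    # Step 2: compute percentile cut points over the significant genes.
    # quantiles(..., n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = quantiles(significant.values(), n=100, method="inclusive")
    low = cuts[tail_percent - 1]
    high = cuts[100 - tail_percent - 1]

    # Keep only genes at or beyond the cut points in either tail.
    return {g: d for g, d in significant.items() if d <= low or d >= high}


# Synthetic example: 21 significant genes plus one non-significant gene.
example = {f"gene{i}": (0.01, float(i - 10)) for i in range(21)}
example["not_significant"] = (0.50, 100.0)
kept = filter_by_effect(example)
print(sorted(kept.items(), key=lambda item: item[1]))
```

Computing the cut points only over genes that already passed the q-value threshold is one possible design choice; computing them over all tested genes would give a different, usually stricter, selection.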

Keywords:

Age groups – Blood – Gene expression – Human genomics – Chronic obstructive pulmonary disease – Microarrays – Principal component analysis – Smoking habits



Article published in: 2019, Issue 11