A novel model for malaria prediction based on ensemble algorithms


Authors: Mengyang Wang; Hui Wang; Jiao Wang; Hongwei Liu; Rui Lu; Tongqing Duan; Xiaowen Gong; Siyuan Feng; Yuanyuan Liu; Zhuang Cui; Changping Li; Jun Ma
Author affiliation: Department of Health Statistics, College of Public Health, Tianjin Medical University, Heping District, Tianjin, P.R. China
Published in: PLoS ONE 14(12)
Category: Research Article
doi: 10.1371/journal.pone.0226910

Abstract

Background and objective

Most previous studies used a single traditional time series model to predict malaria incidence, but a single model cannot effectively capture all the properties of the data structure. A stacking architecture can address this limitation by combining distinct algorithms and models. This study compares the performance of traditional time series models and deep learning algorithms in malaria case prediction and explores the value of stacking methods for infectious disease prediction.

Methods

Four models were applied separately in simulations using malaria data and meteorological data from Yunnan Province for 2011 to 2017: the autoregressive integrated moving average (ARIMA) model, ARIMA combined with seasonal and trend decomposition using Loess (STL+ARIMA), the back-propagation artificial neural network (BP-ANN) and the long short-term memory (LSTM) network. We compared the predictive performance of each model using three evaluation measures: RMSE, MASE and MAD. In addition, gradient-boosting regression trees (GBRTs) were used to combine the four models, and we assessed whether the stacking structure improved prediction performance.
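The combination step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the meta-learner below is a hand-written gradient-boosting loop over depth-1 regression trees (stumps) with squared loss, and the function names (`fit_stump`, `fit_gbrt`, `gbrt_predict`), the hyperparameters and all numbers in `base_forecasts` and `observed` are hypothetical values invented for illustration.

```python
# Hypothetical sketch of stacking: four base forecasts per month are the
# input features, and a small gradient-boosted ensemble of regression stumps
# learns how to combine them. All data below are made up for illustration.

def fit_stump(X, residuals):
    """Return the (feature, threshold, left value, right value) split
    that minimises the squared error on the current residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no feature varies: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_gbrt(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def gbrt_predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


# Each row: made-up forecasts from [ARIMA, STL+ARIMA, BP-ANN, LSTM] for one month.
base_forecasts = [
    [33.0, 35.0, 31.0, 29.0],
    [40.0, 38.0, 43.0, 41.0],
    [50.0, 49.0, 53.0, 56.0],
    [66.0, 68.0, 60.0, 62.0],
    [52.0, 55.0, 47.0, 49.0],
    [39.0, 41.0, 36.0, 34.0],
    [25.0, 24.0, 27.0, 29.0],
    [20.0, 18.0, 23.0, 21.0],
]
observed = [30.0, 42.0, 55.0, 61.0, 48.0, 35.0, 28.0, 22.0]  # made-up case counts

model = fit_gbrt(base_forecasts, observed)
stacked = [gbrt_predict(model, row) for row in base_forecasts]
```

In practice the meta-learner would be trained on base-model forecasts for periods that the base models did not see during fitting, so that it does not simply reward the sub-model that overfits the most; the sketch fits and predicts on the same toy rows only to stay short.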

Results

The root mean square errors (RMSEs) of the four sub-models were 13.176, 14.543, 9.571 and 7.208; the mean absolute scaled errors (MASEs) were 0.469, 0.472, 0.296 and 0.266; and the mean absolute deviations (MADs) were 6.403, 7.658, 5.871 and 5.691. When the four sub-models were combined through the stacking architecture, the RMSE, MASE and MAD of the ensemble model decreased to 6.810, 0.224 and 4.625, respectively.
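To make the three measures and the size of the improvement concrete, the sketch below gives one common formulation of each metric and computes how far the ensemble values fall below the best sub-model value reported above. Two assumptions are made that the abstract does not confirm: MAD is taken to be the mean absolute forecast error, and MASE is scaled by the in-sample mean absolute error of a one-step naive forecast. The function names are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mad(actual, predicted):
    """Mean absolute deviation, assumed here to mean the mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mase(actual, predicted, training):
    """Mean absolute scaled error, assuming a one-step naive forecast on the
    training series as the scaling benchmark."""
    naive_mae = sum(
        abs(b - a) for a, b in zip(training, training[1:])
    ) / (len(training) - 1)
    return mad(actual, predicted) / naive_mae


def relative_reduction(reference, ensemble):
    """Fraction by which the ensemble error is below the reference error."""
    return (reference - ensemble) / reference


# Values as reported in the Results paragraph.
sub_models = {
    "RMSE": [13.176, 14.543, 9.571, 7.208],
    "MASE": [0.469, 0.472, 0.296, 0.266],
    "MAD": [6.403, 7.658, 5.871, 5.691],
}
ensemble = {"RMSE": 6.810, "MASE": 0.224, "MAD": 4.625}

# Compare the ensemble with the best (lowest-error) sub-model on each measure.
reductions = {
    name: relative_reduction(min(values), ensemble[name])
    for name, values in sub_models.items()
}
for name, value in reductions.items():
    print(f"{name}: {value:.1%} below the best sub-model")
```

Measured against the best sub-model on each metric, the reported ensemble values are roughly 5.5% lower for RMSE, 15.8% lower for MASE and 18.7% lower for MAD.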

Conclusions

A novel ensemble model was developed that builds on the robustness of structured prediction and combines models through stacking. The findings suggest that the predictive performance of the final model is superior to that of each of the four sub-models, indicating that the stacking architecture may have significant implications for infectious disease prediction.

Keywords:

Algorithms – Deep learning – China – Infectious diseases – Machine learning algorithms – Malaria – Neural networks



Published in the journal

PLOS One


2019, Issue 12