Dimensionality reduction methods for biomedical data
Jan Kalina 1; Anna Schlenker 2,3
1 Institute of Computer Science CAS, Prague, Czech Republic
2 First Faculty of Medicine, Charles University, Prague, Czech Republic
3 Faculty of Biomedical Engineering, Czech Technical University in Prague, Kladno, Czech Republic
Published in:
Lékař a technika - Clinician and Technology No. 1, 2018, 48, 29-35
The aim of this paper is to present the basic principles of common multivariate statistical approaches to dimensionality reduction and to discuss three particular approaches: feature extraction, (prior) variable selection, and sparse variable selection. An important example of each is presented: principal component analysis, minimum redundancy maximum relevance variable selection, and the nearest shrunken centroid classifier with its intrinsic variable selection. Each of the three methods is illustrated on a real dataset with a biomedical motivation, including biometric identification based on keystroke dynamics and a study of metabolomic profiles. The benefits of performing dimensionality reduction on multivariate data are discussed.
biomedical data, dimensionality, biostatistics, multivariate analysis, sparsity