Biomarker discovery in inflammatory bowel diseases using network-based feature selection

Authors: Mostafa Abbas aff001; John Matta aff002; Thanh Le aff003; Halima Bensmail aff001; Tayo Obafemi-Ajayi aff003; Vasant Honavar aff004; Yasser EL-Manzalawy aff004
Author affiliations: Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar aff001; Department of Computer Science, Southern Illinois University Edwardsville, Edwardsville, IL, United States of America aff002; Engineering Program, Missouri State University, Springfield, MO, United States of America aff003; College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, United States of America aff004; Geisinger Health System, Danville, PA, United States of America aff005
Published in: PLoS ONE 14(11)
Category: Research Article
doi: 10.1371/journal.pone.0225382


Reliable identification of inflammatory bowel disease (IBD) biomarkers from metagenomics data is a promising direction for developing non-invasive, cost-effective, and rapid clinical tests for early diagnosis of IBD. We present Network-Based Biomarker Discovery (NBBD), an integrative approach that combines network analysis methods for prioritizing potential biomarkers with machine learning techniques for assessing the discriminative power of the prioritized biomarkers. Using a large dataset of metagenomics biopsy samples from new-onset pediatric IBD patients, we compare Random Forest (RF) classifiers trained on features selected by a representative set of traditional feature selection methods against RF classifiers trained on features selected by the NBBD framework, configured with five different tools for inferring networks from metagenomics data and nine different methods for prioritizing biomarkers. We also evaluate a hybrid approach that combines the best traditional and NBBD-based feature selection methods, and we examine how the performance of the predictive models for IBD diagnosis varies with the size of the data used for biomarker identification. Our results show that (i) NBBD is competitive with some of the state-of-the-art feature selection methods, including Random Forest Feature Importance (RFFI) scores; and (ii) NBBD is especially effective at reliably identifying IBD biomarkers when the number of data samples available for biomarker discovery is small.
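To make the general idea of network-based feature prioritization concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It substitutes a thresholded Pearson correlation network for the five network inference tools mentioned above, and degree centrality for the nine prioritization methods; neither choice is confirmed by the abstract. The function names, the 0.6 threshold, and the toy abundance values are all hypothetical. In a full pipeline, the selected feature indices would then be passed to a classifier such as a Random Forest.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def build_network(samples, threshold=0.6):
    """Link features i and j when |correlation| across samples >= threshold.

    `samples` is a list of rows (one per sample); columns are features (taxa).
    The threshold is an illustrative assumption, not a value from the paper.
    """
    n_features = len(samples[0])
    columns = [[row[j] for row in samples] for j in range(n_features)]
    adjacency = {j: set() for j in range(n_features)}
    for i in range(n_features):
        for j in range(i + 1, n_features):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def rank_by_degree(adjacency):
    """Order features by degree centrality, highest first; ties broken by index."""
    return sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))


def select_top_k(samples, k, threshold=0.6):
    """Return the indices of the k most central features in the inferred network."""
    return rank_by_degree(build_network(samples, threshold))[:k]


# Hypothetical toy abundance table: 4 samples x 4 taxa.
# Taxon 2 co-varies with taxa 0 and 1 (a hub); taxon 3 is unrelated to the rest.
TOY_SAMPLES = [
    [3, 3, 6, 3],
    [3, 1, 4, 1],
    [1, 3, 4, 1],
    [1, 1, 2, 3],
]

if __name__ == "__main__":
    print(rank_by_degree(build_network(TOY_SAMPLES)))
    print(select_top_k(TOY_SAMPLES, k=1))
```

In this toy table the hub taxon is ranked first because it is the only node with two edges. Real metagenomics data are compositional and sparse, so a dedicated network inference tool would normally replace the plain correlation step used here.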

Keywords:

Biomarkers – Biopsy – Centrality – Inflammatory bowel disease – Metagenomics – Microbial ecology – Network analysis – Network resilience


Article published in journal issue: 2019, Issue 11