Signatures of medical student applicants and academic success

Authors: Tal Baron aff001; Robert I. Grossman aff003; Steven B. Abramson aff004; Martin V. Pusic aff002; Rafael Rivera aff003; Marc M. Triola aff002; Itai Yanai aff001
Author affiliations: Institute for Computational Medicine, New York University Grossman School of Medicine, New York, New York, United States of America aff001; Institute for Innovations in Medical Education, New York University Grossman School of Medicine, New York, New York, United States of America aff002; Department of Radiology, New York University Grossman School of Medicine, New York, New York, United States of America aff003; Department of Medicine, New York University Grossman School of Medicine, New York, New York, United States of America aff004
Published in: PLoS ONE 15(1)
Category: Research Article
doi: 10.1371/journal.pone.0227108


Admission to medical school places considerable emphasis on performance in standardized tests and on undergraduate grade point average (uGPA). Traditionally, applicants may be judged as a homogeneous population against simple quantitative thresholds that implicitly assume a linear relationship between scores and academic success. This ‘one-size-fits-all’ approach ignores the possibility that individuals show distinct patterns of achievement and follow diverse paths to success. In this study, we examined a dataset of 53 variables extracted from the admissions application records of 1,088 students who matriculated at NYU School of Medicine between 2006 and 2014. We defined training and test groups and applied K-means clustering to search for distinct groups of applicants. We then built an optimized logistic regression model to test the predictive value of this clustering for estimating applicants’ success in medical school, using as the success index an aggregate of eight performance measures from the subsequent medical school training. We found evidence for four distinct clusters of students, which we termed ‘signatures’, that differ most substantially in the absolute level of the applicant’s uGPA and in its trajectory over the course of undergraduate education. The ‘risers’ signature showed a relatively higher uGPA and a steeper trajectory; the other signatures showed the remaining combinations of these two main factors: ‘improvers’, a relatively lower uGPA and a steeper trajectory; ‘solids’, a higher uGPA and a flatter trajectory; and ‘statics’, both a lower uGPA and a flatter trajectory. Examining the success index across signatures, we found that the risers have a significantly higher, and the statics a significantly lower, likelihood of quantifiable success in medical school. We also found that each signature has a unique set of features that correlate with its success in medical school.
The big data approach presented here can uncover success potential more sensitively than uniform thresholds because it takes into account the inherent heterogeneity within the student population.
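To make the clustering idea concrete, here is a minimal sketch of how uGPA level and trajectory could be turned into features and grouped with K-means. It is not the authors' pipeline: the study clustered 53 application variables, whereas this sketch uses only two features, and the per-year GPAs are synthetic. The names `slope`, `standardize`, `kmeans`, and `applicants`, the choice of a least-squares slope as the trajectory measure, and the random-sample initialization are all assumptions made for illustration.

```python
import random
from statistics import mean


def slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def standardize(rows):
    """Z-score each column so that no single feature dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 for c, m in zip(cols, mus)]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initialize centroids from k randomly sampled points.
    centroids = [list(p) for p in rng.sample(points, k)]

    def assign():
        return [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]

    labels = assign()
    for _ in range(iters):
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [mean(c) for c in zip(*members)]
        new_labels = assign()
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centroids, labels


# Synthetic data (hypothetical): four yearly GPAs per applicant, generated
# from each combination of higher/lower level and steeper/flatter trend.
data_rng = random.Random(42)
applicants = []
for level, trend in [(3.7, 0.10), (3.3, 0.10), (3.7, 0.0), (3.3, 0.0)]:
    for _ in range(20):
        applicants.append(
            [level + trend * (year - 1.5) + data_rng.gauss(0, 0.02) for year in range(4)]
        )

# Two features per applicant: mean uGPA (level) and fitted slope (trajectory).
features = standardize([[mean(gpas), slope(gpas)] for gpas in applicants])
centroids, labels = kmeans(features, k=4)

for j, c in enumerate(centroids):
    print(f"cluster {j}: size={labels.count(j)}, level_z={c[0]:+.2f}, slope_z={c[1]:+.2f}")
```

Because K-means depends on its starting centroids, a single run may settle in a local optimum; in practice one would typically repeat it from several seeds and keep the solution with the lowest within-cluster distance. A centroid with positive level and positive slope would correspond to what the abstract calls ‘risers’, and one with negative values on both to ‘statics’.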

Keywords:

Engineering and technology – Engineers – k-means clustering – Machine learning – Medical education – Schools – Standardized tests – Undergraduates



Article published in: 2020, Issue 1