An exploration of automated narrative analysis via machine learning

Authors: Sharad Jones (aff001); Carly Fox (aff002); Sandra Gillam (aff003); Ronald B. Gillam (aff003)
Author affiliations: aff001: Department of Mathematics and Statistics, Utah State University, Logan, Utah, United States of America; aff002: Department of Special Education and Rehabilitation, Utah State University, Logan, Utah, United States of America; aff003: Department of Communication Disorders and Deaf Education, Utah State University, Logan, Utah, United States of America
Published in: PLoS ONE 14(10)
Category: Research Article


This study compared four machine learning methods for predicting narrative macrostructure scores against scores assigned by human raters using a criterion-referenced progress-monitoring rubric. The methods spanned models built on hand-engineered features and models that learn directly from raw text. The predictive models were trained on a corpus of 414 narratives from a normative sample of school-aged children (ages 5;0-9;11) who were given a standardized measure of narrative proficiency. Performance was measured with Quadratic Weighted Kappa, a metric of inter-rater reliability. One model, BERT, not only achieved significantly higher scoring accuracy than the other methods but also produced scores consistent with those of human raters using a valid and reliable rubric. These findings suggest that machine learning, and BERT specifically, shows promise for automating the scoring of narrative macrostructure for potential use in clinical practice.
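
For readers unfamiliar with the evaluation metric, the following is a minimal sketch of Quadratic Weighted Kappa using its standard textbook definition: disagreements between two raters are penalized by the squared distance between their scores, normalized by the disagreement expected by chance. It is not the authors' code. The function name, the 0-3 score range, and the example ratings are hypothetical and chosen only for illustration.

```python
from collections import Counter


def quadratic_weighted_kappa(human, model, min_score, max_score):
    """Quadratic Weighted Kappa between two lists of integer scores.

    Illustrative sketch only; the score range is supplied by the caller
    and is an assumption, not a value taken from the article.
    """
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and equally long")
    k = max_score - min_score + 1  # number of score categories
    if k < 2:
        raise ValueError("need at least two score categories")

    n = len(human)
    observed = Counter(zip(human, model))  # joint counts O[i, j]
    human_hist = Counter(human)            # marginal counts per rater
    model_hist = Counter(model)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            # Quadratic penalty: 0 on the diagonal, 1 at maximum distance.
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[(i, j)]
            # Count expected by chance if the raters were independent.
            weighted_expected += weight * human_hist[i] * model_hist[j] / n

    if weighted_expected == 0:
        # Both raters used one identical score throughout; kappa is
        # undefined, and this sketch treats it as perfect agreement.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical ratings on a 0-3 scale.
human_scores = [0, 1, 2, 3, 2, 1]
model_scores = [0, 1, 2, 3, 1, 1]
print(quadratic_weighted_kappa(human_scores, model_scores, 0, 3))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.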

Keywords:

Language – Machine learning – Machine learning algorithms – Microstructure – Neural networks – Semantics – Undergraduates



Article published in: 2019, Issue 10