Quantification of speech and synchrony in the conversation of adults with autism spectrum disorder

Authors: Keiko Ochi aff001; Nobutaka Ono aff002; Keiho Owada aff003; Masaki Kojima aff003; Miho Kuroda aff003; Shigeki Sagayama aff004; Hidenori Yamasue aff005
Author affiliations: School of Media Science, Tokyo University of Technology, Hachioji, Japan aff001; Department of Computer Science, Graduate School of Systems Design, Tokyo Metropolitan University, Hino, Japan aff002; Department of Child Psychiatry, School of Medicine, The University of Tokyo, Tokyo, Japan aff003; University of Tokyo, Tokyo, Japan aff004; Department of Psychiatry, Hamamatsu University School of Medicine, Hamamatsu, Japan aff005
Published in: PLoS ONE 14(12)
Category: Research Article
doi: https://doi.org/10.1371/journal.pone.0225377


Autism spectrum disorder (ASD) is a highly prevalent neurodevelopmental disorder characterized by impairments in social reciprocity and communication together with restricted interests and stereotyped behaviors. The Autism Diagnostic Observation Schedule (ADOS) is considered a ‘gold standard’ instrument for the diagnosis of ASD and depends mainly on subjective assessments made by trained clinicians. To develop a quantitative and objective surrogate marker for ASD symptoms, we investigated speech features, including F0, speech rate, speaking time, and turn-taking gaps, extracted from footage recorded during a semi-structured, socially interactive situation from the ADOS. We not only calculated statistics over the whole session of the ADOS activity but also conducted a block analysis, computing statistics of the prosodic features in each 8 s sliding window. The block analysis identified whether participants changed volume or pitch according to the flow of the conversation. We also measured the synchrony between the participant and the ADOS administrator. Compared with participants with typical development (TD), participants with high-functioning ASD showed significantly longer turn-taking gaps, a greater proportion of pause time, and less variability and less synchronous change in the blockwise mean of intensity (p<0.05 corrected). In addition, the ASD group had a significantly wider distribution than the TD group in the within-participant variability of the blockwise mean of log F0 (p<0.05 corrected). The clinical diagnosis could be discriminated using the speech features with 89% accuracy. The features of turn-taking and pausing were significantly correlated with deficits in reciprocity in ASD (p<0.05 corrected). Additionally, regression analysis yielded a mean absolute error of 1.35 in the prediction of deficits in reciprocity, to which the synchrony of intensity contributed in particular.
The findings suggest that considering the variance of speech features, interaction, and synchrony with the conversation partner is critical for characterizing atypical features in the conversation of people with ASD.
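
The block analysis and the synchrony measure described in the abstract can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the authors' implementation: only the 8 s window length comes from the text, while the hop size (`step`), the use of Pearson correlation between the two speakers' blockwise means as the synchrony measure, and the function names `blockwise_means` and `synchrony` are hypothetical choices made here for illustration.

```python
from math import sqrt


def blockwise_means(times, values, duration, window=8.0, step=4.0):
    """Mean of `values` in each sliding window of `window` seconds.

    `times` are sample timestamps in seconds, `values` the feature
    (e.g. intensity in dB or log F0) at those times. The hop size
    `step` is an assumption; the abstract only states the 8 s window.
    Windows that contain no samples yield None.
    """
    means = []
    start = 0.0
    while start + window <= duration:
        block = [v for t, v in zip(times, values) if start <= t < start + window]
        means.append(sum(block) / len(block) if block else None)
        start += step
    return means


def synchrony(xs, ys):
    """Pearson correlation between two blockwise series (assumed measure).

    Blocks where either speaker has no value are skipped. Returns None
    when fewer than two usable blocks remain or a series is constant.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


# Toy data: one sample per second for 24 s, non-overlapping 8 s blocks.
times = [float(t) for t in range(24)]
participant = [float(t) for t in range(24)]          # steadily rising feature
administrator = [2.0 * t + 1.0 for t in range(24)]   # rises in step with it

p_blocks = blockwise_means(times, participant, duration=24.0, step=8.0)
a_blocks = blockwise_means(times, administrator, duration=24.0, step=8.0)
r = synchrony(p_blocks, a_blocks)
```

In this toy case the two speakers' blockwise means rise together, so the correlation is at its maximum; a participant whose blockwise intensity stayed flat or moved independently of the administrator's would give a value near zero or an undefined one.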

Keywords:

Autism – Autism spectrum disorder – Diagnostic medicine – Emotions – Regression analysis – Social communication – Speech – Verbal communication


Article published in: 2019, issue 12