Finding phrases: On the role of co-verbal facial information in learning word order in infancy

Authors: Irene de la Cruz-Pavía aff001; Judit Gervain aff001; Eric Vatikiotis-Bateson aff004; Janet F. Werker aff003
Author affiliations: Integrative Neuroscience and Cognition Center (INCC–UMR 8002), Université Paris Descartes (Sorbonne Paris Cité), Paris, France aff001; Integrative Neuroscience and Cognition Center (INCC–UMR 8002), CNRS, Paris, France aff002; Department of Psychology, University of British Columbia, Vancouver, British Columbia, Canada aff003; Department of Linguistics, University of British Columbia, Vancouver, British Columbia, Canada aff004
Published in: PLoS ONE 14(11)
Category: Research Article
doi: 10.1371/journal.pone.0224786


The input contains perceptually available cues that might allow young infants to discover abstract properties of the target language. For instance, word frequency and prosodic prominence correlate systematically with basic word order in natural languages. Prelexical infants are sensitive to these frequency-based and prosodic cues and use them to parse new input into phrases that follow the order characteristic of their native languages. Importantly, young infants readily integrate auditory and visual facial information while processing language. Here, we ask whether co-verbal visual information provided by talking faces, in addition to word frequency and prosodic prominence, also helps prelexical infants learn the word order of their native language. We created two structurally ambiguous artificial languages containing head nods produced by an animated avatar, aligned or misaligned with the frequency-based and prosodic information. For 4 minutes, two groups of 4- and 8-month-old infants were familiarized with the artificial language containing aligned auditory and visual cues, while two further groups were exposed to the misaligned language. Using a modified Headturn Preference Procedure, we tested infants’ preference for test items exhibiting the word order of the native language, French, vs. the opposite word order. At 4 months, infants had no preference, suggesting that 4-month-olds were not able to integrate the three available cues or had not yet built a representation of word order. By contrast, 8-month-olds showed no preference when auditory and visual cues were aligned and a preference for the native word order when visual cues were misaligned. These results imply that infants at this age start to integrate the co-verbal visual and auditory cues.

Keywords:

Face – Infants – Language – Semantics – Speech – Speech signal processing – Syntax – Vision



Published in journal issue: 2019, No. 11