Does training with amplitude modulated tones affect tone-vocoded speech perception?

Authors: Aina Casaponsa aff001;  Ediz Sohoglu aff001;  David R. Moore aff001;  Christian Füllgrabe aff001;  Katharine Molloy aff001;  Sygal Amitay aff001
Author affiliations: Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom aff001;  Department of Linguistics and English Language, Lancaster University, Lancaster, England, United Kingdom aff002
Published in: PLoS ONE 14(12)
Category: Research Article
doi: 10.1371/journal.pone.0226288


Temporal-envelope cues are essential for successful speech perception. We asked here whether training on stimuli containing temporal-envelope cues without speech content can improve the perception of spectrally degraded (vocoded) speech in which the temporal envelope (but not the temporal fine structure) is mainly preserved. Two groups of listeners were trained on different amplitude-modulation (AM) based tasks, either AM detection or AM-rate discrimination (21 blocks of 60 trials over two days, 1260 trials in total; frequency range: 4 Hz, 8 Hz, and 16 Hz), while an additional control group did not undertake any training. Consonant identification in vocoded vowel-consonant-vowel stimuli was tested before and after training on the AM tasks (or at an equivalent time interval for the control group). Following training, only the trained groups showed a significant improvement in the perception of vocoded speech, but the improvement did not significantly differ from that observed for controls. Thus, we do not find convincing evidence that this amount of training with temporal-envelope cues without speech content provides a significant benefit for vocoded speech intelligibility. Alternative training regimens using vocoded speech along the linguistic hierarchy should be explored.
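
As a rough illustration of the kind of stimulus involved, the sketch below generates sinusoidally amplitude-modulated tones at the three rates given above and recomputes the stated trial count. The carrier frequency, modulation depth, duration, sample rate, and the function name `am_tone` are illustrative assumptions, not the parameters used in the study.

```python
import math


def am_tone(mod_rate_hz, carrier_hz=1000.0, depth=1.0,
            duration_s=1.0, sample_rate=16000):
    """Return samples of a sinusoidally amplitude-modulated pure tone.

    All default values are assumptions chosen for illustration only.
    """
    n_samples = int(duration_s * sample_rate)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        # Temporal envelope: slow sinusoid riding on a constant offset.
        envelope = 1.0 + depth * math.sin(2.0 * math.pi * mod_rate_hz * t)
        # Fine structure: the tone carrier that the envelope multiplies.
        carrier = math.sin(2.0 * math.pi * carrier_hz * t)
        samples.append(envelope * carrier)
    return samples


# The three modulation rates named in the abstract (4, 8, and 16 Hz).
stimuli = {rate: am_tone(rate) for rate in (4, 8, 16)}

# Training dose stated in the abstract: 21 blocks of 60 trials each.
total_trials = 21 * 60
```

Setting `depth=0.0` yields an unmodulated tone, which is the kind of reference an AM-detection task would contrast against a modulated one; changing `mod_rate_hz` between two calls gives the pair of stimuli an AM-rate discrimination task would compare.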

Keywords:

Consonants – Learning – Perceptual learning – Phonology – Psychophysics – Speech – Speech signal processing – Syllables



Article published in PLoS ONE, 2019, Issue 12