Spelling performance on the web and in the lab

Authors: Arnaud Rey (1); Jean-Luc Manguin (3); Chloé Olivier (1); Sébastien Pacton (4); Pierre Courrieu (1)
Author affiliations: (1) Laboratoire de Psychologie Cognitive, CNRS–Aix-Marseille Université, Marseille, France; (2) Institute of Language, Communication and the Brain, Aix-Marseille Université, Marseille, France; (3) GREYC, CNRS–Université de Caen Basse-Normandie–ENSICAEN, Caen, France; (4) Laboratoire Mémoire, Cerveau et Cognition, Université Paris Descartes, Paris, France
Published in: PLoS ONE 14(12)
Category: Research Article
doi: 10.1371/journal.pone.0226647


Several dictionary websites give access to semantic, synonym, or spelling information about a given word. Over nine years, we systematically recorded every letter sequence entered into a French web dictionary. This yielded a total of 200 million orthographic forms, from which we built a large-scale database of spelling errors that could inform psychological theories of spelling processes. To check the reliability of this big data methodology, we selected from the database a sample of 100 frequently misspelled words, and a group of 100 French university students performed a spelling-to-dictation test on this list. The frequencies of the spellings produced in the two data sets were strongly correlated (r = 0.82). Although the distributions of spelling errors were relatively consistent across the two databases, the proportions of correct responses differed significantly. Regression analyses suggested possible explanations for these differences in terms of task-dependent factors. We argue that comparing the results of such large-scale databases with those of standard, controlled experimental paradigms is a good way to determine the conditions under which this big data methodology can adequately inform psychological theories.
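The comparison summarized above, a correlation between how often each spelling is produced in the web data and in the lab data, can be illustrated with a minimal sketch. The function names and the toy data are invented for illustration, and the abstract does not say exactly how the authors computed their statistic, so this shows only a plain Pearson correlation over relative frequencies, not the published analysis.

```python
from collections import Counter
from math import sqrt


def relative_frequencies(spellings):
    """Map each distinct spelling to its share of all productions."""
    counts = Counter(spellings)
    total = sum(counts.values())
    return {form: n / total for form, n in counts.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def compare_sources(web, lab):
    """Correlate spelling frequencies over the union of observed forms.

    A form absent from one source counts as frequency 0 there
    (an assumption of this sketch).
    """
    forms = sorted(set(web) | set(lab))
    web_freq = relative_frequencies(web)
    lab_freq = relative_frequencies(lab)
    xs = [web_freq.get(form, 0.0) for form in forms]
    ys = [lab_freq.get(form, 0.0) for form in forms]
    return pearson_r(xs, ys)


# Invented toy data for a single target word; not taken from the study.
web_queries = ["accueil"] * 6 + ["acceuil"] * 3 + ["aceuil"] * 1
lab_responses = ["accueil"] * 8 + ["acceuil"] * 2

r = compare_sources(web_queries, lab_responses)
print(f"r = {r:.2f}")
```

In a real analysis the frequencies would be pooled over all sampled words, and a library routine such as `scipy.stats.pearsonr` would also return a p-value.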

Keywords:

Database and informatics methods – Experimental psychology – Information retrieval – Lexicons – Phonemes – Phonology – Regression analysis – Semantics



The article was published in the journal PLoS ONE, 2019, Issue 12