Rapid visual categorization is not guided by early salience-based selection

Authors: John K. Tsotsos aff001; Iuliia Kotseruba aff001; Calden Wloka aff001
Authors' affiliation: Department of Electrical Engineering and Computer Science, York University, Toronto, ON, Canada aff001
Published in: PLoS ONE 14(10), 2019
Category: Research Article
doi: 10.1371/journal.pone.0224306


The current dominant visual processing paradigm in both human and machine research is the feedforward, layered hierarchy of neural-like processing elements. Within this paradigm, visual saliency is seen by many to have a specific role, namely that of early selection. Early selection is thought to enable very fast visual performance by limiting processing to only the most salient candidate portions of an image. This strategy has led to a plethora of saliency algorithms that have indeed improved processing time efficiency in machine algorithms, which in turn have strengthened the suggestion that human vision also employs a similar early selection strategy. However, at least one set of critical tests of this idea has never been performed with respect to the role of early selection in human vision. How would the best of the current saliency models perform on the stimuli used by experimentalists who first provided evidence for this visual processing paradigm? Would the algorithms really provide correct candidate sub-images to enable fast categorization on those same images? Do humans really need this early selection for their impressive performance? Here, we report on a new series of tests of these questions whose results suggest that it is quite unlikely that such an early selection process has any role in human rapid visual categorization.
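The early-selection strategy described above can be illustrated with a minimal sketch. The code below is a toy center-surround saliency computation, not any of the published models discussed in the paper; all function names and parameters are illustrative. It scores each location by how much it differs from its local neighborhood, then selects only the highest-scoring window, the sole patch a downstream categorizer would see under an early-selection scheme.

```python
def saliency_map(img):
    """Toy center-surround salience: absolute difference between each pixel
    and the mean of its 3x3 neighbourhood (left at zero along the border)."""
    h, w = len(img), len(img[0])
    sal = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            neigh = [img[y + dy][x + dx]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                     if (dy, dx) != (0, 0)]
            sal[y][x] = abs(img[y][x] - sum(neigh) / len(neigh))
    return sal


def most_salient_window(img, k=3):
    """Early selection: return the top-left corner of the k x k window with
    the highest summed salience -- the only candidate sub-image that would
    be passed on for categorization."""
    sal = saliency_map(img)
    h, w = len(img), len(img[0])
    best, best_yx = -1.0, (0, 0)
    for y in range(h - k + 1):
        for x in range(k and (w - k + 1)):
            score = sum(sal[y + dy][x + dx]
                        for dy in range(k) for dx in range(k))
            if score > best:
                best, best_yx = score, (y, x)
    return best_yx


# Usage: a uniform 7x7 image with one bright pixel; the selected
# window centers on that salient location.
img = [[0.0] * 7 for _ in range(7)]
img[3][3] = 10.0
print(most_salient_window(img, k=3))  # -> (2, 2)
```

The paper's question is whether a selection stage of this general kind is needed before, or even compatible with, human rapid categorization; the sketch only makes the candidate-cropping idea concrete.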

Keywords:

Algorithms – Behavior – Computer vision – Eye movements – Eyes – Human performance – Vision – Visual system

