Tapped out or barely tapped? Recommendations for how to harness the vast and largely unused potential of the Mechanical Turk participant pool


Authors: Jonathan Robinson aff001; Cheskie Rosenzweig aff002; Aaron J. Moss aff002; Leib Litman aff004
Author affiliations: Department of Computer Science, Lander College, Flushing, New York, United States of America aff001; Prime Research Solutions, Queens, New York, United States of America aff002; Department of Clinical Psychology, Columbia University, New York, New York, United States of America aff003; Department of Psychology, Lander College, Flushing, New York, United States of America aff004
Published in: PLoS ONE 14(12)
Category: Research Article
doi: 10.1371/journal.pone.0226394

Abstract

Mechanical Turk (MTurk) is a common source of research participants within the academic community. Despite MTurk’s utility and benefits over traditional subject pools, some researchers have questioned whether it is sustainable. Specifically, some have asked whether MTurk workers are too familiar with manipulations and measures common in the social sciences, the result of many researchers relying on the same small participant pool. Here, we show that concerns about non-naivete on MTurk are due less to the MTurk platform itself and more to the way researchers use the platform. Specifically, we find that there are at least 250,000 MTurk workers worldwide and that a large majority of US workers are new to the platform each year and therefore relatively inexperienced as research participants. We describe how inexperienced workers are excluded from studies, in part, because of the worker reputation qualifications researchers commonly use. Then, we propose and evaluate an alternative approach to sampling on MTurk that allows researchers to access inexperienced participants without sacrificing data quality. We recommend that in some cases researchers should limit the number of highly experienced workers allowed in their study by excluding these workers or by stratifying sample recruitment based on worker experience levels. We discuss the trade-offs of different sampling practices on MTurk and describe how the above sampling strategies can help researchers harness the vast and largely untapped potential of the Mechanical Turk participant pool.

Keywords:

Attention – Personality – Personality tests – Public and occupational health – Reflection – Research validity – Social sciences – Social systems



Article published in the journal

PLOS One


2019, Issue 12