Error rates of human reviewers during abstract screening in systematic reviews


Authors: Zhen Wang (1); Tarek Nayfeh (2); Jennifer Tetzlaff (3); Peter O’Blenis (3); Mohammad Hassan Murad (1)
Affiliations: (1) Evidence-based Practice Center, Mayo Clinic, Rochester, Minnesota, United States of America; (2) Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, Minnesota, United States of America; (3) Evidence Partners, Ottawa, Ontario, Canada
Published in: PLoS ONE 15(1)
Category: Research Article
doi: https://doi.org/10.1371/journal.pone.0227742

Abstract

Background

Automated approaches to improve the efficiency of systematic reviews are greatly needed. When these approaches are tested, the criterion standard (gold standard) for comparison is usually the judgment of human reviewers. Yet human reviewers make errors when including and excluding references.

Objectives

To determine the false inclusion and false exclusion rates of citations during abstract screening by pairs of independent reviewers. These rates can inform the design, testing, and implementation of automated approaches.

Methods

We identified all systematic reviews conducted between 2010 and 2017 by an evidence-based practice center in the United States. Eligible reviews had to follow standard systematic review procedures with dual independent screening of abstracts and full texts. Inclusion of a citation by either reviewer automatically advanced it to the next level of screening. Disagreements between reviewers during full-text screening were resolved by consensus or by arbitration from a third reviewer. A false inclusion or false exclusion was defined as a decision by a single reviewer that was inconsistent with the final list of included studies.
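
The error definition above can be made concrete with a short sketch. This is a hypothetical illustration, not the authors' analysis code. The `Decision` record and the `screening_error_rates` function are invented for this example. The abstract does not state the denominator behind the reported rates, so using the count of all single-reviewer decisions is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """One abstract-screening decision by a single reviewer (hypothetical layout)."""
    reviewer: str
    citation_id: str
    included: bool  # True if this reviewer voted to include the citation


def screening_error_rates(decisions, final_included):
    """Return (false_inclusion_rate, false_exclusion_rate, total_error_rate).

    A decision is a false inclusion when the reviewer included a citation that is
    absent from the final included list, and a false exclusion when the reviewer
    excluded a citation that is on that list. ASSUMPTION: every rate is divided by
    the total number of single-reviewer decisions.
    """
    decisions = list(decisions)
    if not decisions:
        raise ValueError("no screening decisions supplied")
    false_inc = sum(
        1 for d in decisions if d.included and d.citation_id not in final_included
    )
    false_exc = sum(
        1 for d in decisions if not d.included and d.citation_id in final_included
    )
    n = len(decisions)
    return false_inc / n, false_exc / n, (false_inc + false_exc) / n


# Toy data: two reviewers each screen four citations; only "c1" ends up included.
example_final = {"c1"}
example_decisions = [
    Decision("A", "c1", True), Decision("B", "c1", False),   # B falsely excludes c1
    Decision("A", "c2", True), Decision("B", "c2", False),   # A falsely includes c2
    Decision("A", "c3", False), Decision("B", "c3", False),
    Decision("A", "c4", False), Decision("B", "c4", False),
]

print(screening_error_rates(example_decisions, example_final))  # (0.125, 0.125, 0.25)
```

Under dual screening, a single false exclusion like reviewer B's on c1 does not remove the citation from the review, because the other reviewer's inclusion carries it forward. The sketch still counts it as an individual reviewer error, which matches the per-decision definition above.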

Results

We analyzed a total of 139,467 citations that underwent 329,332 inclusion and exclusion decisions by 86 unique reviewers. The final systematic reviews included 5.48% of the potential references identified through bibliographic database searches (95% confidence interval [CI]: 2.38% to 8.58%). After abstract screening, the total error rate (false inclusions plus false exclusions) was 10.76% (95% CI: 7.43% to 14.09%).

Conclusions

This study suggests that human reviewers have substantial false inclusion and false exclusion rates. When evaluating the validity of a future automated study selection algorithm, it is important to keep in mind that the gold standard is not perfect. An algorithm that achieves error rates similar to those of humans may be adequate and could save resources and time.

Keywords:

Automation – Cardiovascular medicine – Citation analysis – Database searching – Distillation – Health screening – Mental health and psychiatry – Systematic reviews
