Election forensics: Using machine learning and synthetic data for possible election anomaly detection

Authors: Mali Zhang (1); R. Michael Alvarez (1); Ines Levin (2)
Affiliations: (1) Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, United States of America; (2) Department of Political Science, University of California, Irvine, CA, United States of America
Published in: PLoS ONE 14(10)
Category: Research Article
doi: 10.1371/journal.pone.0223950


Assuring election integrity is essential for the legitimacy of elected representative democratic government. Until recently, other than in-person election observation, there have been few quantitative methods for determining the integrity of a democratic election. Here we present a machine learning methodology for identifying polling places at risk of election fraud and estimating the extent of potential electoral manipulation, using synthetic training data. We apply this methodology to mesa-level data from Argentina’s 2015 national elections.
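The general idea of training on synthetic data can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the turnout and vote-share distributions, the ballot-stuffing rule, the station size `N_VOTERS`, and the nearest-centroid classifier, which stands in for whatever supervised learner is actually used. Both the training and the held-out polling stations are simulated, so the numbers it prints say nothing about any real election.

```python
import random
from statistics import mean

N_VOTERS = 300  # assumed registered voters per station (illustrative only)

def simulate_station(rng, stuffed):
    """Simulate one hypothetical polling station; returns ((turnout, share), label)."""
    turnout = min(max(rng.gauss(0.75, 0.05), 0.05), 1.0)
    share = min(max(rng.gauss(0.45, 0.07), 0.0), 1.0)
    cast = round(N_VOTERS * turnout)
    votes = round(cast * share)
    if stuffed:
        # Assumed manipulation: extra ballots are added, all for one party.
        extra = round((N_VOTERS - cast) * rng.uniform(0.3, 0.9))
        cast += extra
        votes += extra
    return (cast / N_VOTERS, votes / cast), int(stuffed)

def make_dataset(rng, n, fraud_rate):
    """Build n labeled synthetic stations, a fraction fraud_rate of them manipulated."""
    return [simulate_station(rng, rng.random() < fraud_rate) for _ in range(n)]

def fit_centroids(data):
    """Compute the mean (turnout, share) of each class: 0 = clean, 1 = manipulated."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in data if y == label]
        centroids[label] = (mean(r[0] for r in rows), mean(r[1] for r in rows))
    return centroids

def predict(centroids, x):
    """Assign a station to the class whose centroid is nearest (squared distance)."""
    def dist2(c):
        return (x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2
    return min(centroids, key=lambda label: dist2(centroids[label]))

rng = random.Random(0)
train = make_dataset(rng, 2000, fraud_rate=0.5)
test = make_dataset(rng, 1000, fraud_rate=0.5)

centroids = fit_centroids(train)
predictions = [predict(centroids, x) for x, _ in test]
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
flagged_share = sum(predictions) / len(predictions)
print(f"held-out accuracy: {accuracy:.2f}, share of stations flagged: {flagged_share:.2f}")
```

In an application of this kind, the fitted classifier would be applied to observed polling-place returns instead of a second simulated set, and the share of flagged stations would serve as a rough indication of how widespread potential manipulation might be.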

Keywords:

Argentina – Elections – Forensics – Literacy – Machine learning – Publication ethics – Research integrity – Supervised machine learning



Article published in: 2019, Issue 10