Predicting the performance of TV series through textual and network analysis: The case of Big Bang Theory

Authors: Andrea Fronzetti Colladon (1); Maurizio Naldi (2)
Author affiliations: (1) Department of Engineering, University of Perugia, Via G. Durant, 93, 06125 Perugia, Italy; (2) Department of Civil Engineering and Computer Science, University of Rome Tor Vergata, Rome, Italy; (3) Department of Law, Economics, Politics and Modern languages, LUMSA University, Rome, Italy
Published in: PLoS ONE 14(11)
Category: Research Article
doi: 10.1371/journal.pone.0225306


TV series represent a growing sector of the entertainment industry. Being able to predict their performance allows a broadcasting network to better target the large investment needed to produce them. In this paper, we consider a well-known TV series, The Big Bang Theory, to identify the factors leading to its success. The factors considered are mostly related to the script, such as the characteristics of the dialogues (e.g., length, language complexity, sentiment), while performance is measured through the reviews submitted by viewers (namely, the number of reviews as a measure of popularity and the viewers' ratings as a measure of appreciation). Through correlation and regression analysis, two sets of predictors are identified, one for popularity and one for appreciation. In particular, the episode number, the percentage of male viewers, the language complexity, and the text length emerge as the best predictors of popularity, while the percentage of male viewers and the language complexity again, together with the number of we-words and the concentration of dialogues, are the best predictors of appreciation.
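The abstract names several script-level features without defining how they are computed. The sketch below shows one plausible way to derive three of them for a single episode. It rests on our own assumptions, not the authors' stated method: dialogue concentration is taken to be the Herfindahl-Hirschman index of each character's share of speaking turns, we-words are counted against a small hypothetical word list (`WE_WORDS`), and text length is the total number of word tokens. The function names (`tokenize`, `dialogue_concentration`, `episode_features`) and the toy script are hypothetical.

```python
import re
from collections import Counter

# Hypothetical list of first-person plural forms ("we-words");
# the paper's actual word list is not given in this abstract.
WE_WORDS = {"we", "us", "our", "ours", "ourselves", "we're", "we've", "we'll", "we'd"}


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; apostrophes are kept inside words (e.g. "we're")."""
    return re.findall(r"[a-z']+", text.lower())


def dialogue_concentration(lines: list[tuple[str, str]]) -> float:
    """Herfindahl-Hirschman index of speaking turns per character (assumed measure).

    1.0 means one character speaks every line; 1/n means n characters
    share the lines equally. An empty script returns 0.
    """
    turns = Counter(speaker for speaker, _ in lines)
    total = sum(turns.values())
    return sum((count / total) ** 2 for count in turns.values())


def episode_features(lines: list[tuple[str, str]]) -> dict[str, float]:
    """Compute illustrative script features for one episode.

    `lines` is a list of (speaker, utterance) pairs in script order.
    """
    tokens = [tok for _, utterance in lines for tok in tokenize(utterance)]
    return {
        "text_length": len(tokens),                          # total words spoken
        "we_words": sum(tok in WE_WORDS for tok in tokens),  # count of we-words
        "concentration": dialogue_concentration(lines),      # HHI of speaking turns
    }


if __name__ == "__main__":
    # Toy three-line script, invented for illustration (not from the show).
    script = [
        ("A", "We need to talk about our agreement."),
        ("B", "Now?"),
        ("A", "Yes, now."),
    ]
    # Expected: text_length = 10, we_words = 2, concentration = 5/9 (about 0.556)
    print(episode_features(script))
```

Per-episode values like these could then be paired with review counts (popularity) or ratings (appreciation) and correlated, for example with `statistics.correlation` (Python 3.10+). This sketch does not reproduce the paper's regression models or its measure of language complexity.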

Keywords:

Computational linguistics – Economics – Language – Regression analysis – Semantics – Social networks – Origin of the universe

Published in the journal: 2019, Issue 11