A network-centric approach for estimating trust between open source software developers


Authors: Hitesh Sapkota [1]; Pradeep K. Murukannaiah [2]; Yi Wang [1]
Author affiliations: [1] Software Engineering, Rochester Institute of Technology, Rochester, NY, United States of America; [2] Intelligent Systems-EWI, Delft University of Technology, Delft, The Netherlands
Published in: PLoS ONE 14(12)
Category: Research Article
doi: 10.1371/journal.pone.0226281

Abstract

Trust between developers influences the success of open source software (OSS) projects. Although existing research recognizes the importance of trust, no effective and scalable computational method exists for measuring trust in an OSS community. Consequently, OSS project members must make trust-related decisions by relying on subjective inferences drawn from fragile and incomplete information. We propose an automated approach that assists a developer in assessing the trustworthiness of another developer. Our approach is two-fold. First, we compute direct trust between pairs of developers who have interacted previously by analyzing their interactions via natural language processing. Second, we infer indirect trust between developers who have not interacted previously by constructing a community-wide developer network and propagating trust through the network. A large-scale evaluation of our approach on a GitHub dataset of 24,315 developers shows that contributions from trusted developers are more likely to be accepted into a project than contributions from developers who are distrusted by, or lack trust from, project members. Further, we develop a pull request classifier that exploits trust metrics to effectively predict the likelihood of a pull request being accepted into a project, demonstrating the practical utility of our approach.
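
The second step, inferring indirect trust by propagating direct trust through a developer network, can be illustrated with a minimal sketch. The abstract does not state which propagation operator the authors use, so the code below assumes a simple and common stand-in: direct trust values lie in (0, 1], trust along a path is the product of its edge values, and the indirect trust between two developers is the best such product over all paths. The function name, the dictionary layout, and the example developers and scores are all hypothetical.

```python
import heapq


def infer_indirect_trust(direct_trust, source, target):
    """Best multiplicative path trust from source to target, or None.

    ASSUMPTION (not taken from the paper): trust along a path is the
    product of direct-trust values in (0, 1], and the strongest path wins.
    direct_trust maps truster -> {trustee: score}.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap emulated with negated scores
    while heap:
        neg_score, node = heapq.heappop(heap)
        score = -neg_score
        if node == target:
            return score
        if score < best.get(node, 0.0):
            continue  # stale heap entry, a stronger path was already found
        for neighbor, weight in direct_trust.get(node, {}).items():
            candidate = score * weight
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    return None  # no chain of direct trust connects the two developers


# Hypothetical direct-trust scores between developers who have interacted.
example_network = {
    "alice": {"bob": 0.9, "carol": 0.5},
    "bob": {"dave": 0.8},
    "carol": {"dave": 0.9},
}

# alice has never interacted with dave; the strongest path runs through bob,
# giving roughly 0.9 * 0.8 = 0.72.
print(infer_indirect_trust(example_network, "alice", "dave"))
```

Because every score is at most 1, extending a path can never increase its trust, which is what lets a Dijkstra-style search return the strongest path as soon as the target is popped. A different operator (for example, one that also models distrust or uncertainty) would need a different search.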

Keywords:

Decision making – Network analysis – Open source software – Social networks – Software engineering – Support vector machines – Word embedding




This article appeared in: PLOS One, 2019, Issue 12