Refined distributed emotion vector representation for social media sentiment analysis


Authors: Yung-Chun Chang (1); Wen-Chao Yeh (1); Yan-Chun Hsing (1); Chen-Ann Wang (3)
Authors' affiliations: (1) Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan; (2) Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei, Taiwan; (3) Institute of Information Systems and Applications, National Tsing Hua University, Hsinchu, Taiwan
Published in: PLoS ONE 14(10)
Category: Research Article
doi: 10.1371/journal.pone.0223317

Abstract

As user-generated content proliferates on social networking sites, we are bombarded with ever more information, which has in turn driven the rapid evolution of new technologies and tools for processing these vast amounts of data. Semantic and sentiment analysis of this social media content has become a key research topic in many areas of society, for example in shopping malls, where it helps policymakers predict market trends and discover potential customers. In this light, this study proposes a novel method that analyzes the emotional aspects of Chinese vocabulary and then uses them to assess large volumes of movie review comments. The experimental results show that our method (1) improves machine learning models by supplying more refined emotional information, which enhances the effectiveness of movie recommendation systems, and (2) performs significantly better than other commonly used emotion analysis methods.
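The abstract does not describe how word-level emotional information is turned into review-level features. Purely to illustrate that general idea, and not as the authors' method, the sketch below mean-pools per-word emotion vectors from a lexicon into a single vector per review. The lexicon, its two dimensions (read here as valence and arousal), the function name `review_emotion_vector`, and all scores are hypothetical assumptions; Chinese text is assumed to be word-segmented beforehand.

```python
from typing import Dict, List, Sequence

# Hypothetical lexicon type: each word maps to a small emotion vector,
# e.g. (valence, arousal). This is an illustrative assumption, not the
# representation proposed in the paper.
Lexicon = Dict[str, Sequence[float]]


def review_emotion_vector(tokens: List[str], lexicon: Lexicon, dim: int = 2) -> List[float]:
    """Mean-pool the emotion vectors of all tokens found in the lexicon.

    Tokens missing from the lexicon are skipped. If no token is found,
    a zero vector of length `dim` is returned so every review still
    gets a fixed-size feature vector.
    """
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return [0.0] * dim
    # Average each emotion dimension over the matched words.
    return [sum(vec[i] for vec in hits) / len(hits) for i in range(dim)]


if __name__ == "__main__":
    # Made-up scores for illustration only.
    toy_lexicon = {"精彩": (8.0, 7.0), "無聊": (2.0, 3.0)}
    review = ["劇情", "精彩", "但", "結局", "無聊"]
    print(review_emotion_vector(review, toy_lexicon))
```

With the made-up lexicon above, only the two lexicon words contribute, so the review maps to `[5.0, 5.0]`. Mean pooling is the simplest possible aggregation: it ignores word order and negation, which a more refined emotion representation would need to address.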

Keywords:

Emotions – Linear regression analysis – Machine learning – Semantics – Social media – Support vector machines – Word embedding – Lexicons




Published in

PLOS One

2019, Issue 10