Multi-agent reinforcement learning with approximate model learning for competitive games

Authors: Young Joon Park; Yoon Sang Cho; Seoung Bum Kim
Authors' affiliation: School of Industrial Management Engineering, Korea University, Seoul, Republic of Korea
Published in: PLoS ONE 14(9)
Category: Research Article
doi: https://doi.org/10.1371/journal.pone.0222215

Abstract

We propose a method for learning multi-agent policies that compete against multiple opponents. The method combines recurrent neural network-based actor-critic networks with deterministic policy gradients and promotes cooperation among agents through communication. Because the agents are trained separately from the opponents, the learning process does not require access to the opponents' parameters or observations. The actor networks let the agents communicate through forward and backward paths, while the critic network trains the actors by delivering gradient signals based on each agent's contribution to the global reward. Moreover, to address the nonstationarity caused by the evolving behavior of other agents, we propose approximate model learning, which uses auxiliary prediction networks to model the state transitions, the reward function, and opponent behavior. We evaluate the method in competitive multi-agent environments and compare it with alternative methods in terms of learning efficiency and goal achievement. The results show that the proposed method outperforms the alternatives.
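The abstract does not give the network details, so the following is only a toy sketch of two of the ideas it names, not the authors' implementation. It assumes scalar observations, a single tanh recurrence run forward and then backward across the agent axis (so each agent's action can depend on every other agent's observation), and auxiliary prediction losses for the state transition, reward, and opponent behavior combined with the actor loss as a weighted sum of mean squared errors. All names (`bidirectional_messages`, `deterministic_actions`, `total_loss`, `W_IN`, `lam_s`, and so on) and all weight values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical scalar weights for a toy bidirectional recurrence over agents.
# A real actor network would use learned vectors and matrices instead.
W_IN, W_REC, W_OUT_F, W_OUT_B = 0.8, 0.5, 1.0, 1.0


def bidirectional_messages(obs: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Run a forward and a backward recurrence across the agents.

    Agent i's forward state summarizes agents 0..i and its backward state
    summarizes agents i..n-1, so the two paths act as a communication channel.
    """
    n = len(obs)
    fwd = [0.0] * n
    bwd = [0.0] * n
    h = 0.0
    for i in range(n):  # forward path: agent 0 -> agent n-1
        h = math.tanh(W_IN * obs[i] + W_REC * h)
        fwd[i] = h
    h = 0.0
    for i in reversed(range(n)):  # backward path: agent n-1 -> agent 0
        h = math.tanh(W_IN * obs[i] + W_REC * h)
        bwd[i] = h
    return fwd, bwd


def deterministic_actions(obs: Sequence[float]) -> List[float]:
    """Map each agent's forward and backward states to an action in (-1, 1)."""
    fwd, bwd = bidirectional_messages(obs)
    return [math.tanh(W_OUT_F * f + W_OUT_B * b) for f, b in zip(fwd, bwd)]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(actor_loss: float,
               state_pred: Sequence[float], state_true: Sequence[float],
               reward_pred: Sequence[float], reward_true: Sequence[float],
               opp_pred: Sequence[float], opp_true: Sequence[float],
               lam_s: float = 0.1, lam_r: float = 0.1, lam_o: float = 0.1) -> float:
    """Actor loss plus weighted auxiliary prediction losses (hypothetical weights).

    The three MSE terms stand in for approximate models of the state
    transition, the reward function and the opponents' behavior.
    """
    return (actor_loss
            + lam_s * mse(state_pred, state_true)
            + lam_r * mse(reward_pred, reward_true)
            + lam_o * mse(opp_pred, opp_true))


if __name__ == "__main__":
    base = deterministic_actions([0.2, -0.4, 0.0])
    changed = deterministic_actions([0.2, -0.4, 1.0])
    # Only the last agent's observation changed, yet agent 0's action moves,
    # because the backward path carries that information to agent 0.
    print(base[0] != changed[0])  # True
    print(total_loss(-0.5, [1.0, 2.0], [1.0, 2.0], [0.3], [0.3], [0.0], [0.0]))  # -0.5
```

In this sketch, agent 0's forward state is unaffected by a change in the last agent's observation, but its backward state is not, which is what lets information flow in both directions along the agent sequence. The auxiliary terms only add extra training signal; they do not change how actions are computed.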
