Multi-Agent Reinforcement Learning

Abstract

On this planet, in our societies, millions of people live and work together. Each individual has their own set of goals and performs their actions accordingly. Some of these goals are shared. When we want to achieve shared goals, we organize ourselves in teams, groups, companies, organizations, and societies.

Notes

  1. Human drivers have a theory of mind of other drivers. Theory of mind, together with the related concept of mirror neurons [56], is a psychological account of empathy and understanding that allows a limited amount of prediction of future behavior. Theory of mind studies how individuals simulate in their minds the actions of others, including their simulations of our actions (and of our simulations, etc.) [18, 71].

  2. https://plato.stanford.edu/entries/prisoner-dilemma/.

  3. https://ai.plainenglish.io/building-a-poker-ai-part-6-beating-kuhn-poker-with-cfr-using-python-1b4172a6ab2d.

  4. https://int8.io/counterfactual-regret-minimization-for-poker-ai/.

  5. https://github.com/int8/counterfactual-regret-minimization/blob/master/games/algorithms.py. A minimal CFR sketch is given after these notes.

  6. Survival of the fittest cooperative group of individuals can also be achieved with an appropriate fitness function [101].

  7. https://www.youtube.com/watch?v=kopoLzvh5jY&t=10s.

  8. https://openai.com/blog/emergent-tool-use/.

  9. Similar to the first approach in AlphaGo, where self-play reinforcement learning was also bootstrapped by supervised learning from human games.

  10. https://github.com/openai/multi-agent-emergence-environments.

  11. https://ai.googleblog.com/2019/06/introducing-google-research-football.html.

  12. https://github.com/deepmind/pysc2.
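
Notes 3-5 point to tutorial implementations of counterfactual regret minimization (CFR) for Kuhn poker. To give the flavor of what those tutorials implement, the following is a minimal vanilla-CFR sketch in Python that follows the standard regret-matching formulation; it is an illustrative sketch under that assumption, not the code of the linked tutorials or of this chapter, and the names (Node, cfr, nodes) are our own.

import random

PASS, BET = "p", "b"
ACTIONS = [PASS, BET]

class Node:
    # One information set: cumulative regrets and cumulative strategy weights.
    def __init__(self):
        self.regret_sum = {a: 0.0 for a in ACTIONS}
        self.strategy_sum = {a: 0.0 for a in ACTIONS}

    def strategy(self, weight):
        # Regret matching: mix actions in proportion to positive cumulative regret.
        pos = {a: max(r, 0.0) for a, r in self.regret_sum.items()}
        total = sum(pos.values())
        strat = ({a: p / total for a, p in pos.items()} if total > 0
                 else {a: 1.0 / len(ACTIONS) for a in ACTIONS})
        for a in ACTIONS:
            self.strategy_sum[a] += weight * strat[a]
        return strat

    def average_strategy(self):
        # The average strategy over all iterations is what converges to equilibrium.
        total = sum(self.strategy_sum.values())
        return {a: (s / total if total > 0 else 1.0 / len(ACTIONS))
                for a, s in self.strategy_sum.items()}

nodes = {}  # information-set string -> Node

def cfr(cards, history, p0, p1):
    # Returns expected utility for the player to act; p0, p1 are reach probabilities.
    player = len(history) % 2
    opponent = 1 - player
    if len(history) > 1:  # terminal states: pp, bp, pbp (showdown/fold), bb, pbb (call)
        if history[-1] == PASS:
            if history == "pp":   # both check: showdown for the ante
                return 1 if cards[player] > cards[opponent] else -1
            return 1              # opponent folded after a bet
        if history[-2:] == "bb":  # bet and call: showdown for two chips
            return 2 if cards[player] > cards[opponent] else -2

    info_set = str(cards[player]) + history
    node = nodes.setdefault(info_set, Node())
    strat = node.strategy(p0 if player == 0 else p1)

    util, node_util = {}, 0.0
    for a in ACTIONS:
        if player == 0:
            util[a] = -cfr(cards, history + a, p0 * strat[a], p1)
        else:
            util[a] = -cfr(cards, history + a, p0, p1 * strat[a])
        node_util += strat[a] * util[a]

    # Accumulate counterfactual regret, weighted by the opponent's reach probability.
    reach_opponent = p1 if player == 0 else p0
    for a in ACTIONS:
        node.regret_sum[a] += reach_opponent * (util[a] - node_util)
    return node_util

if __name__ == "__main__":
    cards = [1, 2, 3]  # Jack, Queen, King
    for _ in range(100000):
        random.shuffle(cards)  # deal two of the three cards at random
        cfr(cards, "", 1.0, 1.0)
    for info_set in sorted(nodes):
        avg = nodes[info_set].average_strategy()
        print(info_set, {a: round(p, 3) for a, p in avg.items()})

With enough iterations the printed average strategies approximate a Nash equilibrium of Kuhn poker; for example, the first player should learn to bet (bluff) with the Jack roughly one third as often as they bet with the King, which is the known equilibrium relation.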

References

  1. Stefano Albrecht and Peter Stone. Multiagent learning: foundations and recent trends. In Tutorial at IJCAI-17 conference, 2017.

  2. Stefano Albrecht and Peter Stone. Autonomous agents modelling other agents: A comprehensive survey and open problems. Artificial Intelligence, 258:66–95, 2018.

  3. Thomas Anthony, Tom Eccles, Andrea Tacchetti, János Kramár, Ian M. Gemp, Thomas C. Hudson, Nicolas Porcel, Marc Lanctot, Julien Pérolat, Richard Everett, Satinder Singh, Thore Graepel, and Yoram Bachrach. Learning to play no-press diplomacy with best response policy iteration. In Advances in Neural Information Processing Systems, 2020.

  4. Robert Axelrod. An evolutionary approach to norms. The American Political Science Review, pages 1095–1111, 1986.

  5. Robert Axelrod. The complexity of cooperation: Agent-based models of competition and collaboration, volume 3. Princeton University Press, 1997.

  6. Robert Axelrod. The dissemination of culture: A model with local convergence and global polarization. Journal of Conflict Resolution, 41(2):203–226, 1997.

  7. Robert Axelrod and Douglas Dion. The further evolution of cooperation. Science, 242(4884):1385–1390, 1988.

  8. Robert Axelrod and William D Hamilton. The evolution of cooperation. Science, 211(4489):1390–1396, 1981.

  9. Thomas Bäck. Evolutionary Algorithms in Theory and Practice: Evolutionary Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, 1996.

  10. Thomas Bäck, David B Fogel, and Zbigniew Michalewicz. Handbook of evolutionary computation. Release, 97(1):B1, 1997.

  11. Thomas Bäck, Frank Hoffmeister, and Hans-Paul Schwefel. A survey of evolution strategies. In Proceedings of the fourth International Conference on Genetic Algorithms, 1991.

  12. Thomas Bäck and Hans-Paul Schwefel. An overview of evolutionary algorithms for parameter optimization. Evolutionary Computation, 1(1):1–23, 1993.

  13. Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, and Igor Mordatch. Emergent tool use from multi-agent autocurricula. arXiv preprint arXiv:1909.07528, 2019.

  14. Anton Bakhtin, David Wu, Adam Lerer, and Noam Brown. No-press diplomacy from scratch. Advances in Neural Information Processing Systems, 34, 2021.

  15. Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, and Igor Mordatch. Emergent complexity via multi-agent competition. arXiv preprint arXiv:1710.03748, 2017.

  16. Nolan Bard, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, H. Francis Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain Dunning, Shibl Mourad, Hugo Larochelle, Marc G. Bellemare, and Michael Bowling. The Hanabi challenge: A new frontier for AI research. Artificial Intelligence, 280:103216, 2020.

  17. Nolan Bard, John Hawkin, Jonathan Rubin, and Martin Zinkevich. The annual computer poker competition. AI Magazine, 34(2):112, 2013.

  18. Simon Baron-Cohen, Alan M Leslie, and Uta Frith. Does the autistic child have a “theory of mind”? Cognition, 21(1):37–46, 1985.

  19. Gerardo Beni. Swarm intelligence. Complex Social and Behavioral Systems: Game Theory and Agent-Based Models, pages 791–818, 2020.

  20. Gerardo Beni and Jing Wang. Swarm intelligence in cellular robotic systems. In Robots and Biological Systems: Towards a New Bionics?, pages 703–712. Springer, 1993.

  21. Daniel S Bernstein, Robert Givan, Neil Immerman, and Shlomo Zilberstein. The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27(4):819–840, 2002.

  22. Darse Billings, Aaron Davidson, Jonathan Schaeffer, and Duane Szafron. The challenge of poker. Artificial Intelligence, 134(1-2):201–240, 2002.

  23. Darse Billings, Aaron Davidson, Terence Schauenberg, Neil Burch, Michael Bowling, Robert Holte, Jonathan Schaeffer, and Duane Szafron. Game-tree search with adaptation in stochastic imperfect-information games. In International Conference on Computers and Games, pages 21–34. Springer, 2004.

  24. Christian Blum and Daniel Merkle. Swarm Intelligence: Introduction and Applications. Springer Science & Business Media, 2008.

  25. Eric Bonabeau, Marco Dorigo, and Guy Theraulaz. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, 1999.

  26. Michael Bowling, Neil Burch, Michael Johanson, and Oskari Tammelin. Heads-up Limit Hold’em poker is solved. Science, 347(6218):145–149, 2015.

  27. Michael H. Bowling, Nicholas Abou Risk, Nolan Bard, Darse Billings, Neil Burch, Joshua Davidson, John Alexander Hawkin, Robert Holte, Michael Johanson, Morgan Kan, Bryce Paradis, Jonathan Schaeffer, David Schnizlein, Duane Szafron, Kevin Waugh, and Martin Zinkevich. A demonstration of the polaris poker system. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems, volume 2, pages 1391–1392, 2009.

  28. Robert Boyd and Peter J Richerson. Culture and the Evolutionary Process. University of Chicago Press, 1988.

  29. Noam Brown, Sam Ganzfried, and Tuomas Sandholm. Hierarchical abstraction, distributed equilibrium computation, and post-processing, with application to a champion No-Limit Texas Hold’em agent. In AAAI Workshop: Computer Poker and Imperfect Information, 2015.

  30. Noam Brown, Adam Lerer, Sam Gross, and Tuomas Sandholm. Deep counterfactual regret minimization. In International Conference on Machine Learning, pages 793–802. PMLR, 2019.

  31. Noam Brown and Tuomas Sandholm. Superhuman AI for Heads-up No-limit poker: Libratus beats top professionals. Science, 359(6374):418–424, 2018.

  32. Noam Brown and Tuomas Sandholm. Superhuman AI for multiplayer poker. Science, 365(6456):885–890, 2019.

  33. Lucian Busoniu, Robert Babuska, and Bart De Schutter. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(2):156–172, 2008.

  34. Zhiyuan Cai, Huanhui Cao, Wenjie Lu, Lin Zhang, and Hao Xiong. Safe multi-agent reinforcement learning through decentralized multiple control barrier functions. arXiv preprint arXiv:2103.12553, 2021.

  35. Kris Cao, Angeliki Lazaridou, Marc Lanctot, Joel Z Leibo, Karl Tuyls, and Stephen Clark. Emergent communication through negotiation. In International Conference on Learning Representations, 2018.

  36. Edward Cartwright. Behavioral Economics. Routledge, 2018.

  37. Patryk Chrabaszcz, Ilya Loshchilov, and Frank Hutter. Back to basics: Benchmarking canonical evolution strategies for playing Atari. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, pages 1419–1426, 2018.

  38. Edoardo Conti, Vashisht Madhavan, Felipe Petroski Such, Joel Lehman, Kenneth O Stanley, and Jeff Clune. Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents. In Advances in Neural Information Processing Systems, pages 5032–5043, 2018.

  39. Allan Dafoe, Edward Hughes, Yoram Bachrach, Tantum Collins, Kevin R McKee, Joel Z Leibo, Kate Larson, and Thore Graepel. Open problems in cooperative AI. arXiv preprint arXiv:2012.08630, 2020.

  40. Zhongxiang Dai, Yizhou Chen, Bryan Kian Hsiang Low, Patrick Jaillet, and Teck-Hua Ho. R2-B2: recursive reasoning-based Bayesian optimization for no-regret learning in games. In International Conference on Machine Learning, pages 2291–2301. PMLR, 2020.

  41. Morton D Davis. Game Theory: a Nontechnical Introduction. Courier Corporation, 2012.

  42. Richard Dawkins and Nicola Davis. The Selfish Gene. Macat Library, 2017.

  43. Dave De Jonge, Tim Baarslag, Reyhan Aydoğan, Catholijn Jonker, Katsuhide Fujita, and Takayuki Ito. The challenge of negotiation in the game of diplomacy. In International Conference on Agreement Technologies, pages 100–114. Springer, 2018.

  44. Marco Dorigo. Optimization, learning and natural algorithms. PhD Thesis, Politecnico di Milano, 1992.

  45. Marco Dorigo and Mauro Birattari. Swarm intelligence. Scholarpedia, 2(9):1462, 2007.

  46. Marco Dorigo, Mauro Birattari, and Thomas Stutzle. Ant colony optimization. IEEE Computational Intelligence Magazine, 1(4):28–39, 2006.

  47. Marco Dorigo and Luca Maria Gambardella. Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1(1):53–66, 1997.

  48. Russell C Eberhart, Yuhui Shi, and James Kennedy. Swarm Intelligence. Elsevier, 2001.

  49. Tom Eccles, Edward Hughes, János Kramár, Steven Wheelwright, and Joel Z Leibo. Learning reciprocity in complex sequential social dilemmas. arXiv preprint arXiv:1903.08082, 2019.

  50. Agoston E Eiben and Jim E Smith. What is an evolutionary algorithm? In Introduction to Evolutionary Computing, pages 25–48. Springer, 2015.

  51. Richard Everett and Stephen Roberts. Learning against non-stationary agents with opponent modelling and deep reinforcement learning. In 2018 AAAI Spring Symposium Series, 2018.

  52. Vladimir Feinberg, Alvin Wan, Ion Stoica, Michael I Jordan, Joseph E Gonzalez, and Sergey Levine. Model-based value estimation for efficient model-free reinforcement learning. arXiv preprint arXiv:1803.00101, 2018.

  53. Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. Counterfactual multi-agent policy gradients. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.

  54. Jakob N Foerster, Richard Y Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch. Learning with opponent-learning awareness. arXiv preprint arXiv:1709.04326, 2017.

  55. David B Fogel. An introduction to simulated evolutionary optimization. IEEE Transactions on Neural Networks, 5(1):3–14, 1994.

  56. Vittorio Gallese and Alvin Goldman. Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences, 2(12):493–501, 1998.

  57. Sam Ganzfried and Tuomas Sandholm. Game theory-based opponent modeling in large imperfect-information games. In The 10th International Conference on Autonomous Agents and Multiagent Systems, volume 2, pages 533–540, 2011.

  58. Sam Ganzfried and Tuomas Sandholm. Endgame solving in large imperfect-information games. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, pages 37–45, 2015.

  59. Gerd Gigerenzer and Daniel G Goldstein. Reasoning the fast and frugal way: models of bounded rationality. Psychological Review, 103(4):650, 1996.

  60. Thomas Gilovich, Dale Griffin, and Daniel Kahneman. Heuristics and Biases: The Psychology of Intuitive Judgment. Cambridge University Press, 2002.

  61. Andrew Gilpin and Tuomas Sandholm. A competitive Texas Hold’em poker player via automated abstraction and real-time equilibrium computation. In Proceedings of the National Conference on Artificial Intelligence, volume 21, page 1007, 2006.

  62. Jonathan Gray, Adam Lerer, Anton Bakhtin, and Noam Brown. Human-level performance in no-press diplomacy via equilibrium search. arXiv preprint arXiv:2010.02923, 2020.

  63. Sven Gronauer and Klaus Diepold. Multi-agent deep reinforcement learning: a survey. Artificial Intelligence Review, pages 1–49, 2021.

  64. Carlos Guestrin, Daphne Koller, and Ronald Parr. Multiagent planning with factored MDPs. In Advances in Neural Information Processing Systems, volume 1, pages 1523–1530, 2001.

  65. Dongge Han, Chris Xiaoxuan Lu, Tomasz Michalak, and Michael Wooldridge. Multiagent model-based credit assignment for continuous control, 2021.

  66. Matthew John Hausknecht. Cooperation and Communication in Multiagent Deep Reinforcement Learning. PhD thesis, University of Texas at Austin, 2016.

  67. Conor F. Hayes, Roxana Radulescu, Eugenio Bargiacchi, Johan Källström, Matthew Macfarlane, Mathieu Reymond, Timothy Verstraeten, Luisa M. Zintgraf, Richard Dazeley, Fredrik Heintz, Enda Howley, Athirai A. Irissappane, Patrick Mannion, Ann Nowé, Gabriel de Oliveira Ramos, Marcello Restelli, Peter Vamplew, and Diederik M. Roijers. A practical guide to multi-objective reinforcement learning and planning. arXiv preprint arXiv:2103.09568, 2021.

  68. He He, Jordan Boyd-Graber, Kevin Kwok, and Hal Daumé III. Opponent modeling in deep reinforcement learning. In International Conference on Machine Learning, pages 1804–1813. PMLR, 2016.

  69. Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, SM Eslami, Martin Riedmiller, and David Silver. Emergence of locomotion behaviours in rich environments. arXiv preprint arXiv:1707.02286, 2017.

  70. Joseph Henrich, Robert Boyd, and Peter J Richerson. Five misunderstandings about cultural evolution. Human Nature, 19(2):119–137, 2008.

  71. Pablo Hernandez-Leal, Michael Kaisers, Tim Baarslag, and Enrique Munoz de Cote. A survey of learning in multiagent environments: Dealing with non-stationarity. arXiv preprint arXiv:1707.09183, 2017.

  72. Pablo Hernandez-Leal, Bilal Kartal, and Matthew E Taylor. A survey and critique of multiagent deep reinforcement learning. Autonomous Agents and Multi-Agent Systems, 33(6):750–797, 2019.

  73. Francis Heylighen. What makes a Meme Successful? Selection Criteria for Cultural Evolution. Association Internationale de Cybernetique, 1998.

  74. John Holland. Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence, 1975.

  75. Bert Hölldobler and Edward O Wilson. The Superorganism: the Beauty, Elegance, and Strangeness of Insect Societies. WW Norton & Company, 2009.

  76. Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado Van Hasselt, and David Silver. Distributed prioritized experience replay. In International Conference on Learning Representations, 2018.

  77. Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castañeda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, and Thore Graepel. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science, 364(6443):859–865, 2019.

  78. Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, Chrisantha Fernando, and Koray Kavukcuoglu. Population based training of neural networks. arXiv preprint arXiv:1711.09846, 2017.

  79. Michael Johanson, Nolan Bard, Marc Lanctot, Richard G Gibson, and Michael Bowling. Efficient Nash equilibrium approximation through Monte Carlo counterfactual regret minimization. In AAMAS, pages 837–846, 2012.

  80. Arthur Juliani, Ahmed Khalifa, Vincent-Pierre Berges, Jonathan Harper, Ervin Teng, Hunter Henry, Adam Crespi, Julian Togelius, and Danny Lange. Obstacle tower: A generalization challenge in vision, control, and planning. arXiv preprint arXiv:1902.01378, 2019.

  81. Daniel Kahneman and Amos Tversky. Prospect theory: An analysis of decision under risk. In Handbook of the Fundamentals of Financial Decision Making: Part I, pages 99–127. World Scientific, 2013.

  82. James Kennedy. Swarm intelligence. In Handbook of Nature-Inspired and Innovative Computing, pages 187–219. Springer, 2006.

  83. Shauharda Khadka, Somdeb Majumdar, Tarek Nassar, Zach Dwiel, Evren Tumer, Santiago Miret, Yinyin Liu, and Kagan Tumer. Collaborative evolutionary reinforcement learning. In International Conference on Machine Learning, pages 3341–3350. PMLR, 2019.

  84. Shauharda Khadka and Kagan Tumer. Evolutionary reinforcement learning. arXiv preprint arXiv:1805.07917, 2018.

  85. Daan Klijn and AE Eiben. A coevolutionary approach to deep multi-agent reinforcement learning. arXiv preprint arXiv:2104.05610, 2021.

  86. Satwik Kottur, José MF Moura, Stefan Lee, and Dhruv Batra. Natural language does not emerge ’naturally’ in multi-agent dialog. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, pages 2962–2967, 2017.

  87. Landon Kraemer and Bikramjit Banerjee. Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing, 190:82–94, 2016.

  88. Sarit Kraus and Daniel Lehmann. Diplomat, an agent in a multi agent environment: An overview. In IEEE International Performance Computing and Communications Conference, pages 434–438, 1988.

  89. Steven Kuhn. Prisoner’s Dilemma. The Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/prisoner-dilemma/, 1997.

  90. Karol Kurach, Anton Raichuk, Piotr Stańczyk, Michał Zajac, Olivier Bachem, Lasse Espeholt, Carlos Riquelme, Damien Vincent, Marcin Michalski, Olivier Bousquet, and Sylvain Gelly. Google research football: A novel reinforcement learning environment. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 4501–4510, 2020.

  91. Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael H Bowling. Monte Carlo sampling for regret minimization in extensive games. In Advances in Neural Information Processing Systems, pages 1078–1086, 2009.

  92. Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Pérolat, David Silver, and Thore Graepel. A unified game-theoretic approach to multiagent reinforcement learning. In Advances in Neural Information Processing Systems, pages 4190–4203, 2017.

  93. Angeliki Lazaridou, Alexander Peysakhovich, and Marco Baroni. Multi-agent cooperation and the emergence of (natural) language. In International Conference on Learning Representations, 2017.

  94. Joel Z Leibo, Edward Hughes, Marc Lanctot, and Thore Graepel. Autocurricula and the emergence of innovation from social interaction: A manifesto for multi-agent intelligence research. arXiv preprint arXiv:1903.00742, 2019.

  95. Joel Z Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, and Thore Graepel. Multi-agent reinforcement learning in sequential social dilemmas. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, AAMAS 2017, São Paulo, Brazil, pages 464–473, 2017.

  96. Sheng Li, Jayesh K Gupta, Peter Morales, Ross Allen, and Mykel J Kochenderfer. Deep implicit coordination graphs for multi-agent reinforcement learning. In AAMAS ’21: 20th International Conference on Autonomous Agents and Multiagent Systems, 2021.

  97. Michael L Littman. Markov games as a framework for multi-agent reinforcement learning. In Machine Learning Proceedings 1994, pages 157–163. Elsevier, 1994.

  98. Chunming Liu, Xin Xu, and Dewen Hu. Multiobjective reinforcement learning: A comprehensive overview. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 45(3):385–398, 2014.

  99. Siqi Liu, Guy Lever, Josh Merel, Saran Tunyasuvunakool, Nicolas Heess, and Thore Graepel. Emergent coordination through competition. In International Conference on Learning Representations, 2019.

  100. Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent Actor-Critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, pages 6379–6390, 2017.

  101. Xiaoliang Ma, Xiaodong Li, Qingfu Zhang, Ke Tang, Zhengping Liang, Weixin Xie, and Zexuan Zhu. A survey on cooperative co-evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 23(3):421–441, 2018.

  102. Anuj Mahajan, Tabish Rashid, Mikayel Samvelyan, and Shimon Whiteson. Maven: Multi-agent variational exploration. In Advances in Neural Information Processing Systems, pages 7611–7622, 2019.

  103. Somdeb Majumdar, Shauharda Khadka, Santiago Miret, Stephen McAleer, and Kagan Tumer. Evolutionary reinforcement learning for sample-efficient multiagent coordination. In International Conference on Machine Learning, 2020.

  104. Julian N Marewski, Wolfgang Gaissmaier, and Gerd Gigerenzer. Good judgments do not require complex cognition. Cognitive Processing, 11(2):103–121, 2010.

  105. Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, and Michael Bowling. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 356(6337):508–513, 2017.

  106. Igor Mordatch and Pieter Abbeel. Emergence of grounded compositional language in multi-agent populations. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.

  107. Pol Moreno, Edward Hughes, Kevin R McKee, Bernardo Avila Pires, and Théophane Weber. Neural recursive belief states in multi-agent reinforcement learning. arXiv preprint arXiv:2102.02274, 2021.

  108. David E Moriarty, Alan C Schultz, and John J Grefenstette. Evolutionary algorithms for reinforcement learning. Journal of Artificial Intelligence Research, 11:241–276, 1999.

  109. Hossam Mossalam, Yannis M Assael, Diederik M Roijers, and Shimon Whiteson. Multi-objective deep reinforcement learning. arXiv preprint arXiv:1610.02707, 2016.

  110. Sendhil Mullainathan and Richard H Thaler. Behavioral economics. Technical report, National Bureau of Economic Research, 2000.

  111. Roger B Myerson. Game Theory. Harvard University Press, 2013.

  112. Sylvia Nasar. A Beautiful Mind. Simon and Schuster, 2011.

  113. John Nash. Non-cooperative games. Annals of Mathematics, pages 286–295, 1951.

  114. John F Nash. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36(1):48–49, 1950.

  115. John F Nash Jr. The bargaining problem. Econometrica: Journal of the Econometric Society, pages 155–162, 1950.

  116. Frans A Oliehoek. Decentralized POMDPs. In Reinforcement Learning, pages 471–503. Springer, 2012.

  117. Frans A Oliehoek and Christopher Amato. A Concise Introduction to Decentralized POMDPs. Springer, 2016.

  118. Frans A Oliehoek, Matthijs TJ Spaan, Christopher Amato, and Shimon Whiteson. Incremental clustering and expansion for faster optimal planning in Dec-POMDPs. Journal of Artificial Intelligence Research, 46:449–509, 2013.

  119. Shayegan Omidshafiei, Jason Pazis, Christopher Amato, Jonathan P How, and John Vian. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In International Conference on Machine Learning, pages 2681–2690. PMLR, 2017.

  120. Santiago Ontanón, Gabriel Synnaeve, Alberto Uriarte, Florian Richoux, David Churchill, and Mike Preuss. A survey of real-time strategy game AI research and competition in StarCraft. IEEE Transactions on Computational Intelligence and AI in Games, 5(4):293–311, 2013.

  121. Philip Paquette, Yuchen Lu, Steven Bocco, Max Smith, O-G Satya, Jonathan K Kummerfeld, Joelle Pineau, Satinder Singh, and Aaron C Courville. No-press diplomacy: Modeling multi-agent gameplay. In Advances in Neural Information Processing Systems, pages 4476–4487, 2019.

  122. Aske Plaat. De vlinder en de mier / The butterfly and the ant—on modeling behavior in organizations. Inaugural lecture. Tilburg University, 2010.

  123. David Premack and Guy Woodruff. Does the chimpanzee have a theory of mind? Behavioral and Brain Sciences, 1(4):515–526, 1978.

  124. Roxana Rădulescu, Patrick Mannion, Diederik M Roijers, and Ann Nowé. Multi-objective multi-agent decision making: a utility-based analysis and survey. Autonomous Agents and Multi-Agent Systems, 34(1):1–52, 2020.

  125. Tabish Rashid, Mikayel Samvelyan, Christian Schroeder, Gregory Farquhar, Jakob Foerster, and Shimon Whiteson. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In International Conference on Machine Learning, pages 4295–4304. PMLR, 2018.

  126. Diederik M Roijers, Willem Röpke, Ann Nowé, and Roxana Rădulescu. On following pareto-optimal policies in multi-objective planning and reinforcement learning. In Multi-Objective Decision Making Workshop, 2021.

  127. Diederik M Roijers, Peter Vamplew, Shimon Whiteson, and Richard Dazeley. A survey of multi-objective sequential decision-making. Journal of Artificial Intelligence Research, 48:67–113, 2013.

  128. Willem Röpke, Roxana Radulescu, Diederik M Roijers, and Ann Nowé. Communication strategies in multi-objective normal-form games. In Adaptive and Learning Agents Workshop 2021, 2021.

  129. Jonathan Rubin and Ian Watson. Computer poker: A review. Artificial Intelligence, 175(5-6):958–987, 2011.

  130. Jordi Sabater and Carles Sierra. Reputation and social network analysis in multi-agent systems. In Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems: Part 1, pages 475–482, 2002.

  131. Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, and Ilya Sutskever. Evolution strategies as a scalable alternative to reinforcement learning. arXiv:1703.03864, 2017.

  132. Mikayel Samvelyan, Tabish Rashid, Christian Schroeder De Witt, Gregory Farquhar, Nantas Nardelli, Tim GJ Rudner, Chia-Man Hung, Philip HS Torr, Jakob Foerster, and Shimon Whiteson. The StarCraft multi-agent challenge. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS ’19, Montreal, 2019.

  133. Tuomas Sandholm. The state of solving large incomplete-information games, and application to poker. AI Magazine, 31(4):13–32, 2010.

  134. Tuomas Sandholm. Abstraction for solving large incomplete-information games. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 29, 2015.

  135. Thomas D Seeley. The honey bee colony as a superorganism. American Scientist, 77(6):546–553, 1989.

  136. Lloyd S Shapley. Stochastic games. In Proceedings of the National Academy of Sciences, volume 39, pages 1095–1100, 1953.

  137. Yoav Shoham and Kevin Leyton-Brown. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, 2008.

  138. Yoav Shoham, Rob Powers, and Trond Grenager. Multi-agent reinforcement learning: a critical survey. Technical report, Stanford University, 2003.

  139. Robin C Sickles and Valentin Zelenyuk. Measurement of Productivity and Efficiency. Cambridge University Press, 2019.

  140. David Silver, Satinder Singh, Doina Precup, and Richard S Sutton. Reward is enough. Artificial Intelligence, page 103535, 2021.

  141. David Simões, Nuno Lau, and Luís Paulo Reis. Multi agent deep learning with cooperative communication. Journal of Artificial Intelligence and Soft Computing Research, 10, 2020.

  142. Satinder Singh, Richard L Lewis, Andrew G Barto, and Jonathan Sorg. Intrinsically motivated reinforcement learning: An evolutionary perspective. IEEE Transactions on Autonomous Mental Development, 2(2):70–82, 2010.

  143. Stephen J Smith, Dana Nau, and Tom Throop. Computer bridge: A big win for AI planning. AI Magazine, 19(2):93–93, 1998.

  144. Kyunghwan Son, Daewoo Kim, Wan Ju Kang, David Earl Hostallero, and Yung Yi. QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In International Conference on Machine Learning, pages 5887–5896. PMLR, 2019.

  145. Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O Stanley, and Jeff Clune. Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. arXiv preprint arXiv:1712.06567, 2017.

  146. Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z Leibo, Karl Tuyls, and Thore Graepel. Value-decomposition networks for cooperative multi-agent learning. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS 2018, Stockholm, Sweden, 2017.

  147. Peter Sunehag, Guy Lever, Siqi Liu, Josh Merel, Nicolas Heess, Joel Z Leibo, Edward Hughes, Tom Eccles, and Thore Graepel. Reinforcement learning agents acquire flocking and symbiotic behaviour in simulated ecosystems. In Artificial Life Conference Proceedings, pages 103–110. MIT Press, 2019.

  148. Oskari Tammelin. Solving large imperfect information games using CFR+. arXiv preprint arXiv:1407.5042, 2014.

  149. Ardi Tampuu, Tambet Matiisen, Dorian Kodelja, Ilya Kuzovkin, Kristjan Korjus, Juhan Aru, Jaan Aru, and Raul Vicente. Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE, 12(4):e0172395, 2017.

  150. Ming Tan. Multi-agent reinforcement learning: Independent vs. cooperative agents. In International Conference on Machine Learning, pages 330–337, 1993.

  151. Shoshannah Tekofsky, Pieter Spronck, Martijn Goudbeek, Aske Plaat, and Jaap van den Herik. Past our prime: A study of age and play style development in Battlefield 3. IEEE Transactions on Computational Intelligence and AI in Games, 7(3):292–303, 2015.

  152. Justin K Terry and Benjamin Black. Multiplayer support for the arcade learning environment. arXiv preprint arXiv:2009.09341, 2020.

  153. Justin K Terry, Benjamin Black, Ananth Hari, Luis Santos, Clemens Dieffendahl, Niall L Williams, Yashas Lokesh, Caroline Horsch, and Praveen Ravi. Pettingzoo: Gym for multi-agent reinforcement learning. arXiv preprint arXiv:2009.14471, 2020.

  154. Julian Togelius, Alex J Champandard, Pier Luca Lanzi, Michael Mateas, Ana Paiva, Mike Preuss, and Kenneth O Stanley. Procedural content generation: Goals, challenges and actionable steps. In Artificial and Computational Intelligence in Games. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2013.

  155. Armon Toubman, Jan Joris Roessingh, Pieter Spronck, Aske Plaat, and Jaap Van Den Herik. Dynamic scripting with team coordination in air combat simulation. In International Conference on Industrial, Engineering and other Applications of Applied Intelligent Systems, pages 440–449. Springer, 2014.

  156. Thomas Trenner. Beating Kuhn poker with CFR using Python. https://ai.plainenglish.io/building-a-poker-ai-part-6-beating-kuhn-poker-with-cfr-using-python-1b4172a6ab2d.

  157. Karl Tuyls, Julien Perolat, Marc Lanctot, Joel Z Leibo, and Thore Graepel. A generalised method for empirical game theoretic analysis. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS 2018, Stockholm, Sweden, 2018.

  158. Karl Tuyls and Gerhard Weiss. Multiagent learning: Basics, challenges, and prospects. AI Magazine, 33(3):41–41, 2012.

  159. Paul Tylkin, Goran Radanovic, and David C Parkes. Learning robust helpful behaviors in two-player cooperative Atari environments. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, pages 1686–1688, 2021.

  160. Wiebe Van der Hoek and Michael Wooldridge. Multi-agent systems. Foundations of Artificial Intelligence, 3:887–928, 2008.

  161. Max J van Duijn. The Lazy Mindreader: a Humanities Perspective on Mindreading and Multiple-Order Intentionality. PhD thesis, Leiden University, 2016.

  162. Max J Van Duijn, Ineke Sluiter, and Arie Verhagen. When narrative takes over: The representation of embedded mindstates in Shakespeare’s Othello. Language and Literature, 24(2):148–166, 2015.

  163. Max J Van Duijn and Arie Verhagen. Recursive embedding of viewpoints, irregularity, and the role for a flexible framework. Pragmatics, 29(2):198–225, 2019.

  164. Kristof Van Moffaert and Ann Nowé. Multi-objective reinforcement learning using sets of pareto dominating policies. Journal of Machine Learning Research, 15(1):3483–3512, 2014.

  165. Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, Junhyuk Oh, Dan Horgan, Manuel Kroiss, Ivo Danihelka, Aja Huang, Laurent Sifre, Trevor Cai, John P. Agapiou, Max Jaderberg, Alexander Sasha Vezhnevets, Rémi Leblond, Tobias Pohlen, Valentin Dalibard, David Budden, Yury Sulsky, James Molloy, Tom Le Paine, Çaglar Gülçehre, Ziyu Wang, Tobias Pfaff, Yuhuai Wu, Roman Ring, Dani Yogatama, Dario Wünsch, Katrina McKinney, Oliver Smith, Tom Schaul, Timothy P. Lillicrap, Koray Kavukcuoglu, Demis Hassabis, Chris Apps, and David Silver. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782):350–354, 2019.

  166. Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John P. Agapiou, Julian Schrittwieser, John Quan, Stephen Gaffney, Stig Petersen, Karen Simonyan, Tom Schaul, Hado van Hasselt, David Silver, Timothy P. Lillicrap, Kevin Calderone, Paul Keet, Anthony Brunasso, David Lawrence, Anders Ekermo, Jacob Repp, and Rodney Tsing. StarCraft II: A new challenge for reinforcement learning. arXiv:1708.04782, 2017.

  167. John Von Neumann and Oskar Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, 1944.

  168. John Von Neumann, Oskar Morgenstern, and Harold William Kuhn. Theory of Games and Economic Behavior (commemorative edition). Princeton University Press, 2007.

  169. Douglas Walker and Graham Walker. The Official Rock Paper Scissors Strategy Guide. Simon and Schuster, 2004.

  170. Ying Wen, Yaodong Yang, Rui Luo, Jun Wang, and Wei Pan. Probabilistic recursive reasoning for multi-agent reinforcement learning. In International Conference on Learning Representations, 2019.

  171. Shimon Whiteson. Evolutionary computation for reinforcement learning. In Marco A. Wiering and Martijn van Otterlo, editors, Reinforcement Learning, volume 12 of Adaptation, Learning, and Optimization, pages 325–355. Springer, 2012.

  172. Marco A Wiering, Maikel Withagen, and Mădălina M Drugan. Model-based multi-objective reinforcement learning. In 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pages 1–6. IEEE, 2014.

  173. Daan Wierstra, Tom Schaul, Jan Peters, and Jürgen Schmidhuber. Natural evolution strategies. In IEEE Congress on Evolutionary Computation, pages 3381–3387, 2008.

  174. Nick Wilkinson and Matthias Klaes. An Introduction to Behavioral Economics. Macmillan International Higher Education, 2017.

  175. Annie Wong, Thomas Bäck, Anna V. Kononova, and Aske Plaat. Deep multiagent reinforcement learning: Challenges and directions. Artificial Intelligence Review, 2022.

  176. Michael Wooldridge. An Introduction to Multiagent Systems. Wiley, 2009.

  177. Anita Williams Woolley, Christopher F Chabris, Alex Pentland, Nada Hashmi, and Thomas W Malone. Evidence for a collective intelligence factor in the performance of human groups. Science, 330(6004):686–688, 2010.

  178. Yaodong Yang and Jun Wang. An overview of multi-agent reinforcement learning from game theoretical perspective. arXiv preprint arXiv:2011.00583, 2020.

  179. Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre Bayen, and Yi Wu. The surprising effectiveness of PPO in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955, 2021.

  180. Kaiqing Zhang, Zhuoran Yang, and Tamer Başar. Multi-agent reinforcement learning: A selective overview of theories and algorithms. arXiv preprint arXiv:1911.10635, 2019.

  181. Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, and Tamer Basar. Fully decentralized multi-agent reinforcement learning with networked agents. In International Conference on Machine Learning, pages 5872–5881. PMLR, 2018.

  182. Yan Zheng, Zhaopeng Meng, Jianye Hao, Zongzhang Zhang, Tianpei Yang, and Changjie Fan. A deep Bayesian policy reuse approach against non-stationary agents. In 32nd Neural Information Processing Systems, pages 962–972, 2018.

  183. Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. Regret minimization in games with incomplete information. In Advances in Neural Information Processing Systems, pages 1729–1736, 2008.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this chapter

Plaat, A. (2022). Multi-Agent Reinforcement Learning. In: Deep Reinforcement Learning. Springer, Singapore. https://doi.org/10.1007/978-981-19-0638-1_7

  • DOI: https://doi.org/10.1007/978-981-19-0638-1_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-0637-4

  • Online ISBN: 978-981-19-0638-1
