Multi-Agent Reinforcement Learning

Abstract

On this planet, in our societies, millions of people live and work together. Each individual has their own set of goals and performs their actions accordingly. Some of these goals are shared. When we want to achieve shared goals, we organize ourselves in teams, groups, companies, organizations, and societies.

Notes

  1. Human drivers have a theory of mind of other drivers. Theory of mind, together with the related concept of mirror neurons [56], is a psychological account of empathy and understanding that allows a limited amount of prediction of future behavior. Theory of mind studies how individuals simulate in their minds the actions of others, including their simulations of our actions (and of our simulations, etc.) [18, 71].

  2. https://plato.stanford.edu/entries/prisoner-dilemma/.

  3. https://ai.plainenglish.io/building-a-poker-ai-part-6-beating-kuhn-poker-with-cfr-using-python-1b4172a6ab2d.

  4. https://int8.io/counterfactual-regret-minimization-for-poker-ai/.

  5. https://github.com/int8/counterfactual-regret-minimization/blob/master/games/algorithms.py. A minimal CFR sketch is given after these notes.

  6. Survival of the fittest cooperative group of individuals can also be achieved with an appropriate fitness function [101].

  7. https://www.youtube.com/watch?v=kopoLzvh5jY&t=10s.

  8. https://openai.com/blog/emergent-tool-use/.

  9. Similar to the first approach in AlphaGo, where self-play reinforcement learning was also bootstrapped by supervised learning from human games.

  10. https://github.com/openai/multi-agent-emergence-environments.

  11. https://ai.googleblog.com/2019/06/introducing-google-research-football.html.

  12. https://github.com/deepmind/pysc2.
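
Notes 3-5 point to tutorial implementations of counterfactual regret minimization (CFR) for Kuhn poker. To give the flavor of what those tutorials implement, the following is a minimal vanilla-CFR sketch in Python that follows the standard regret-matching formulation; it is an illustrative sketch under that assumption, not the code of the linked tutorials or of this chapter, and the names (Node, cfr, nodes) are our own.

import random

PASS, BET = "p", "b"
ACTIONS = [PASS, BET]

class Node:
    # One information set: cumulative regrets and cumulative strategy weights.
    def __init__(self):
        self.regret_sum = {a: 0.0 for a in ACTIONS}
        self.strategy_sum = {a: 0.0 for a in ACTIONS}

    def strategy(self, weight):
        # Regret matching: mix actions in proportion to positive cumulative regret.
        pos = {a: max(r, 0.0) for a, r in self.regret_sum.items()}
        total = sum(pos.values())
        strat = ({a: p / total for a, p in pos.items()} if total > 0
                 else {a: 1.0 / len(ACTIONS) for a in ACTIONS})
        for a in ACTIONS:
            self.strategy_sum[a] += weight * strat[a]
        return strat

    def average_strategy(self):
        # The average strategy over all iterations is what converges to equilibrium.
        total = sum(self.strategy_sum.values())
        return {a: (s / total if total > 0 else 1.0 / len(ACTIONS))
                for a, s in self.strategy_sum.items()}

nodes = {}  # information-set string -> Node

def cfr(cards, history, p0, p1):
    # Returns expected utility for the player to act; p0, p1 are reach probabilities.
    player = len(history) % 2
    opponent = 1 - player
    if len(history) > 1:  # terminal states: pp, bp, pbp (showdown/fold), bb, pbb (call)
        if history[-1] == PASS:
            if history == "pp":   # both check: showdown for the ante
                return 1 if cards[player] > cards[opponent] else -1
            return 1              # opponent folded after a bet
        if history[-2:] == "bb":  # bet and call: showdown for two chips
            return 2 if cards[player] > cards[opponent] else -2

    info_set = str(cards[player]) + history
    node = nodes.setdefault(info_set, Node())
    strat = node.strategy(p0 if player == 0 else p1)

    util, node_util = {}, 0.0
    for a in ACTIONS:
        if player == 0:
            util[a] = -cfr(cards, history + a, p0 * strat[a], p1)
        else:
            util[a] = -cfr(cards, history + a, p0, p1 * strat[a])
        node_util += strat[a] * util[a]

    # Accumulate counterfactual regret, weighted by the opponent's reach probability.
    reach_opponent = p1 if player == 0 else p0
    for a in ACTIONS:
        node.regret_sum[a] += reach_opponent * (util[a] - node_util)
    return node_util

if __name__ == "__main__":
    cards = [1, 2, 3]  # Jack, Queen, King
    for _ in range(100000):
        random.shuffle(cards)  # deal two of the three cards at random
        cfr(cards, "", 1.0, 1.0)
    for info_set in sorted(nodes):
        avg = nodes[info_set].average_strategy()
        print(info_set, {a: round(p, 3) for a, p in avg.items()})

With enough iterations the printed average strategies approximate a Nash equilibrium of Kuhn poker; for example, the first player should learn to bet (bluff) with the Jack roughly one third as often as they bet with the King, which is the known equilibrium relation.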

References

  1. Stefano Albrecht and Peter Stone. Multiagent learning: foundations and recent trends. In Tutorial at IJCAI-17 conference, 2017.

  2. Stefano Albrecht and Peter Stone. Autonomous agents modelling other agents: A comprehensive survey and open problems. Artificial Intelligence, 258:66–95, 2018.

  3. Thomas Anthony, Tom Eccles, Andrea Tacchetti, János Kramár, Ian M. Gemp, Thomas C. Hudson, Nicolas Porcel, Marc Lanctot, Julien Pérolat, Richard Everett, Satinder Singh, Thore Graepel, and Yoram Bachrach. Learning to play no-press diplomacy with best response policy iteration. In Advances in Neural Information Processing Systems, 2020.

  4. Robert Axelrod. An evolutionary approach to norms. The American Political Science Review, pages 1095–1111, 1986.

  5. Robert Axelrod. The complexity of cooperation: Agent-based models of competition and collaboration, volume 3. Princeton University Press, 1997.

  6. Robert Axelrod. The dissemination of culture: A model with local convergence and global polarization. Journal of Conflict Resolution, 41(2):203–226, 1997.

  7. Robert Axelrod and Douglas Dion. The further evolution of cooperation. Science, 242(4884):1385–1390, 1988.

  8. Robert Axelrod and William D Hamilton. The evolution of cooperation. Science, 211(4489):1390–1396, 1981.

  9. Thomas Bäck. Evolutionary Algorithms in Theory and Practice: Evolutionary Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, 1996.

  10. Thomas Bäck, David B Fogel, and Zbigniew Michalewicz. Handbook of evolutionary computation. Release, 97(1):B1, 1997.

  11. Thomas Bäck, Frank Hoffmeister, and Hans-Paul Schwefel. A survey of evolution strategies. In Proceedings of the fourth International Conference on Genetic Algorithms, 1991.

  12. Thomas Bäck and Hans-Paul Schwefel. An overview of evolutionary algorithms for parameter optimization. Evolutionary Computation, 1(1):1–23, 1993.

  13. Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, and Igor Mordatch. Emergent tool use from multi-agent autocurricula. arXiv preprint arXiv:1909.07528, 2019.

  14. Anton Bakhtin, David Wu, Adam Lerer, and Noam Brown. No-press diplomacy from scratch. Advances in Neural Information Processing Systems, 34, 2021.

  15. Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, and Igor Mordatch. Emergent complexity via multi-agent competition. arXiv preprint arXiv:1710.03748, 2017.

  16. Nolan Bard, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, H. Francis Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain Dunning, Shibl Mourad, Hugo Larochelle, Marc G. Bellemare, and Michael Bowling. The Hanabi challenge: A new frontier for AI research. Artificial Intelligence, 280:103216, 2020.

  17. Nolan Bard, John Hawkin, Jonathan Rubin, and Martin Zinkevich. The annual computer poker competition. AI Magazine, 34(2):112, 2013.

  18. Simon Baron-Cohen, Alan M Leslie, and Uta Frith. Does the autistic child have a “theory of mind”? Cognition, 21(1):37–46, 1985.

  19. Gerardo Beni. Swarm intelligence. Complex Social and Behavioral Systems: Game Theory and Agent-Based Models, pages 791–818, 2020.

  20. Gerardo Beni and Jing Wang. Swarm intelligence in cellular robotic systems. In Robots and Biological Systems: Towards a New Bionics?, pages 703–712. Springer, 1993.

  21. Daniel S Bernstein, Robert Givan, Neil Immerman, and Shlomo Zilberstein. The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27(4):819–840, 2002.

  22. Darse Billings, Aaron Davidson, Jonathan Schaeffer, and Duane Szafron. The challenge of poker. Artificial Intelligence, 134(1-2):201–240, 2002.

  23. Darse Billings, Aaron Davidson, Terence Schauenberg, Neil Burch, Michael Bowling, Robert Holte, Jonathan Schaeffer, and Duane Szafron. Game-tree search with adaptation in stochastic imperfect-information games. In International Conference on Computers and Games, pages 21–34. Springer, 2004.

  24. Christian Blum and Daniel Merkle. Swarm Intelligence: Introduction and Applications. Springer Science & Business Media, 2008.

  25. Eric Bonabeau, Marco Dorigo, and Guy Theraulaz. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, 1999.

  26. Michael Bowling, Neil Burch, Michael Johanson, and Oskari Tammelin. Heads-up Limit Hold’em poker is solved. Science, 347(6218):145–149, 2015.

  27. Michael H. Bowling, Nicholas Abou Risk, Nolan Bard, Darse Billings, Neil Burch, Joshua Davidson, John Alexander Hawkin, Robert Holte, Michael Johanson, Morgan Kan, Bryce Paradis, Jonathan Schaeffer, David Schnizlein, Duane Szafron, Kevin Waugh, and Martin Zinkevich. A demonstration of the polaris poker system. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems, volume 2, pages 1391–1392, 2009.

  28. Robert Boyd and Peter J Richerson. Culture and the Evolutionary Process. University of Chicago Press, 1988.

  29. Noam Brown, Sam Ganzfried, and Tuomas Sandholm. Hierarchical abstraction, distributed equilibrium computation, and post-processing, with application to a champion No-Limit Texas Hold’em agent. In AAAI Workshop: Computer Poker and Imperfect Information, 2015.

  30. Noam Brown, Adam Lerer, Sam Gross, and Tuomas Sandholm. Deep counterfactual regret minimization. In International Conference on Machine Learning, pages 793–802. PMLR, 2019.

  31. Noam Brown and Tuomas Sandholm. Superhuman AI for Heads-up No-limit poker: Libratus beats top professionals. Science, 359(6374):418–424, 2018.

  32. Noam Brown and Tuomas Sandholm. Superhuman AI for multiplayer poker. Science, 365(6456):885–890, 2019.

  33. Lucian Busoniu, Robert Babuska, and Bart De Schutter. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(2):156–172, 2008.

  34. Zhiyuan Cai, Huanhui Cao, Wenjie Lu, Lin Zhang, and Hao Xiong. Safe multi-agent reinforcement learning through decentralized multiple control barrier functions. arXiv preprint arXiv:2103.12553, 2021.

  35. Kris Cao, Angeliki Lazaridou, Marc Lanctot, Joel Z Leibo, Karl Tuyls, and Stephen Clark. Emergent communication through negotiation. In International Conference on Learning Representations, 2018.

  36. Edward Cartwright. Behavioral Economics. Routledge, 2018.

  37. Patryk Chrabaszcz, Ilya Loshchilov, and Frank Hutter. Back to basics: Benchmarking canonical evolution strategies for playing Atari. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, pages 1419–1426, 2018.

  38. Edoardo Conti, Vashisht Madhavan, Felipe Petroski Such, Joel Lehman, Kenneth O Stanley, and Jeff Clune. Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents. In Advances in Neural Information Processing Systems, pages 5032–5043, 2018.

  39. Allan Dafoe, Edward Hughes, Yoram Bachrach, Tantum Collins, Kevin R McKee, Joel Z Leibo, Kate Larson, and Thore Graepel. Open problems in cooperative AI. arXiv preprint arXiv:2012.08630, 2020.

  40. Zhongxiang Dai, Yizhou Chen, Bryan Kian Hsiang Low, Patrick Jaillet, and Teck-Hua Ho. R2-B2: recursive reasoning-based Bayesian optimization for no-regret learning in games. In International Conference on Machine Learning, pages 2291–2301. PMLR, 2020.

  41. Morton D Davis. Game Theory: a Nontechnical Introduction. Courier Corporation, 2012.

  42. Richard Dawkins and Nicola Davis. The Selfish Gene. Macat Library, 2017.

  43. Dave De Jonge, Tim Baarslag, Reyhan Aydoğan, Catholijn Jonker, Katsuhide Fujita, and Takayuki Ito. The challenge of negotiation in the game of diplomacy. In International Conference on Agreement Technologies, pages 100–114. Springer, 2018.

  44. Marco Dorigo. Optimization, learning and natural algorithms. PhD Thesis, Politecnico di Milano, 1992.

  45. Marco Dorigo and Mauro Birattari. Swarm intelligence. Scholarpedia, 2(9):1462, 2007.

  46. Marco Dorigo, Mauro Birattari, and Thomas Stutzle. Ant colony optimization. IEEE Computational Intelligence Magazine, 1(4):28–39, 2006.

  47. Marco Dorigo and Luca Maria Gambardella. Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1(1):53–66, 1997.

  48. Russell C Eberhart, Yuhui Shi, and James Kennedy. Swarm Intelligence. Elsevier, 2001.

  49. Tom Eccles, Edward Hughes, János Kramár, Steven Wheelwright, and Joel Z Leibo. Learning reciprocity in complex sequential social dilemmas. arXiv preprint arXiv:1903.08082, 2019.

  50. Agoston E Eiben and Jim E Smith. What is an evolutionary algorithm? In Introduction to Evolutionary Computing, pages 25–48. Springer, 2015.

  51. Richard Everett and Stephen Roberts. Learning against non-stationary agents with opponent modelling and deep reinforcement learning. In 2018 AAAI Spring Symposium Series, 2018.

  52. Vladimir Feinberg, Alvin Wan, Ion Stoica, Michael I Jordan, Joseph E Gonzalez, and Sergey Levine. Model-based value estimation for efficient model-free reinforcement learning. arXiv preprint arXiv:1803.00101, 2018.

  53. Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. Counterfactual multi-agent policy gradients. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.

  54. Jakob N Foerster, Richard Y Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch. Learning with opponent-learning awareness. arXiv preprint arXiv:1709.04326, 2017.

  55. David B Fogel. An introduction to simulated evolutionary optimization. IEEE Transactions on Neural Networks, 5(1):3–14, 1994.

  56. Vittorio Gallese and Alvin Goldman. Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences, 2(12):493–501, 1998.

  57. Sam Ganzfried and Tuomas Sandholm. Game theory-based opponent modeling in large imperfect-information games. In The 10th International Conference on Autonomous Agents and Multiagent Systems, volume 2, pages 533–540, 2011.

  58. Sam Ganzfried and Tuomas Sandholm. Endgame solving in large imperfect-information games. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, pages 37–45, 2015.

  59. Gerd Gigerenzer and Daniel G Goldstein. Reasoning the fast and frugal way: models of bounded rationality. Psychological Review, 103(4):650, 1996.

  60. Thomas Gilovich, Dale Griffin, and Daniel Kahneman. Heuristics and Biases: The Psychology of Intuitive Judgment. Cambridge University Press, 2002.

  61. Andrew Gilpin and Tuomas Sandholm. A competitive Texas Hold’em poker player via automated abstraction and real-time equilibrium computation. In Proceedings of the National Conference on Artificial Intelligence, volume 21, page 1007, 2006.

  62. Jonathan Gray, Adam Lerer, Anton Bakhtin, and Noam Brown. Human-level performance in no-press diplomacy via equilibrium search. arXiv preprint arXiv:2010.02923, 2020.

  63. Sven Gronauer and Klaus Diepold. Multi-agent deep reinforcement learning: a survey. Artificial Intelligence Review, pages 1–49, 2021.

  64. Carlos Guestrin, Daphne Koller, and Ronald Parr. Multiagent planning with factored MDPs. In Advances in Neural Information Processing Systems, volume 1, pages 1523–1530, 2001.

  65. Dongge Han, Chris Xiaoxuan Lu, Tomasz Michalak, and Michael Wooldridge. Multiagent model-based credit assignment for continuous control, 2021.

  66. Matthew John Hausknecht. Cooperation and Communication in Multiagent Deep Reinforcement Learning. PhD thesis, University of Texas at Austin, 2016.

  67. Conor F. Hayes, Roxana Radulescu, Eugenio Bargiacchi, Johan Källström, Matthew Macfarlane, Mathieu Reymond, Timothy Verstraeten, Luisa M. Zintgraf, Richard Dazeley, Fredrik Heintz, Enda Howley, Athirai A. Irissappane, Patrick Mannion, Ann Nowé, Gabriel de Oliveira Ramos, Marcello Restelli, Peter Vamplew, and Diederik M. Roijers. A practical guide to multi-objective reinforcement learning and planning. arXiv preprint arXiv:2103.09568, 2021.

  68. He He, Jordan Boyd-Graber, Kevin Kwok, and Hal Daumé III. Opponent modeling in deep reinforcement learning. In International Conference on Machine Learning, pages 1804–1813. PMLR, 2016.

  69. Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, SM Eslami, Martin Riedmiller, and David Silver. Emergence of locomotion behaviours in rich environments. arXiv preprint arXiv:1707.02286, 2017.

  70. Joseph Henrich, Robert Boyd, and Peter J Richerson. Five misunderstandings about cultural evolution. Human Nature, 19(2):119–137, 2008.

  71. Pablo Hernandez-Leal, Michael Kaisers, Tim Baarslag, and Enrique Munoz de Cote. A survey of learning in multiagent environments: Dealing with non-stationarity. arXiv preprint arXiv:1707.09183, 2017.

  72. Pablo Hernandez-Leal, Bilal Kartal, and Matthew E Taylor. A survey and critique of multiagent deep reinforcement learning. Autonomous Agents and Multi-Agent Systems, 33(6):750–797, 2019.

  73. Francis Heylighen. What makes a Meme Successful? Selection Criteria for Cultural Evolution. Association Internationale de Cybernetique, 1998.

  74. John Holland. Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence, 1975.

  75. Bert Hölldobler and Edward O Wilson. The Superorganism: the Beauty, Elegance, and Strangeness of Insect Societies. WW Norton & Company, 2009.

  76. Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado Van Hasselt, and David Silver. Distributed prioritized experience replay. In International Conference on Learning Representations, 2018.

  77. Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castañeda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, and Thore Graepel. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science, 364(6443):859–865, 2019.

  78. Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, Chrisantha Fernando, and Koray Kavukcuoglu. Population based training of neural networks. arXiv preprint arXiv:1711.09846, 2017.

  79. Michael Johanson, Nolan Bard, Marc Lanctot, Richard G Gibson, and Michael Bowling. Efficient Nash equilibrium approximation through Monte Carlo counterfactual regret minimization. In AAMAS, pages 837–846, 2012.

  80. Arthur Juliani, Ahmed Khalifa, Vincent-Pierre Berges, Jonathan Harper, Ervin Teng, Hunter Henry, Adam Crespi, Julian Togelius, and Danny Lange. Obstacle tower: A generalization challenge in vision, control, and planning. arXiv preprint arXiv:1902.01378, 2019.

  81. Daniel Kahneman and Amos Tversky. Prospect theory: An analysis of decision under risk. In Handbook of the Fundamentals of Financial Decision Making: Part I, pages 99–127. World Scientific, 2013.

  82. James Kennedy. Swarm intelligence. In Handbook of Nature-Inspired and Innovative Computing, pages 187–219. Springer, 2006.

  83. Shauharda Khadka, Somdeb Majumdar, Tarek Nassar, Zach Dwiel, Evren Tumer, Santiago Miret, Yinyin Liu, and Kagan Tumer. Collaborative evolutionary reinforcement learning. In International Conference on Machine Learning, pages 3341–3350. PMLR, 2019.

  84. Shauharda Khadka and Kagan Tumer. Evolutionary reinforcement learning. arXiv preprint arXiv:1805.07917, 2018.

  85. Daan Klijn and AE Eiben. A coevolutionary approach to deep multi-agent reinforcement learning. arXiv preprint arXiv:2104.05610, 2021.

  86. Satwik Kottur, José MF Moura, Stefan Lee, and Dhruv Batra. Natural language does not emerge ’naturally’ in multi-agent dialog. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, pages 2962–2967, 2017.

  87. Landon Kraemer and Bikramjit Banerjee. Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing, 190:82–94, 2016.

  88. Sarit Kraus and Daniel Lehmann. Diplomat, an agent in a multi agent environment: An overview. In IEEE International Performance Computing and Communications Conference, pages 434–438, 1988.

  89. Steven Kuhn. Prisoner’s Dilemma. The Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/prisoner-dilemma/, 1997.

  90. Karol Kurach, Anton Raichuk, Piotr Stańczyk, Michał Zajac, Olivier Bachem, Lasse Espeholt, Carlos Riquelme, Damien Vincent, Marcin Michalski, Olivier Bousquet, and Sylvain Gelly. Google research football: A novel reinforcement learning environment. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 4501–4510, 2020.

  91. Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael H Bowling. Monte Carlo sampling for regret minimization in extensive games. In Advances in Neural Information Processing Systems, pages 1078–1086, 2009.

  92. Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Pérolat, David Silver, and Thore Graepel. A unified game-theoretic approach to multiagent reinforcement learning. In Advances in Neural Information Processing Systems, pages 4190–4203, 2017.

  93. Angeliki Lazaridou, Alexander Peysakhovich, and Marco Baroni. Multi-agent cooperation and the emergence of (natural) language. In International Conference on Learning Representations, 2017.

  94. Joel Z Leibo, Edward Hughes, Marc Lanctot, and Thore Graepel. Autocurricula and the emergence of innovation from social interaction: A manifesto for multi-agent intelligence research. arXiv preprint arXiv:1903.00742, 2019.

  95. Joel Z Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, and Thore Graepel. Multi-agent reinforcement learning in sequential social dilemmas. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, AAMAS 2017, São Paulo, Brazil, pages 464–473, 2017.

  96. Sheng Li, Jayesh K Gupta, Peter Morales, Ross Allen, and Mykel J Kochenderfer. Deep implicit coordination graphs for multi-agent reinforcement learning. In AAMAS ’21: 20th International Conference on Autonomous Agents and Multiagent Systems, 2021.

  97. Michael L Littman. Markov games as a framework for multi-agent reinforcement learning. In Machine Learning Proceedings 1994, pages 157–163. Elsevier, 1994.

  98. Chunming Liu, Xin Xu, and Dewen Hu. Multiobjective reinforcement learning: A comprehensive overview. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 45(3):385–398, 2014.

  99. Siqi Liu, Guy Lever, Josh Merel, Saran Tunyasuvunakool, Nicolas Heess, and Thore Graepel. Emergent coordination through competition. In International Conference on Learning Representations, 2019.

  100. Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent Actor-Critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, pages 6379–6390, 2017.

  101. Xiaoliang Ma, Xiaodong Li, Qingfu Zhang, Ke Tang, Zhengping Liang, Weixin Xie, and Zexuan Zhu. A survey on cooperative co-evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 23(3):421–441, 2018.

  102. Anuj Mahajan, Tabish Rashid, Mikayel Samvelyan, and Shimon Whiteson. Maven: Multi-agent variational exploration. In Advances in Neural Information Processing Systems, pages 7611–7622, 2019.

  103. Somdeb Majumdar, Shauharda Khadka, Santiago Miret, Stephen McAleer, and Kagan Tumer. Evolutionary reinforcement learning for sample-efficient multiagent coordination. In International Conference on Machine Learning, 2020.

  104. Julian N Marewski, Wolfgang Gaissmaier, and Gerd Gigerenzer. Good judgments do not require complex cognition. Cognitive Processing, 11(2):103–121, 2010.

  105. Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, and Michael Bowling. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 356(6337):508–513, 2017.

  106. Igor Mordatch and Pieter Abbeel. Emergence of grounded compositional language in multi-agent populations. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.

  107. Pol Moreno, Edward Hughes, Kevin R McKee, Bernardo Avila Pires, and Théophane Weber. Neural recursive belief states in multi-agent reinforcement learning. arXiv preprint arXiv:2102.02274, 2021.

  108. David E Moriarty, Alan C Schultz, and John J Grefenstette. Evolutionary algorithms for reinforcement learning. Journal of Artificial Intelligence Research, 11:241–276, 1999.

  109. Hossam Mossalam, Yannis M Assael, Diederik M Roijers, and Shimon Whiteson. Multi-objective deep reinforcement learning. arXiv preprint arXiv:1610.02707, 2016.

  110. Sendhil Mullainathan and Richard H Thaler. Behavioral economics. Technical report, National Bureau of Economic Research, 2000.

  111. Roger B Myerson. Game Theory. Harvard University Press, 2013.

  112. Sylvia Nasar. A Beautiful Mind. Simon and Schuster, 2011.

  113. John Nash. Non-cooperative games. Annals of Mathematics, pages 286–295, 1951.

  114. John F Nash. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36(1):48–49, 1950.

  115. John F Nash Jr. The bargaining problem. Econometrica: Journal of the Econometric Society, pages 155–162, 1950.

  116. Frans A Oliehoek. Decentralized POMDPs. In Reinforcement Learning, pages 471–503. Springer, 2012.

  117. Frans A Oliehoek and Christopher Amato. A Concise Introduction to Decentralized POMDPs. Springer, 2016.

  118. Frans A Oliehoek, Matthijs TJ Spaan, Christopher Amato, and Shimon Whiteson. Incremental clustering and expansion for faster optimal planning in Dec-POMDPs. Journal of Artificial Intelligence Research, 46:449–509, 2013.

  119. Shayegan Omidshafiei, Jason Pazis, Christopher Amato, Jonathan P How, and John Vian. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In International Conference on Machine Learning, pages 2681–2690. PMLR, 2017.

  120. Santiago Ontanón, Gabriel Synnaeve, Alberto Uriarte, Florian Richoux, David Churchill, and Mike Preuss. A survey of real-time strategy game AI research and competition in StarCraft. IEEE Transactions on Computational Intelligence and AI in Games, 5(4):293–311, 2013.

  121. Philip Paquette, Yuchen Lu, Steven Bocco, Max Smith, O-G Satya, Jonathan K Kummerfeld, Joelle Pineau, Satinder Singh, and Aaron C Courville. No-press diplomacy: Modeling multi-agent gameplay. In Advances in Neural Information Processing Systems, pages 4476–4487, 2019.

  122. Aske Plaat. De vlinder en de mier / The butterfly and the ant—on modeling behavior in organizations. Inaugural lecture. Tilburg University, 2010.

  123. David Premack and Guy Woodruff. Does the chimpanzee have a theory of mind? Behavioral and Brain Sciences, 1(4):515–526, 1978.

  124. Roxana Rădulescu, Patrick Mannion, Diederik M Roijers, and Ann Nowé. Multi-objective multi-agent decision making: a utility-based analysis and survey. Autonomous Agents and Multi-Agent Systems, 34(1):1–52, 2020.

  125. Tabish Rashid, Mikayel Samvelyan, Christian Schroeder, Gregory Farquhar, Jakob Foerster, and Shimon Whiteson. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In International Conference on Machine Learning, pages 4295–4304. PMLR, 2018.

  126. Diederik M Roijers, Willem Röpke, Ann Nowé, and Roxana Rădulescu. On following pareto-optimal policies in multi-objective planning and reinforcement learning. In Multi-Objective Decision Making Workshop, 2021.

  127. Diederik M Roijers, Peter Vamplew, Shimon Whiteson, and Richard Dazeley. A survey of multi-objective sequential decision-making. Journal of Artificial Intelligence Research, 48:67–113, 2013.

  128. Willem Röpke, Roxana Radulescu, Diederik M Roijers, and Ann Nowé. Communication strategies in multi-objective normal-form games. In Adaptive and Learning Agents Workshop 2021, 2021.

  129. Jonathan Rubin and Ian Watson. Computer poker: A review. Artificial Intelligence, 175(5-6):958–987, 2011.

  130. Jordi Sabater and Carles Sierra. Reputation and social network analysis in multi-agent systems. In Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems: Part 1, pages 475–482, 2002.

  131. Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, and Ilya Sutskever. Evolution strategies as a scalable alternative to reinforcement learning. arXiv:1703.03864, 2017.

  132. Mikayel Samvelyan, Tabish Rashid, Christian Schroeder De Witt, Gregory Farquhar, Nantas Nardelli, Tim GJ Rudner, Chia-Man Hung, Philip HS Torr, Jakob Foerster, and Shimon Whiteson. The StarCraft multi-agent challenge. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS ’19, Montreal, 2019.

  133. Tuomas Sandholm. The state of solving large incomplete-information games, and application to poker. AI Magazine, 31(4):13–32, 2010.

  134. Tuomas Sandholm. Abstraction for solving large incomplete-information games. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 29, 2015.

  135. Thomas D Seeley. The honey bee colony as a superorganism. American Scientist, 77(6):546–553, 1989.

  136. Lloyd S Shapley. Stochastic games. In Proceedings of the National Academy of Sciences, volume 39, pages 1095–1100, 1953.

  137. Yoav Shoham and Kevin Leyton-Brown. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, 2008.

  138. Yoav Shoham, Rob Powers, and Trond Grenager. Multi-agent reinforcement learning: a critical survey. Technical report, Stanford University, 2003.

  139. Robin C Sickles and Valentin Zelenyuk. Measurement of Productivity and Efficiency. Cambridge University Press, 2019.

  140. David Silver, Satinder Singh, Doina Precup, and Richard S Sutton. Reward is enough. Artificial Intelligence, page 103535, 2021.

  141. David Simões, Nuno Lau, and Luís Paulo Reis. Multi agent deep learning with cooperative communication. Journal of Artificial Intelligence and Soft Computing Research, 10, 2020.

  142. Satinder Singh, Richard L Lewis, Andrew G Barto, and Jonathan Sorg. Intrinsically motivated reinforcement learning: An evolutionary perspective. IEEE Transactions on Autonomous Mental Development, 2(2):70–82, 2010.

  143. Stephen J Smith, Dana Nau, and Tom Throop. Computer bridge: A big win for AI planning. AI Magazine, 19(2):93–93, 1998.

  144. Kyunghwan Son, Daewoo Kim, Wan Ju Kang, David Earl Hostallero, and Yung Yi. QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In International Conference on Machine Learning, pages 5887–5896. PMLR, 2019.

  145. Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O Stanley, and Jeff Clune. Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. arXiv preprint arXiv:1712.06567, 2017.

  146. Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z Leibo, Karl Tuyls, and Thore Graepel. Value-decomposition networks for cooperative multi-agent learning. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS 2018, Stockholm, Sweden, 2017.

  147. Peter Sunehag, Guy Lever, Siqi Liu, Josh Merel, Nicolas Heess, Joel Z Leibo, Edward Hughes, Tom Eccles, and Thore Graepel. Reinforcement learning agents acquire flocking and symbiotic behaviour in simulated ecosystems. In Artificial Life Conference Proceedings, pages 103–110. MIT Press, 2019.

  148. Oskari Tammelin. Solving large imperfect information games using CFR+. arXiv preprint arXiv:1407.5042, 2014.

  149. Ardi Tampuu, Tambet Matiisen, Dorian Kodelja, Ilya Kuzovkin, Kristjan Korjus, Juhan Aru, Jaan Aru, and Raul Vicente. Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE, 12(4):e0172395, 2017.

  150. Ming Tan. Multi-agent reinforcement learning: Independent vs. cooperative agents. In International Conference on Machine Learning, pages 330–337, 1993.

  151. Shoshannah Tekofsky, Pieter Spronck, Martijn Goudbeek, Aske Plaat, and Jaap van den Herik. Past our prime: A study of age and play style development in Battlefield 3. IEEE Transactions on Computational Intelligence and AI in Games, 7(3):292–303, 2015.

  152. Justin K Terry and Benjamin Black. Multiplayer support for the arcade learning environment. arXiv preprint arXiv:2009.09341, 2020.

  153. Justin K Terry, Benjamin Black, Ananth Hari, Luis Santos, Clemens Dieffendahl, Niall L Williams, Yashas Lokesh, Caroline Horsch, and Praveen Ravi. Pettingzoo: Gym for multi-agent reinforcement learning. arXiv preprint arXiv:2009.14471, 2020.

  154. Julian Togelius, Alex J Champandard, Pier Luca Lanzi, Michael Mateas, Ana Paiva, Mike Preuss, and Kenneth O Stanley. Procedural content generation: Goals, challenges and actionable steps. In Artificial and Computational Intelligence in Games. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2013.

  155. Armon Toubman, Jan Joris Roessingh, Pieter Spronck, Aske Plaat, and Jaap Van Den Herik. Dynamic scripting with team coordination in air combat simulation. In International Conference on Industrial, Engineering and other Applications of Applied Intelligent Systems, pages 440–449. Springer, 2014.

  156. Thomas Trenner. Beating Kuhn poker with CFR using Python. https://ai.plainenglish.io/building-a-poker-ai-part-6-beating-kuhn-poker-with-cfr-using-python-1b4172a6ab2d.

  157. Karl Tuyls, Julien Perolat, Marc Lanctot, Joel Z Leibo, and Thore Graepel. A generalised method for empirical game theoretic analysis. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS 2018, Stockholm, Sweden, 2018.

  158. Karl Tuyls and Gerhard Weiss. Multiagent learning: Basics, challenges, and prospects. AI Magazine, 33(3):41–41, 2012.

  159. Paul Tylkin, Goran Radanovic, and David C Parkes. Learning robust helpful behaviors in two-player cooperative Atari environments. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, pages 1686–1688, 2021.

  160. Wiebe Van der Hoek and Michael Wooldridge. Multi-agent systems. Foundations of Artificial Intelligence, 3:887–928, 2008.

  161. Max J van Duijn. The Lazy Mindreader: a Humanities Perspective on Mindreading and Multiple-Order Intentionality. PhD thesis, Leiden University, 2016.

  162. Max J Van Duijn, Ineke Sluiter, and Arie Verhagen. When narrative takes over: The representation of embedded mindstates in Shakespeare’s Othello. Language and Literature, 24(2):148–166, 2015.

  163. Max J Van Duijn and Arie Verhagen. Recursive embedding of viewpoints, irregularity, and the role for a flexible framework. Pragmatics, 29(2):198–225, 2019.

  164. Kristof Van Moffaert and Ann Nowé. Multi-objective reinforcement learning using sets of pareto dominating policies. Journal of Machine Learning Research, 15(1):3483–3512, 2014.

  165. Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, Junhyuk Oh, Dan Horgan, Manuel Kroiss, Ivo Danihelka, Aja Huang, Laurent Sifre, Trevor Cai, John P. Agapiou, Max Jaderberg, Alexander Sasha Vezhnevets, Rémi Leblond, Tobias Pohlen, Valentin Dalibard, David Budden, Yury Sulsky, James Molloy, Tom Le Paine, Çaglar Gülçehre, Ziyu Wang, Tobias Pfaff, Yuhuai Wu, Roman Ring, Dani Yogatama, Dario Wünsch, Katrina McKinney, Oliver Smith, Tom Schaul, Timothy P. Lillicrap, Koray Kavukcuoglu, Demis Hassabis, Chris Apps, and David Silver. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782):350–354, 2019.

  166. Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John P. Agapiou, Julian Schrittwieser, John Quan, Stephen Gaffney, Stig Petersen, Karen Simonyan, Tom Schaul, Hado van Hasselt, David Silver, Timothy P. Lillicrap, Kevin Calderone, Paul Keet, Anthony Brunasso, David Lawrence, Anders Ekermo, Jacob Repp, and Rodney Tsing. StarCraft II: A new challenge for reinforcement learning. arXiv:1708.04782, 2017.

  167. John Von Neumann and Oskar Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, 1944.

  168. John Von Neumann, Oskar Morgenstern, and Harold William Kuhn. Theory of Games and Economic Behavior (commemorative edition). Princeton University Press, 2007.

  169. Douglas Walker and Graham Walker. The Official Rock Paper Scissors Strategy Guide. Simon and Schuster, 2004.

  170. Ying Wen, Yaodong Yang, Rui Luo, Jun Wang, and Wei Pan. Probabilistic recursive reasoning for multi-agent reinforcement learning. In International Conference on Learning Representations, 2019.

  171. Shimon Whiteson. Evolutionary computation for reinforcement learning. In Marco A. Wiering and Martijn van Otterlo, editors, Reinforcement Learning, volume 12 of Adaptation, Learning, and Optimization, pages 325–355. Springer, 2012.

  172. Marco A Wiering, Maikel Withagen, and Mădălina M Drugan. Model-based multi-objective reinforcement learning. In 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pages 1–6. IEEE, 2014.

  173. Daan Wierstra, Tom Schaul, Jan Peters, and Jürgen Schmidhuber. Natural evolution strategies. In IEEE Congress on Evolutionary Computation, pages 3381–3387, 2008.

  174. Nick Wilkinson and Matthias Klaes. An Introduction to Behavioral Economics. Macmillan International Higher Education, 2017.

  175. Annie Wong, Thomas Bäck, Anna V. Kononova, and Aske Plaat. Deep multiagent reinforcement learning: Challenges and directions. Artificial Intelligence Review, 2022.

  176. Michael Wooldridge. An Introduction to Multiagent Systems. Wiley, 2009.

  177. Anita Williams Woolley, Christopher F Chabris, Alex Pentland, Nada Hashmi, and Thomas W Malone. Evidence for a collective intelligence factor in the performance of human groups. Science, 330(6004):686–688, 2010.

  178. Yaodong Yang and Jun Wang. An overview of multi-agent reinforcement learning from game theoretical perspective. arXiv preprint arXiv:2011.00583, 2020.

  179. Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre Bayen, and Yi Wu. The surprising effectiveness of PPO in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955, 2021.

  180. Kaiqing Zhang, Zhuoran Yang, and Tamer Başar. Multi-agent reinforcement learning: A selective overview of theories and algorithms. arXiv preprint arXiv:1911.10635, 2019.

  181. Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, and Tamer Basar. Fully decentralized multi-agent reinforcement learning with networked agents. In International Conference on Machine Learning, pages 5872–5881. PMLR, 2018.

  182. Yan Zheng, Zhaopeng Meng, Jianye Hao, Zongzhang Zhang, Tianpei Yang, and Changjie Fan. A deep Bayesian policy reuse approach against non-stationary agents. In 32nd Neural Information Processing Systems, pages 962–972, 2018.

  183. Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. Regret minimization in games with incomplete information. In Advances in Neural Information Processing Systems, pages 1729–1736, 2008.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this chapter

Plaat, A. (2022). Multi-Agent Reinforcement Learning. In: Deep Reinforcement Learning. Springer, Singapore. https://doi.org/10.1007/978-981-19-0638-1_7

  • DOI: https://doi.org/10.1007/978-981-19-0638-1_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-0637-4

  • Online ISBN: 978-981-19-0638-1
