Cooperative Multi-agent Control Using Deep Reinforcement Learning

Gupta, Jayesh K.; Egorov, Maxim; Kochenderfer, Mykel

doi:10.1007/978-3-319-71682-4_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10642)

Included in the following conference series:

International Conference on Autonomous Agents and Multiagent Systems

Abstract

This work considers the problem of learning cooperative policies in complex, partially observable domains without explicit communication. We extend three classes of single-agent deep reinforcement learning algorithms based on policy gradient, temporal-difference error, and actor-critic methods to cooperative multi-agent systems. To effectively scale these algorithms beyond a trivial number of agents, we combine them with a multi-agent variant of curriculum learning. The algorithms are benchmarked on a suite of cooperative control tasks, including tasks with discrete and continuous actions, as well as tasks with dozens of cooperating agents. We report the performance of the algorithms using different neural architectures, training procedures, and reward structures. We show that policy gradient methods tend to outperform both temporal-difference and actor-critic methods and that curriculum learning is vital to scaling reinforcement learning algorithms in complex multi-agent domains.
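The abstract credits a multi-agent variant of curriculum learning for scaling beyond a handful of agents but does not spell out its mechanics. As an illustration only, the sketch below assumes one simple form. Tasks are ordered by the number of cooperating agents, and training moves to the next, larger task once the mean reward over a recent window of episodes reaches a threshold. The class `AgentCountCurriculum`, its parameters, and the promotion rule are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentCountCurriculum:
    """Hypothetical curriculum over the number of cooperating agents.

    Assumption (not from the paper): tasks differ only in agent count and
    are promoted by a mean-reward threshold over a sliding window.
    """
    agent_counts: List[int]      # tasks ordered easiest to hardest, e.g. [2, 4, 8]
    promote_threshold: float     # assumed mean reward that counts as "mastered"
    window: int = 10             # number of most recent episodes to average
    stage: int = 0               # index into agent_counts
    _recent: List[float] = field(default_factory=list)

    @property
    def num_agents(self) -> int:
        """Agent count of the task currently being trained on."""
        return self.agent_counts[self.stage]

    def report(self, episode_reward: float) -> int:
        """Record one episode's reward and return the agent count for the next episode."""
        self._recent.append(episode_reward)
        if len(self._recent) > self.window:
            self._recent.pop(0)  # keep only the last `window` rewards
        full = len(self._recent) == self.window
        mastered = full and sum(self._recent) / self.window >= self.promote_threshold
        if mastered and self.stage < len(self.agent_counts) - 1:
            self.stage += 1
            self._recent.clear()  # statistics from the easier task do not carry over
        return self.num_agents


if __name__ == "__main__":
    curriculum = AgentCountCurriculum(agent_counts=[2, 4, 8], promote_threshold=1.0, window=3)
    schedule = [curriculum.report(r) for r in [0.2, 1.2, 1.1, 1.3, 0.9, 1.5, 1.4]]
    print(schedule)  # [2, 2, 2, 4, 4, 4, 8]
```

Clearing the reward window on promotion is a design choice in this sketch: performance measured with fewer agents says little about the harder task, so each stage has to earn its own promotion. Once the largest task is reached, the curriculum simply stays there.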

Acknowledgements

This work was supported by Army AHPCRC grant W911NF-07-2-0027. The authors would like to thank the anonymous reviewers for their helpful comments.

Author information

Authors and Affiliations

Stanford University, Stanford, USA
Jayesh K. Gupta, Maxim Egorov & Mykel Kochenderfer

Corresponding author

Correspondence to Jayesh K. Gupta.

Editor information

Editors and Affiliations

University of Central Florida, Orlando, Florida, USA
Gita Sukthankar
IIIA-CSIC, Bellaterra, Spain
Juan A. Rodriguez-Aguilar

About this paper

Cite this paper

Gupta, J.K., Egorov, M., Kochenderfer, M. (2017). Cooperative Multi-agent Control Using Deep Reinforcement Learning. In: Sukthankar, G., Rodriguez-Aguilar, J. (eds) Autonomous Agents and Multiagent Systems. AAMAS 2017. Lecture Notes in Computer Science, vol 10642. Springer, Cham. https://doi.org/10.1007/978-3-319-71682-4_5

DOI: https://doi.org/10.1007/978-3-319-71682-4_5
Published: 25 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71681-7
Online ISBN: 978-3-319-71682-4
eBook Packages: Computer Science; Computer Science (R0)