Elsevier

Artificial Intelligence

Volume 242, January 2017, Pages 132-171

Making friends on the fly: Cooperating with new teammates

https://doi.org/10.1016/j.artint.2016.10.005
Under an Elsevier user license (open archive)

Abstract

Robots are being deployed in an increasing variety of environments for longer periods of time. As the number of robots grows, they will increasingly need to interact with other robots. Additionally, a growing number of companies and research laboratories are producing these robots, so the robots may not share a common communication or coordination protocol. While standards for coordination and communication may be created, we expect that robots will additionally need to reason intelligently about their teammates with limited information. This problem motivates the area of ad hoc teamwork, in which an agent may potentially cooperate with a variety of teammates in order to achieve a shared goal. This article focuses on a limited version of the ad hoc teamwork problem in which an agent knows the environmental dynamics and has had past experiences with other teammates, though these experiences may not be representative of the current teammates. To tackle this problem, this article introduces a new general-purpose algorithm, PLASTIC, that reuses knowledge learned from previous teammates or provided by experts to quickly adapt to new teammates. This algorithm is instantiated in two forms: 1) PLASTIC-Model, which builds models of previous teammates' behaviors and plans behaviors online using these models, and 2) PLASTIC-Policy, which learns policies for cooperating with previous teammates and selects among these policies online. We evaluate PLASTIC on two benchmark tasks: the pursuit domain and robot soccer in the RoboCup 2D simulation domain. Recognizing that a key requirement of ad hoc teamwork is adaptability to previously unseen agents, the tests use more than 40 previously unknown teams on the first task and 7 previously unknown teams on the second.

While PLASTIC assumes that there is some degree of similarity between the current and past teammates' behaviors, the experimental setup takes no steps to ensure that this assumption holds. The teammates were created by a variety of independent developers and were not designed to share any similarities. Nonetheless, the results show that PLASTIC was able to identify and exploit similarities between its current and past teammates' behaviors, allowing it to quickly adapt to new teammates.
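
The abstract says that PLASTIC identifies which past teammates the current ones resemble and selects among previously learned policies online, but it does not say how that identification is done. The sketch below is a minimal illustration under stated assumptions: a fixed library of previously learned teammate models, each paired with a policy for cooperating with that teammate (in the spirit of PLASTIC-Policy), and a plain Bayesian update over those models after each observed teammate action. This update rule is an illustrative choice, not necessarily the one PLASTIC uses. The names `update_beliefs`, `select_policy`, the `floor` parameter, and the "chaser"/"waiter" teammates are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical interfaces for illustration (not from the article):
# a teammate model gives P(action | state) for one previously seen teammate,
# and a policy maps a state to the ad hoc agent's own action.
TeammateModel = Callable[[str, str], float]
Policy = Callable[[str], str]


def update_beliefs(
    beliefs: Dict[str, float],
    models: Dict[str, TeammateModel],
    state: str,
    observed_action: str,
    floor: float = 1e-6,
) -> Dict[str, float]:
    """One Bayesian update of P(past teammate | observations).

    `floor` keeps a likelihood from reaching exactly zero, so a single
    surprising action cannot permanently rule out a model.
    """
    posterior = {
        name: prior * max(models[name](state, observed_action), floor)
        for name, prior in beliefs.items()
    }
    total = sum(posterior.values())
    return {name: p / total for name, p in posterior.items()}


def select_policy(beliefs: Dict[str, float], policies: Dict[str, Policy]) -> Policy:
    """Reuse the policy learned with the most probable past teammate."""
    best = max(beliefs, key=beliefs.get)
    return policies[best]


# Two hypothetical past teammates: one that usually chases, one that usually waits.
models: Dict[str, TeammateModel] = {
    "chaser": lambda state, action: 0.9 if action == "chase" else 0.1,
    "waiter": lambda state, action: 0.9 if action == "wait" else 0.1,
}
policies: Dict[str, Policy] = {
    "chaser": lambda state: "block",  # learned alongside the chasing teammate
    "waiter": lambda state: "chase",  # learned alongside the waiting teammate
}

# Start from a uniform prior, then observe the new teammate for four steps.
beliefs = {name: 1.0 / len(models) for name in models}
for action in ["chase", "chase", "wait", "chase"]:
    beliefs = update_beliefs(beliefs, models, "s0", action)

print(beliefs)  # "chaser" ends with about 0.988 of the probability mass
print(select_policy(beliefs, policies)("s0"))  # block
```

Keeping a belief over all past teammates, rather than committing to one after the first observation, lets the agent switch policies if later actions fit another model better; the `floor` parameter serves the same purpose for actions a model considers nearly impossible.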

Keywords

Ad hoc teamwork
Multiagent systems
Multiagent cooperation
Reinforcement learning
Pursuit domain
RoboCup soccer

This article contains material from four prior conference papers [11], [12], [13], [14].

This work was performed while Samuel Barrett was a graduate student at the University of Texas at Austin.