Monte Carlo sampling for regret minimization in extensive games

M Lanctot, K Waugh, M Zinkevich… - Advances in neural …, 2009 - proceedings.neurips.cc
M Lanctot, K Waugh, M Zinkevich, M Bowling
Advances in neural information processing systems, 2009proceedings.neurips.cc
Sequential decision-making with multiple agents and imperfect information is commonly
modeled as an extensive game. One efficient method for computing Nash equilibria in large,
zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the
domain of poker, CFR has proven effective, particularly when using a domain-specific
augmentation involving chance outcome sampling. In this paper, we describe a general
family of domain independent CFR sample-based algorithms called Monte Carlo …
Abstract
Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling. In this paper, we describe a general family of domain independent CFR sample-based algorithms called Monte Carlo counterfactual regret minimization (MCCFR) of which the original and poker-specific versions are special cases. We start by showing that MCCFR performs the same regret updates as CFR on expectation. Then, we introduce two sampling schemes:{\it outcome sampling} and {\it external sampling}, showing that both have bounded overall regret with high probability. Thus, they can compute an approximate equilibrium using self-play. Finally, we prove a new tighter bound on the regret for the original CFR algorithm and relate this new bound to MCCFRs bounds. We show empirically that, although the sample-based algorithms require more iterations, their lower cost per iteration can lead to dramatically faster convergence in various games.
proceedings.neurips.cc
Showing the best result for this search. See all results