This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game. This domain poses a new grand challenge for reinforcement learning, representing a more difficult class of problems than considered in most prior work. It is a multi-agent problem with multiple players interacting; there is imperfect information due to a partially observed map; it has a large action space involving the selection and control of hundreds of units; it has a large state space that must be observed solely from raw input feature planes; and it has delayed credit assignment requiring long-term strategies over thousands of steps. We describe the observation, action, and reward specification for the StarCraft II domain and provide an open source Python-based interface for communicating with the game engine. In addition to the main game maps, we provide a suite of mini-games focusing on different elements of StarCraft II gameplay. For the main game maps, we also provide an accompanying dataset of game replay data from human expert players. We give initial baseline results for neural networks trained from this data to predict game outcomes and player actions. Finally, we present initial baseline results for canonical deep reinforcement learning agents applied to the StarCraft II domain. On the mini-games, these agents learn to achieve a level of play that is comparable to a novice player. However, when trained on the main game, these agents are unable to make significant progress. Thus, SC2LE offers a new and challenging environment for exploring deep reinforcement learning algorithms and architectures.

The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play.
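As a rough illustration of the division of labour between the two kinds of network in the Go approach, the sketch below uses invented stand-in heuristics (a centre-preferring "policy" and a stone-count "value"; neither is the paper's trained model) on a flattened 3×3 toy board. Only the interface is the point: the policy maps a position to a distribution over moves, the value maps a position to a scalar evaluation, and greedy selection from the policy corresponds to playing without any lookahead search.

```python
import math

def toy_policy_network(board):
    """Stand-in for a trained policy network (invented heuristic, not a
    real model): returns a probability distribution over legal moves."""
    legal = [i for i, cell in enumerate(board) if cell == "."]
    center = (len(board) - 1) / 2.0
    logits = [-abs(i - center) for i in legal]  # pretend logits: prefer the centre
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return {move: e / total for move, e in zip(legal, exps)}

def toy_value_network(board):
    """Stand-in for a trained value network: maps a position to a scalar
    win-probability estimate (sigmoid of stone difference, a toy rule)."""
    return 1.0 / (1.0 + math.exp(board.count("O") - board.count("X")))

def select_move(board):
    """Greedy move selection straight from the policy distribution,
    i.e. acting with no lookahead search at all."""
    probs = toy_policy_network(board)
    return max(probs, key=probs.get)

board = list("X.O......")  # flattened 3x3 toy board, '.' = empty point
move = select_move(board)  # the centre-preferring heuristic picks index 4
print(move, round(toy_value_network(board), 2))
```

A real system would replace both toy functions with deep networks trained on expert games and self-play; the sketch only fixes the shapes of their inputs and outputs.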
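The observation/action/reward contract that SC2LE specifies can be illustrated with a toy stand-in environment. To be clear, this is not the real PySC2 API: the class, the two feature planes, and the four-action set below are invented for illustration. What it shares with the described specification is the shape of the loop: observations are raw feature planes, actions come from a discrete set, and the reward is a scalar.

```python
class ToyFeatureEnv:
    """Toy stand-in for an RL environment in the SC2LE mould (NOT the
    real PySC2 API): feature-plane observations, discrete actions,
    scalar reward."""

    def __init__(self, size=8):
        self.size = size
        self.reset()

    def reset(self):
        self.agent = [0, 0]                         # controlled unit
        self.goal = [self.size - 1, self.size - 1]  # target location
        self.t = 0
        return self._observe()

    def _observe(self):
        # Two binary feature planes: one marking the agent, one the goal.
        agent_plane = [[0] * self.size for _ in range(self.size)]
        goal_plane = [[0] * self.size for _ in range(self.size)]
        agent_plane[self.agent[0]][self.agent[1]] = 1
        goal_plane[self.goal[0]][self.goal[1]] = 1
        return [agent_plane, goal_plane]

    def step(self, action):
        # Discrete actions: 0=up, 1=down, 2=left, 3=right.
        dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
        self.agent[0] = min(max(self.agent[0] + dr, 0), self.size - 1)
        self.agent[1] = min(max(self.agent[1] + dc, 0), self.size - 1)
        self.t += 1
        done = self.agent == self.goal or self.t >= 100
        reward = 1.0 if self.agent == self.goal else 0.0
        return self._observe(), reward, done

# A scripted agent (for brevity it peeks at positions directly rather
# than decoding the planes): walk down to the goal row, then right.
env = ToyFeatureEnv()
obs, total_reward, done = env.reset(), 0.0, False
while not done:
    action = 1 if env.agent[0] < env.goal[0] else 3
    obs, reward, done = env.step(action)
    total_reward += reward
print(env.t, total_reward)
```

A learning agent facing the real environment must decode hundreds of units from such planes, choose among a far larger action space, and cope with rewards delayed over thousands of steps, which is what makes the full game hard.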