Backpropagation through the void: Optimizing control variates for black-box gradient estimation

W Grathwohl, D Choi, Y Wu, G Roeder… - arXiv preprint arXiv:1711.00123, 2017 - arxiv.org
Gradient-based optimization is the foundation of deep learning and reinforcement learning. Even when the mechanism being optimized is unknown or not differentiable, optimization using high-variance or biased gradient estimates is still often the best strategy. We introduce a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables. Our method uses gradients of a neural network trained jointly with model parameters or policies, and is applicable in both discrete and continuous settings. We demonstrate this framework for training discrete latent-variable models. We also give an unbiased, action-conditional extension of the advantage actor-critic reinforcement learning algorithm.
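
The abstract describes an estimator that combines the score-function (REINFORCE) term with a learned control variate: a surrogate network c_phi is evaluated on a reparameterized sample, its value is subtracted inside the score-function term, and its pathwise gradient is added back, so the estimator stays unbiased for any c_phi while its variance can be driven down by training phi. Below is a minimal sketch of the continuous-variable case under assumed toy choices: a Gaussian sampler b ~ N(theta, 1), a toy black-box objective f, a small hypothetical MLP surrogate c_phi, and PyTorch for autograd. It is an illustration of the idea, not the authors' released code.

import torch

def f(b):
    # Black-box objective: only evaluated on samples, never backpropagated through.
    return (b - 0.5) ** 2

c_phi = torch.nn.Sequential(                      # learned control variate c_phi(b)
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)
theta = torch.tensor([0.0], requires_grad=True)   # mean of the sampling distribution
opt_theta = torch.optim.SGD([theta], lr=1e-2)
opt_phi = torch.optim.Adam(c_phi.parameters(), lr=1e-3)

for step in range(2000):
    eps = torch.randn(1)
    b = theta + eps                               # reparameterized sample, b ~ N(theta, 1)

    # Score function of the unit-variance Gaussian: d/dtheta log p(b|theta) = b - theta.
    score = b.detach() - theta

    # Control variate evaluated on the reparameterized sample (keeps the path to theta).
    c = c_phi(b.unsqueeze(-1)).squeeze(-1)
    path = torch.autograd.grad(c.sum(), theta, create_graph=True)[0]

    # Single-sample estimate: (f(b) - c(b)) * score + d/dtheta c(b(eps, theta)).
    # Unbiased for any c_phi, since the two c-dependent terms cancel in expectation.
    g_hat = (f(b.detach()) - c) * score + path

    # Train phi to reduce the variance of g_hat; because E[g_hat] does not depend on phi,
    # the gradient of the variance equals the gradient of E[g_hat^2].
    phi_params = list(c_phi.parameters())
    phi_grads = torch.autograd.grad((g_hat ** 2).sum(), phi_params)
    opt_phi.zero_grad()
    for p, g in zip(phi_params, phi_grads):
        p.grad = g
    opt_phi.step()

    # Update theta with the estimate itself (detached: g_hat is already the gradient).
    opt_theta.zero_grad()
    theta.grad = g_hat.detach()
    opt_theta.step()

The discrete setting mentioned in the abstract works the same way, except the surrogate is evaluated on continuous relaxations of the discrete sample; the sketch above covers only the continuous case.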