Google Scholar

Behaviour suite for reinforcement learning

I Osband, Y Doron, M Hessel, J Aslanides… - arXiv preprint arXiv …, 2019 - arxiv.org

… Just as the MNIST dataset offers a clean, sanitised, test of … learning extends contextual bandit
decision problem to allow … ticularly badly on environments that require generalization and/…

Cited by 175

[PDF] escholarship.org

[BOOK] Meta learning for control

Y Duan - 2017 - search.proquest.com

… 51 3.8 Further analysis on multi-armed bandits … However, these algorithms do not always
generalize straightforwardly to tasks … is executed under five random seeds. The criterion for the …

Cited by 8

[PDF] openreview.net

Reinforcement Teaching

C Muslimani, A Lewandowski… - … on Machine Learning …, 2022 - openreview.net

… as a contextual bandit problem where the state is the current … the base problems they solve
and are unable to generalize … benchmark datasets (MNIST, Fashion MNIST) and even new …
