Abstract
Model-based reinforcement learning (MBRL) plays an important role in developing control strategies for robotic systems. However, for complex platforms it is difficult to capture the system dynamics with analytic models. While data-driven tools offer an alternative for tackling this problem, collecting data on physical systems is non-trivial. Hence, smart solutions are required to learn dynamics models effectively from a small number of examples. In this paper we present an extension of Data as Demonstrator that handles controlled dynamics, improving the multiple-step prediction capabilities of the learned dynamics models. Results show the efficacy of our algorithm in developing LQR, iLQR, and open-loop trajectory-based control strategies on simulated benchmarks as well as physical robot platforms.
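To give a flavor of the idea, the Data-as-Demonstrator scheme for controlled systems can be sketched as follows: fit a one-step model on observed (state, control, next-state) triples, roll the learned model along each training trajectory with the recorded controls, and aggregate corrective examples that map the model's own (drifting) predicted state, paired with the executed control, back to the true next state. This is a minimal illustrative sketch with a linear least-squares model, not the paper's implementation; the function names and trajectory format are assumptions.

```python
import numpy as np


def fit_linear_model(X, U, Y):
    """Least-squares fit of x_{t+1} ~= [x_t, u_t] @ W."""
    Z = np.hstack([X, U])
    W, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return W  # shape (dim_x + dim_u, dim_x)


def dad_controlled(trajs, n_iters=3):
    """DaD-style training loop for controlled dynamics (sketch).

    trajs: list of (states, controls) pairs, where states has shape
    (T+1, dim_x), controls has shape (T, dim_u), and states[t+1] is
    the true successor of (states[t], controls[t]).
    """
    # Initial one-step dataset from the raw trajectories.
    X = np.vstack([s[:-1] for s, u in trajs])
    U = np.vstack([u for s, u in trajs])
    Y = np.vstack([s[1:] for s, u in trajs])
    W = fit_linear_model(X, U, Y)

    for _ in range(n_iters):
        newX, newU, newY = [], [], []
        for states, controls in trajs:
            xhat = states[0]
            for t in range(len(controls)):
                # Corrective example: the model's current (possibly
                # drifted) state plus the executed control should map
                # to the ground-truth next state.
                newX.append(xhat)
                newU.append(controls[t])
                newY.append(states[t + 1])
                # Advance the learned model, not the true system.
                xhat = np.hstack([xhat, controls[t]]) @ W
        # Aggregate the corrections and refit.
        X = np.vstack([X, newX])
        U = np.vstack([U, newU])
        Y = np.vstack([Y, newY])
        W = fit_linear_model(X, U, Y)
    return W
```

The aggregation step is what targets multi-step accuracy: ordinary one-step regression never sees the distribution of states the model visits under its own rollouts, whereas the corrective pairs explicitly train the model to recover from its accumulated prediction error.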
Notes
1. Trajectories can be sub-sampled to lengths shorter than the control problem's time horizon.
2. All simulators except the helicopter are available at https://github.com/webrot9/control_simulators with C++ and Python APIs.
References
Schaal, S., et al.: Learning from demonstration. In: NIPS, pp. 1040–1046 (1997)
Bakker, B., Zhumatiy, V., Gruener, G., Schmidhuber, J.: Quasi-online reinforcement learning for robots. In: ICRA, pp. 2997–3002 (2006)
Hester, T., Quinlan, M., Stone, P.: RTMBA: A real-time model-based reinforcement learning architecture for robot control. In: ICRA, pp. 85–90 (2012)
Thrun, S.: An approach to learning mobile robot navigation. RAS 15(4), 301–319 (1995)
Matarić, M.J.: Reinforcement learning in the multi-robot domain. In: Arkin, R.C., Bekey, G.A. (eds.) Robot Colonies, pp. 73–83. Springer, New York (1997). doi:10.1007/978-1-4757-6451-2_4
Duan, Y., Liu, Q., Xu, X.: Application of reinforcement learning in robot soccer. Eng. Appl. Artif. Intell. 20(7), 936–950 (2007)
Konidaris, G., Kuindersma, S., Grupen, R., Barto, A.: Robot learning from demonstration by constructing skill trees. IJRR 0278364911428653 (2011)
Ko, J., Klein, D.J., Fox, D., Haehnel, D.: GP-UKF: Unscented Kalman filters with Gaussian process prediction and observation models. In: IROS, pp. 1901–1907 (2007)
Bagnell, J.A., Schneider, J.G.: Autonomous helicopter control using reinforcement learning policy search methods. In: ICRA, vol. 2, pp. 1615–1620 (2001)
Venkatraman, A., Hebert, M., Bagnell, J.A.: Improving multi-step prediction of learned time series models. In: AAAI, pp. 3024–3030 (2015)
Van Overschee, P., De Moor, B.: N4SID: Subspace algorithms for the identification of combined deterministic-stochastic systems. Automatica 30(1), 75–93 (1994)
Ghahramani, Z., Roweis, S.T.: Learning nonlinear dynamical systems using an EM algorithm. In: NIPS, pp. 431–437 (1999)
Siddiqi, S.M., Boots, B., Gordon, G.J.: A constraint generation approach to learning stable linear dynamical systems. In: NIPS (2007)
Van Overschee, P., De Moor, B.: Subspace identification for linear systems: theory implementation applications. Springer Science & Business Media, New York (2012)
Venkatraman, A., Boots, B., Hebert, M., Bagnell, J.A.: Data as demonstrator with applications to system identification. In: ALR Workshop, NIPS (2014)
Abbeel, P., Ng, A.Y.: Exploration and apprenticeship learning in reinforcement learning. In: ICML, pp. 1–8. ACM (2005)
Deisenroth, M., Rasmussen, C.E.: PILCO: a model-based and data-efficient approach to policy search. In: ICML, pp. 465–472 (2011)
Ross, S., Bagnell, D.: Agnostic system identification for model-based reinforcement learning. In: ICML, pp. 1703–1710 (2012)
Heess, N., Wayne, G., Silver, D., Lillicrap, T., Erez, T., Tassa, Y.: Learning continuous control policies by stochastic value gradients. In: NIPS, pp. 2926–2934 (2015)
Abbeel, P., Ganapathi, V., Ng, A.Y.: Learning vehicular dynamics, with application to modeling helicopters. In: NIPS, pp. 1–8 (2005)
Müller, K.-R., Smola, A.J., Rätsch, G., Schölkopf, B., Kohlmorgen, J., Vapnik, V.: Predicting time series with support vector machines. In: Gerstner, W., Germond, A., Hasler, M., Nicoud, J.-D. (eds.) ICANN 1997. LNCS, vol. 1327, pp. 999–1004. Springer, Heidelberg (1997). doi:10.1007/BFb0020283
Li, W., Todorov, E.: Iterative linear quadratic regulator design for nonlinear biological movement systems. In: ICINCO, vol. 1, pp. 222–229 (2004)
Rahimi, A., Recht, B.: Random features for large-scale kernel machines. In: NIPS (2007)
Acknowledgements
This material is based upon work supported in part by: National Science Foundation Graduate Research Fellowship Grant No. DGE-1252522, National Science Foundation NRI Purposeful Prediction Award No. 1227234, and ONR contract N000141512365. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Venkatraman, A., Capobianco, R., Pinto, L., Hebert, M., Nardi, D., Bagnell, J.A. (2017). Improved Learning of Dynamics Models for Control. In: Kulić, D., Nakamura, Y., Khatib, O., Venture, G. (eds) 2016 International Symposium on Experimental Robotics. ISER 2016. Springer Proceedings in Advanced Robotics, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-319-50115-4_61
Print ISBN: 978-3-319-50114-7
Online ISBN: 978-3-319-50115-4