Research article
DOI: 10.1145/1553374.1553380

Curriculum learning

Published: 14 June 2009

ABSTRACT

Humans and animals learn much better when the examples are not randomly presented but organized in a meaningful order which illustrates gradually more concepts, and gradually more complex ones. Here, we formalize such training strategies in the context of machine learning, and call them "curriculum learning". In the context of recent research studying the difficulty of training in the presence of non-convex training criteria (for deep deterministic and stochastic neural networks), we explore curriculum learning in various set-ups. The experiments show that significant improvements in generalization can be achieved. We hypothesize that curriculum learning has both an effect on the speed of convergence of the training process to a minimum and, in the case of non-convex criteria, on the quality of the local minima obtained: curriculum learning can be seen as a particular form of continuation method (a general strategy for global optimization of non-convex functions).
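
The abstract's core recipe — order the training examples by difficulty and widen the training set in stages — is easy to state concretely. Below is a minimal sketch in Python; the toy data, the |x|-based difficulty score, the four-stage schedule, and the SGD-trained linear model are all illustrative assumptions, not the paper's experimental setup (the toy objective is even convex, so the sketch shows only the scheduling mechanics, not the non-convex benefits the paper reports).

import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy 1-D linear regression. We treat |x| as a hypothetical
# difficulty score, so the curriculum starts near the origin and moves out.
X = rng.uniform(-5.0, 5.0, size=1000)
y = 3.0 * X + rng.normal(scale=0.5, size=1000)
difficulty = np.abs(X)                 # assumed difficulty heuristic
order = np.argsort(difficulty)         # easiest examples first

w, b = 0.0, 0.0                        # linear model fit by SGD
lr = 0.01

# Curriculum: unlock progressively harder examples in four stages, and at
# each stage run SGD on random draws from the currently unlocked subset.
for frac in (0.25, 0.5, 0.75, 1.0):
    unlocked = order[: int(frac * len(order))]
    for _ in range(2000):
        i = rng.choice(unlocked)
        err = (w * X[i] + b) - y[i]    # residual of the current prediction
        w -= lr * err * X[i]           # squared-error gradient step on w
        b -= lr * err                  # and on b

print(f"learned w={w:.3f}, b={b:.3f} (generating model: w=3.0, b=0.0)")

Replacing the schedule (0.25, 0.5, 0.75, 1.0) with a single stage at frac=1.0 recovers ordinary random-order training, the no-curriculum baseline. In the continuation-method reading sketched in the abstract, each stage optimizes an easier, smoother proxy of the final objective, and its solution initializes the next, harder stage.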


Index Terms

  1. Curriculum learning

Published in

              ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
              June 2009
              1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374

Copyright © 2009 by the author(s)/owner(s).

              Publisher

              Association for Computing Machinery

              New York, NY, United States




              Acceptance Rates

Overall acceptance rate: 140 of 548 submissions, 26%
