Efficient BackProp

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1524)

Abstract

The convergence of back-propagation learning is analyzed so as to explain common phenomena observed by practitioners. Many undesirable behaviors of backprop can be avoided with tricks that are rarely exposed in serious technical publications. This paper gives some of those tricks, and offers explanations of why they work. Many authors have suggested that second-order optimization methods are advantageous for neural net training. It is shown that most “classical” second-order methods are impractical for large neural networks. A few methods are proposed that do not have these limitations.

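The tricks themselves are developed in the body of the chapter, not in this abstract. Purely as an illustration of the kind of advice the chapter is known for (for instance, shifting each input variable to zero mean and unit variance, and presenting training examples in a fresh random order to plain stochastic gradient descent), here is a minimal NumPy sketch; the toy data, network sizes, and learning rate are illustrative choices of mine, not values taken from the chapter.

```python
# Illustrative sketch (not code from the chapter): input standardization plus
# per-example, shuffled stochastic gradient descent on a tiny tanh network.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: 200 examples, 3 input features, deliberately not zero-mean.
X = rng.normal(loc=5.0, scale=3.0, size=(200, 3))
y = np.sin(X[:, :1]) + 0.1 * rng.normal(size=(200, 1))

# Trick: shift each input to zero mean and scale to unit variance, so the
# error surface is better conditioned and one global learning rate suffices.
mean, std = X.mean(axis=0), X.std(axis=0)
Xn = (X - mean) / std

# Tiny network: 3 -> 8 (tanh) -> 1 (linear).
W1 = rng.normal(scale=1.0 / np.sqrt(3), size=(3, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=1.0 / np.sqrt(8), size=(8, 1)); b2 = np.zeros(1)
lr = 0.01

for epoch in range(50):
    # Trick: present the examples in a new random order every epoch.
    for i in rng.permutation(len(Xn)):
        x, t = Xn[i:i+1], y[i:i+1]
        h = np.tanh(x @ W1 + b1)               # forward pass, hidden layer
        out = h @ W2 + b2                      # forward pass, linear output
        d_out = out - t                        # gradient of 0.5 * (out - t)^2
        d_h = (d_out @ W2.T) * (1.0 - h**2)    # backprop through tanh
        W2 -= lr * h.T @ d_out; b2 -= lr * d_out.ravel()
        W1 -= lr * x.T @ d_h;   b1 -= lr * d_h.ravel()

mse = float(np.mean((np.tanh(Xn @ W1 + b1) @ W2 + b2 - y) ** 2))
print(f"final training MSE: {mse:.4f}")
```

Running the same loop on the raw, offset inputs X instead of Xn typically converges far more slowly under this fixed learning rate, which is the behavior the input-normalization advice is meant to avoid.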

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

LeCun, Y., Bottou, L., Orr, G.B., Müller, K.R. (1998). Efficient BackProp. In: Orr, G.B., Müller, K.R. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 1524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49430-8_2

  • DOI: https://doi.org/10.1007/3-540-49430-8_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65311-0

  • Online ISBN: 978-3-540-49430-0

  • eBook Packages: Springer Book Archive
