Adaptive Regularization in Neural Network Modeling

Chapter in Neural Networks: Tricks of the Trade
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1524)

Abstract

In this paper we address the important problem of optimizing regularization parameters in neural network modeling. The suggested optimization scheme is an extended version of the recently presented algorithm [25]. The idea is to minimize an empirical estimate of the generalization error, such as the cross-validation estimate, with respect to the regularization parameters. This is done with a simple iterative gradient descent scheme that incurs virtually no additional programming overhead compared to standard training. Experiments with feed-forward neural network models on time series prediction and classification tasks demonstrate the viability and robustness of the algorithm. Moreover, we provide some simple theoretical examples to illustrate the potential and limitations of the proposed regularization framework.
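To make the idea concrete, here is a minimal sketch of the principle described above, not the authors' implementation: gradient descent on a single weight-decay parameter kappa, illustrated on ridge regression so that the inner weight optimization has a closed form and the gradient of the validation error with respect to kappa follows from the chain rule. The synthetic data, names, and step sizes are assumptions of this sketch; the chapter itself develops the scheme for feed-forward networks.

```python
# Illustrative sketch only: outer gradient descent on a weight-decay
# parameter, minimizing a validation-set estimate of generalization
# error. Ridge regression stands in for the neural network so the inner
# optimization is closed-form; all names and data here are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: linear target plus noise, split into training and
# validation sets (the 50/50 split is an assumption of this sketch).
X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.5 * rng.normal(size=200)
Xt, yt, Xv, yv = X[:100], y[:100], X[100:], y[100:]
Nt, Nv = len(yt), len(yv)

def train(kappa):
    """Inner optimization: weights minimizing the regularized training
    cost C(w) = ||Xt w - yt||^2 / Nt + kappa * ||w||^2 (closed form)."""
    A = Xt.T @ Xt / Nt + kappa * np.eye(Xt.shape[1])
    return np.linalg.solve(A, Xt.T @ yt / Nt), A

log_kappa = np.log(1e-2)  # optimize log(kappa) so kappa stays positive
eta = 0.5                 # outer-loop step size (assumed)

for _ in range(50):
    kappa = np.exp(log_kappa)
    w, A = train(kappa)
    resid_v = Xv @ w - yv
    G = resid_v @ resid_v / Nv          # validation-error estimate
    dG_dw = 2.0 * Xv.T @ resid_v / Nv   # gradient of G w.r.t. weights
    # Differentiating the stationarity condition A w = Xt' yt / Nt
    # with respect to kappa gives dw/dkappa = -A^{-1} w, so by the
    # chain rule dG/dkappa = (dG/dw)' dw/dkappa.
    dG_dkappa = dG_dw @ (-np.linalg.solve(A, w))
    log_kappa -= eta * dG_dkappa * kappa  # descent step in log-space

print(f"validation MSE {G:.4f} at kappa = {np.exp(log_kappa):.4g}")
```

Working in log kappa keeps the parameter positive, and the same chain-rule argument extends to several regularization parameters and, with the Hessian of the regularized training cost in place of A, to networks trained by gradient descent.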

References

1. H. Akaike: Fitting Autoregressive Models for Prediction. Annals of the Institute of Statistical Mathematics 21 (1969) 243–247

2. S. Amari, N. Murata, K.-R. Müller, M. Finke and H. Yang: Asymptotic Statistical Theory of Overtraining and Cross-Validation. Technical Report METR 95-06 (1995); also IEEE Transactions on Neural Networks 8(5) (1997) 985–996

3. L. Nonboe Andersen, J. Larsen, L.K. Hansen and M. Hintz-Madsen: Adaptive Regularization of Neural Classifiers. In J. Principe et al. (eds.), Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VII, Piscataway, New Jersey: IEEE (1997) 24–33

4. C.M. Bishop: Curvature-Driven Smoothing: A Learning Algorithm for Feedforward Neural Networks. IEEE Transactions on Neural Networks 4(4) (1993) 882–884

5. C.M. Bishop: Neural Networks for Pattern Recognition. Oxford, UK: Oxford University Press (1995)

6. J.E. Dennis and R.B. Schnabel: Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Englewood Cliffs, NJ: Prentice-Hall (1983)

7. H. Drucker and Y. Le Cun: Improving Generalization Performance in Character Recognition. In B.H. Juang (ed.), Neural Networks for Signal Processing: Proceedings of the 1991 IEEE-SP Workshop, Piscataway, New Jersey: IEEE (1991) 198–207

8. S. Geisser: The Predictive Sample Reuse Method with Applications. Journal of the American Statistical Association (1975) 320–328

9. S. Geman, E. Bienenstock and R. Doursat: Neural Networks and the Bias/Variance Dilemma. Neural Computation 4 (1992) 1–58

10. F. Girosi, M. Jones and T. Poggio: Regularization Theory and Neural Networks Architectures. Neural Computation 7(2) (1995) 219–269

11. C. Goutte and J. Larsen: Adaptive Regularization of Neural Networks using Conjugate Gradient. In Proceedings of ICASSP’98, Seattle, USA, vol. 2 (1998) 1201–1204

12. C. Goutte: Note on Free Lunches and Cross-Validation. Neural Computation 9(6) (1997) 1211–1215

13. C. Goutte: Regularization with a Pruning Prior. To appear in Neural Networks (1997)

14. L.K. Hansen and C.E. Rasmussen: Pruning from Adaptive Regularization. Neural Computation 6 (1994) 1223–1232

15. L.K. Hansen, C.E. Rasmussen, C. Svarer and J. Larsen: Adaptive Regularization. In J. Vlontzos, J.-N. Hwang and E. Wilson (eds.), Proceedings of the IEEE Workshop on Neural Networks for Signal Processing IV, Piscataway, New Jersey: IEEE (1994) 78–87

16. L.K. Hansen and J. Larsen: Linear Unlearning for Cross-Validation. Advances in Computational Mathematics 5 (1996) 269–280

17. J. Hertz, A. Krogh and R.G. Palmer: Introduction to the Theory of Neural Computation. Redwood City, California: Addison-Wesley Publishing Company (1991)

18. M. Hintz-Madsen, M. With Pedersen, L.K. Hansen and J. Larsen: Design and Evaluation of Neural Classifiers. In S. Usui, Y. Tohkura, S. Katagiri and E. Wilson (eds.), Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VI, Piscataway, New Jersey: IEEE (1996) 223–232

19. K. Hornik: Approximation Capabilities of Multilayer Feedforward Networks. Neural Networks 4 (1991) 251–257

20. M. Kearns: A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split. Neural Computation 9(5) (1997) 1143–1161

21. J. Larsen: A Generalization Error Estimate for Nonlinear Systems. In S.Y. Kung et al. (eds.), Neural Networks for Signal Processing 2: Proceedings of the 1992 IEEE-SP Workshop, Piscataway, New Jersey: IEEE (1992) 29–38

22. J. Larsen: Design of Neural Network Filters. Ph.D. Thesis, Electronics Institute, Technical University of Denmark (1993). Available via ftp://eivind.imm.dtu.dk/dist/PhDfithesis/jlarsen.thesis.ps.Z

23. J. Larsen and L.K. Hansen: Generalization Performance of Regularized Neural Network Models. In J. Vlontzos et al. (eds.), Proceedings of the IEEE Workshop on Neural Networks for Signal Processing IV, Piscataway, New Jersey: IEEE (1994) 42–51

24. J. Larsen and L.K. Hansen: Empirical Generalization Assessment of Neural Network Models. In F. Girosi et al. (eds.), Proceedings of the IEEE Workshop on Neural Networks for Signal Processing V, Piscataway, New Jersey: IEEE (1995) 30–39

25. J. Larsen, L.K. Hansen, C. Svarer and M. Ohlsson: Design and Regularization of Neural Networks: The Optimal Use of a Validation Set. In S. Usui, Y. Tohkura, S. Katagiri and E. Wilson (eds.), Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VI, Piscataway, New Jersey: IEEE (1996) 62–71

26. J. Larsen et al.: Optimal Data Set Split Ratio for Empirical Generalization Error Estimates. In preparation.

27. Y. Le Cun, J.S. Denker and S.A. Solla: Optimal Brain Damage. In D.S. Touretzky (ed.), Advances in Neural Information Processing Systems 2, Proceedings of the 1989 Conference, San Mateo, California: Morgan Kaufmann Publishers (1990) 598–605

28. D. Lowe: Adaptive Radial Basis Function Nonlinearities and the Problem of Generalisation. In Proceedings of the IEE Conference on Artificial Neural Networks (1989) 171–175

29. L. Ljung: System Identification: Theory for the User. Englewood Cliffs, New Jersey: Prentice-Hall (1987)

30. D.J.C. MacKay: A Practical Bayesian Framework for Backprop Networks. Neural Computation 4(3) (1992) 448–472

31. J. Moody: Prediction Risk and Architecture Selection for Neural Networks. In V. Cherkassky et al. (eds.), From Statistics to Neural Networks: Theory and Pattern Recognition Applications, Berlin, Germany: Springer-Verlag, Series F (1994)

32. J. Moody and T. Rögnvaldsson: Smoothing Regularizers for Projective Basis Function Networks. In Advances in Neural Information Processing Systems 9, Proceedings of the 1996 Conference, Cambridge, Massachusetts: MIT Press (1997)

33. N. Murata, S. Yoshizawa and S. Amari: Network Information Criterion: Determining the Number of Hidden Units for an Artificial Neural Network Model. IEEE Transactions on Neural Networks 5(6) (1994) 865–872

34. S. Nowlan and G. Hinton: Simplifying Neural Networks by Soft Weight Sharing. Neural Computation 4(4) (1992) 473–493

35. M. With Pedersen: Training Recurrent Networks. In Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VII, Piscataway, New Jersey: IEEE (1997)

36. G.E. Peterson and H.L. Barney: Control Methods Used in a Study of the Vowels. Journal of the Acoustical Society of America (1952) 175–184

37. R.S. Shadafan and M. Niranjan: A Dynamic Neural Network Architecture by Sequential Partitioning of the Input Space. Neural Computation 6(6) (1994) 1202–1222

38. J. Sjöberg: Non-Linear System Identification with Neural Networks. Ph.D. Thesis no. 381, Department of Electrical Engineering, Linköping University, Sweden (1995)

39. M. Stone: Cross-validatory Choice and Assessment of Statistical Predictors. Journal of the Royal Statistical Society B 36(2) (1974) 111–147

40. C. Svarer, L.K. Hansen, J. Larsen and C.E. Rasmussen: Designer Networks for Time Series Processing. In C.A. Kamm et al. (eds.), Proceedings of the IEEE Workshop on Neural Networks for Signal Processing 3, Piscataway, New Jersey: IEEE (1993) 78–87

41. R.L. Watrous: Current Status of Peterson-Barney Vowel Formant Data. Journal of the Acoustical Society of America (1991) 2459–2460

42. A.S. Weigend, B.A. Huberman and D.E. Rumelhart: Predicting the Future: A Connectionist Approach. International Journal of Neural Systems 1(3) (1990) 193–209

43. P.M. Williams: Bayesian Regularization and Pruning using a Laplace Prior. Neural Computation 7(1) (1995) 117–143

44. D.H. Wolpert and W.G. Macready: The Mathematics of Search. Technical Report SFI-TR-95-02-010, Santa Fe Institute (1995)

45. L. Wu and J. Moody: A Smoothing Regularizer for Feedforward and Recurrent Neural Networks. Neural Computation 8(3) (1996)

46. H. Zhu and R. Rohwer: No Free Lunch for Cross Validation. Neural Computation 8(7) (1996) 1421–1426

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Larsen, J., Svarer, C., Andersen, L.N., Hansen, L.K. (1998). Adaptive Regularization in Neural Network Modeling. In: Orr, G.B., Müller, K.-R. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 1524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49430-8_6

  • DOI: https://doi.org/10.1007/3-540-49430-8_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65311-0

  • Online ISBN: 978-3-540-49430-0

