Abstract
In this chapter we address the important problem of optimizing regularization parameters in neural network modeling. The proposed optimization scheme is an extended version of the recently presented algorithm [25]. The idea is to minimize an empirical estimate of the generalization error, such as the cross-validation estimate, with respect to the regularization parameters. This is done with a simple iterative gradient descent scheme that introduces virtually no programming overhead beyond standard training. Experiments with feed-forward neural network models on time series prediction and classification tasks demonstrate the viability and robustness of the algorithm. In addition, we provide simple theoretical examples that illustrate the potential and limitations of the proposed regularization framework.
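The core idea described above can be sketched in a toy setting. Here we use ridge regression rather than a neural network, because the regularized weights are then available in closed form and the gradient of the validation error with respect to the regularization parameter follows directly from implicit differentiation of the training optimum. The data, names, and step size below are illustrative assumptions, not taken from the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: linear target plus noise, split into training and validation sets.
n, d = 60, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)
Xt, yt, Xv, yv = X[:40], y[:40], X[40:], y[40:]

def train(lam):
    """Ridge weights minimizing training MSE + lam * ||w||^2 (closed form)."""
    H = Xt.T @ Xt + lam * np.eye(d)
    return np.linalg.solve(H, Xt.T @ yt), H

def val_error(w):
    """Empirical generalization estimate: mean squared error on the validation set."""
    return np.mean((Xv @ w - yv) ** 2)

# Gradient descent on kappa = log(lam). For the ridge optimum w(lam),
# implicit differentiation gives dw/dlam = -H^{-1} w, hence
# dE_val/dlam = g_val . dw/dlam with g_val = (2/n_val) Xv'(Xv w - yv).
kappa, eta = 0.0, 0.5
for _ in range(50):
    lam = np.exp(kappa)
    w, H = train(lam)                       # inner loop: fit weights
    g_val = 2.0 / len(yv) * Xv.T @ (Xv @ w - yv)
    dE_dlam = g_val @ np.linalg.solve(H, -w)
    kappa -= eta * lam * dE_dlam            # chain rule: dE/dkappa = lam * dE/dlam

print("adapted lambda:", np.exp(kappa))
print("validation MSE:", val_error(train(np.exp(kappa))[0]))
```

For a neural network the inner training step is iterative rather than closed-form, and the chapter's scheme interleaves the two gradient descents; the structure of the update on the (log-)regularization parameter is the same.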
References
H. Akaike: Fitting Autoregressive Models for Prediction. Annals of the Institute of Statistical Mathematics 21 (1969) 243–247
S. Amari, N. Murata, K.-R. Müller, M. Finke and H. Yang: Asymptotic Statistical Theory of Overtraining and Cross-Validation. Technical report METR 95-06 (1995) and IEEE Transactions on Neural Networks 8(5) (1997) 985–996
L. Nonboe Andersen, J. Larsen, L.K. Hansen and M. Hintz-Madsen: Adaptive Regularization of Neural Classifiers. In J. Principe (ed.), Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VII, Piscataway, New Jersey: IEEE (1997) 24–33
C.M. Bishop: Curvature-Driven Smoothing: A Learning Algorithm for Feedforward Neural Networks. IEEE Transactions on Neural Networks 4(4) (1993) 882–884
C.M. Bishop: Neural Networks for Pattern Recognition. Oxford, UK: Oxford University Press (1995)
J.E. Dennis and R.B. Schnabel: Numerical Methods for Unconstrained Optimization and Non-linear Equations. Englewood Cliffs, NJ: Prentice-Hall (1983)
H. Drucker and Y. Le Cun: Improving Generalization Performance in Character Recognition. In B.H. Juang (ed.), Neural Networks for Signal Processing: Proceedings of the 1991 IEEE-SP Workshop, Piscataway, New Jersey: IEEE (1991) 198–207
S. Geisser: The Predictive Sample Reuse Method with Applications. Journal of the American Statistical Association (1975) 320–328
S. Geman, E. Bienenstock and R. Doursat: Neural Networks and the Bias/Variance Dilemma. Neural Computation 4 (1992) 1–58
F. Girosi, M. Jones and T. Poggio: Regularization Theory and Neural Networks Architectures. Neural Computation 7(2) (1995) 219–269
C. Goutte and J. Larsen: Adaptive Regularization of Neural Networks using Conjugate Gradient. In Proceedings of ICASSP'98, Seattle, USA, 2 (1998) 1201–1204
C. Goutte: Note on Free Lunches and Cross-Validation. Neural Computation 9(6) (1997) 1211–1215
C. Goutte: Regularization with a Pruning Prior. To appear in Neural Networks (1997)
L.K. Hansen and C.E. Rasmussen: Pruning from Adaptive Regularization. Neural Computation 6 (1994) 1223–1232
L.K. Hansen, C.E. Rasmussen, C. Svarer and J. Larsen: Adaptive Regularization. In J. Vlontzos, J.-N. Hwang and E. Wilson eds, Proceedings of the IEEE Workshop on Neural Networks for Signal Processing IV, Piscataway, New Jersey: IEEE (1994) 78–87
L.K. Hansen and J. Larsen: Linear Unlearning for Cross-Validation. Advances in Computational Mathematics 5 (1996) 269–280
J. Hertz, A. Krogh and R.G. Palmer: Introduction to the Theory of Neural Computation. Redwood City, California: Addison-Wesley Publishing Company (1991)
M. Hintz-Madsen, M. With Pedersen, L.K. Hansen, and J. Larsen: Design and Evaluation of Neural Classifiers. In S. Usui, Y. Tohkura, S. Katagiri and E. Wilson (eds.), Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VI, Piscataway, New Jersey: IEEE, (1996) 223–232
K. Hornik: Approximation Capabilities of Multilayer Feedforward Networks. Neural Networks 4 (1991) 251–257
M. Kearns: A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split. Neural Computation 9(5) (1997) 1143–1161
J. Larsen: A Generalization Error Estimate for Nonlinear Systems. In S.Y. Kung (eds.), Neural Networks for Signal Processing 2: Proceedings of the 1992 IEEE-SP Workshop, Piscataway, New Jersey: IEEE (1992) 29–38
J. Larsen: Design of Neural Network Filters, Ph.D. Thesis, Electronics Institute, Technical University of Denmark (1993). Available via ftp://eivind.imm.dtu.dk/dist/PhDfithesis/jlarsen.thesis.ps.Z
J. Larsen and L.K. Hansen: Generalization Performance of Regularized Neural Network Models. In J. Vlontzos et al. (eds.), Proceedings of the IEEE Workshop on Neural Networks for Signal Processing IV, Piscataway, New Jersey: IEEE (1994) 42–51
J. Larsen and L.K. Hansen: Empirical Generalization Assessment of Neural Network Models. In F. Girosi et al. (eds.), Proceedings of the IEEE Workshop on Neural Networks for Signal Processing V, Piscataway, New Jersey: IEEE (1995) 30–39
J. Larsen, L.K. Hansen, C. Svarer and M. Ohlsson: Design and Regularization of Neural Networks: The Optimal Use of a Validation Set. In S. Usui, Y. Tohkura, S. Katagiri and E. Wilson (eds.), Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VI, Piscataway, New Jersey: IEEE, (1996) 62–71
J. Larsen et al.: Optimal Data Set Split Ratio for Empirical Generalization Error Estimates. In preparation.
Y. Le Cun, J.S. Denker and S.A. Solla: Optimal Brain Damage. In D.S. Touretzky (ed.), Advances in Neural Information Processing Systems 2, Proceedings of the 1989 Conference, San Mateo, California: Morgan Kaufmann Publishers (1990) 598–605
D. Lowe: Adaptive Radial Basis Function Nonlinearities and the Problem of Generalisation. Proc. IEE Conf. on Artificial Neural Networks, (1989) 171–175
L. Ljung: System Identification: Theory for the User. Englewood Cliffs, New Jersey: Prentice-Hall (1987)
D.J.C. MacKay: A Practical Bayesian Framework for Backprop Networks. Neural Computation 4(3) (1992) 448–472
J. Moody: Prediction Risk and Architecture Selection for Neural Networks. In V. Cherkassky et al. (eds.), From Statistics to Neural Networks: Theory and Pattern Recognition Applications, Berlin, Germany: Springer-Verlag Series F (1994)
J. Moody and T. Rögnvaldsson: Smoothing Regularizers for Projective Basis Function Networks. In Advances in Neural Information Processing Systems 9, Proceedings of the 1996 Conference, Cambridge, Massachusetts: MIT Press (1997)
N. Murata, S. Yoshizawa and S. Amari: Network Information Criterion: Determining the Number of Hidden Units for an Artificial Neural Network Model. IEEE Transactions on Neural Networks 5(6) (1994) 865–872
S. Nowlan and G. Hinton: Simplifying Neural Networks by Soft Weight Sharing. Neural Computation 4(4) (1992) 473–493
M. With Pedersen: Training Recurrent Networks. In Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VII, Piscataway, New Jersey: IEEE, (1997)
G.E. Peterson and H.L. Barney: Control Methods Used in a Study of the Vowels. JASA (1952) 175–184
R.S. Shadafan and M. Niranjan: A Dynamic Neural Network Architecture by Sequential Partitioning of the Input Space. Neural Computation 6(6) (1994) 1202–1222
J. Sjöberg: Non-Linear System Identification with Neural Networks, Ph.D. Thesis no. 381, Department of Electrical Engineering, Linköping University, Sweden (1995)
M. Stone: Cross-validatory Choice and Assessment of Statistical Predictors. Journal of the Royal Statistical Society B 36(2) (1974) 111–147
C. Svarer, L.K. Hansen, J. Larsen and C. E. Rasmussen: Designer Networks for Time Series Processing. In C.A. Kamm et al. (eds.), Proceedings of the IEEE Workshop on Neural Networks for Signal Processing 3, Piscataway, New Jersey: IEEE (1993) 78–87
R.L. Watrous: Current Status of Peterson-Barney Vowel Formant Data. JASA (1991) 2459–2460
A.S. Weigend, B.A. Huberman and D.E. Rumelhart: Predicting the Future: A Connectionist Approach. International Journal of Neural Systems 1(3) (1990) 193–209
P.M. Williams: Bayesian Regularization and Pruning using a Laplace Prior. Neural Computation 7(1) (1995) 117–143
D.H. Wolpert and W.G. Macready: The Mathematics of Search. Technical Report SFI-TR-95-02-010, Santa Fe Institute (1995)
L. Wu and J. Moody: A Smoothing Regularizer for Feedforward and Recurrent Neural Networks. Neural Computation 8(3) (1996)
H. Zhu and R. Rohwer: No Free Lunch for Cross Validation. Neural Computation 8(7) (1996) 1421–1426
© 1998 Springer-Verlag Berlin Heidelberg
Larsen, J., Svarer, C., Andersen, L.N., Hansen, L.K. (1998). Adaptive Regularization in Neural Network Modeling. In: Orr, G.B., Müller, KR. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 1524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49430-8_6
Print ISBN: 978-3-540-65311-0
Online ISBN: 978-3-540-49430-0