Abstract
In this chapter we address the important problem of optimizing regularization parameters in neural network modeling. The proposed optimization scheme is an extended version of the recently presented algorithm [25]. The idea is to minimize an empirical estimate of the generalization error, such as the cross-validation estimate, with respect to the regularization parameters. This is done with a simple iterative gradient descent scheme that introduces virtually no programming overhead beyond standard training. Experiments with feed-forward neural network models on time series prediction and classification tasks demonstrate the viability and robustness of the algorithm. In addition, we provide simple theoretical examples that illustrate the potential and limitations of the proposed regularization framework.
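The core idea described above can be sketched in a toy setting. Here we use ridge regression rather than a neural network, because the regularized weights are then available in closed form and the gradient of the validation error with respect to the regularization parameter follows directly from implicit differentiation of the training optimum. The data, names, and step size below are illustrative assumptions, not taken from the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: linear target plus noise, split into training and validation sets.
n, d = 60, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)
Xt, yt, Xv, yv = X[:40], y[:40], X[40:], y[40:]

def train(lam):
    """Ridge weights minimizing training MSE + lam * ||w||^2 (closed form)."""
    H = Xt.T @ Xt + lam * np.eye(d)
    return np.linalg.solve(H, Xt.T @ yt), H

def val_error(w):
    """Empirical generalization estimate: mean squared error on the validation set."""
    return np.mean((Xv @ w - yv) ** 2)

# Gradient descent on kappa = log(lam). For the ridge optimum w(lam),
# implicit differentiation gives dw/dlam = -H^{-1} w, hence
# dE_val/dlam = g_val . dw/dlam with g_val = (2/n_val) Xv'(Xv w - yv).
kappa, eta = 0.0, 0.5
for _ in range(50):
    lam = np.exp(kappa)
    w, H = train(lam)                       # inner loop: fit weights
    g_val = 2.0 / len(yv) * Xv.T @ (Xv @ w - yv)
    dE_dlam = g_val @ np.linalg.solve(H, -w)
    kappa -= eta * lam * dE_dlam            # chain rule: dE/dkappa = lam * dE/dlam

print("adapted lambda:", np.exp(kappa))
print("validation MSE:", val_error(train(np.exp(kappa))[0]))
```

For a neural network the inner training step is iterative rather than closed-form, and the chapter's scheme interleaves the two gradient descents; the structure of the update on the (log-)regularization parameter is the same.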
References
H. Akaike: Fitting Autoregressive Models for Prediction. Annals of the Institute of Statistical Mathematics 21 (1969) 243–247
S. Amari, N. Murata, K.-R. Müller, M. Finke and H. Yang: Asymptotic Statistical Theory of Overtraining and Cross-Validation. Technical report METR 95-06 (1995) and IEEE Transactions on Neural Networks 8(5) (1997) 985–996
L. Nonboe Andersen, J. Larsen, L.K. Hansen and M. Hintz-Madsen: Adaptive Regularization of Neural Classifiers. In J. Principe (ed.), Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VII, Piscataway, New Jersey: IEEE (1997) 24–33
C.M. Bishop: Curvature-Driven Smoothing: A Learning Algorithm for Feedforward Neural Networks. IEEE Transactions on Neural Networks 4(4) (1993) 882–884
C.M. Bishop: Neural Networks for Pattern Recognition. Oxford, UK: Oxford University Press (1995)
J.E. Dennis and R.B. Schnabel: Numerical Methods for Unconstrained Optimization and Non-linear Equations. Englewood Cliffs, NJ: Prentice-Hall (1983)
H. Drucker and Y. Le Cun: Improving Generalization Performance in Character Recognition. In B.H. Juang (ed.), Neural Networks for Signal Processing: Proceedings of the 1991 IEEE-SP Workshop, Piscataway, New Jersey: IEEE (1991) 198–207
S. Geisser: The Predictive Sample Reuse Method with Applications. Journal of the American Statistical Association (1975) 320–328
S. Geman, E. Bienenstock and R. Doursat: Neural Networks and the Bias/Variance Dilemma. Neural Computation 4 (1992) 1–58
F. Girosi, M. Jones and T. Poggio: Regularization Theory and Neural Networks Architectures. Neural Computation 7(2) (1995) 219–269
C. Goutte and J. Larsen: Adaptive Regularization of Neural Networks using Conjugate Gradient. In Proceedings of ICASSP'98, Seattle, USA, 2 (1998) 1201–1204
C. Goutte: Note on Free Lunches and Cross-Validation. Neural Computation 9(6) (1997) 1211–1215
C. Goutte: Regularization with a Pruning Prior. To appear in Neural Networks (1997)
L.K. Hansen and C.E. Rasmussen: Pruning from Adaptive Regularization. Neural Computation 6 (1994) 1223–1232
L.K. Hansen, C.E. Rasmussen, C. Svarer and J. Larsen: Adaptive Regularization. In J. Vlontzos, J.-N. Hwang and E. Wilson eds, Proceedings of the IEEE Workshop on Neural Networks for Signal Processing IV, Piscataway, New Jersey: IEEE (1994) 78–87
L.K. Hansen and J. Larsen: Linear Unlearning for Cross-Validation. Advances in Computational Mathematics 5 (1996) 269–280
J. Hertz, A. Krogh and R.G. Palmer: Introduction to the Theory of Neural Computation. Redwood City, California: Addison-Wesley Publishing Company (1991)
M. Hintz-Madsen, M. With Pedersen, L.K. Hansen, and J. Larsen: Design and Evaluation of Neural Classifiers. In S. Usui, Y. Tohkura, S. Katagiri and E. Wilson (eds.), Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VI, Piscataway, New Jersey: IEEE, (1996) 223–232
K. Hornik: Approximation Capabilities of Multilayer Feedforward Networks. Neural Networks 4 (1991) 251–257
M. Kearns: A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split. Neural Computation 9(5) (1997) 1143–1161
J. Larsen: A Generalization Error Estimate for Nonlinear Systems. In S.Y. Kung (eds.), Neural Networks for Signal Processing 2: Proceedings of the 1992 IEEE-SP Workshop, Piscataway, New Jersey: IEEE (1992) 29–38
J. Larsen: Design of Neural Network Filters, Ph.D. Thesis, Electronics Institute, Technical University of Denmark (1993). Available via ftp://eivind.imm.dtu.dk/dist/PhDfithesis/jlarsen.thesis.ps.Z
J. Larsen and L.K. Hansen: Generalization Performance of Regularized Neural Network Models. In J. Vlontzos et al. (eds.), Proceedings of the IEEE Workshop on Neural Networks for Signal Processing IV, Piscataway, New Jersey: IEEE (1994) 42–51
J. Larsen and L.K. Hansen: Empirical Generalization Assessment of Neural Network Models. In F. Girosi et al. (eds.), Proceedings of the IEEE Workshop on Neural Networks for Signal Processing V, Piscataway, New Jersey: IEEE (1995) 30–39
J. Larsen, L.K. Hansen, C. Svarer and M. Ohlsson: Design and Regularization of Neural Networks: The Optimal Use of a Validation Set. In S. Usui, Y. Tohkura, S. Katagiri and E. Wilson (eds.), Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VI, Piscataway, New Jersey: IEEE, (1996) 62–71
J. Larsen et al.: Optimal Data Set Split Ratio for Empirical Generalization Error Estimates. In preparation.
Y. Le Cun, J.S. Denker and S.A. Solla: Optimal Brain Damage. In D.S. Touretzky (ed.), Advances in Neural Information Processing Systems 2, Proceedings of the 1989 Conference, San Mateo, California: Morgan Kaufmann Publishers (1990) 598–605
D. Lowe: Adaptive Radial Basis Function Nonlinearities and the Problem of Generalisation. Proc. IEE Conf. on Artificial Neural Networks, (1989) 171–175
L. Ljung: System Identification: Theory for the User. Englewood Cliffs, New Jersey: Prentice-Hall (1987)
D.J.C. MacKay: A Practical Bayesian Framework for Backprop Networks. Neural Computation 4(3) (1992) 448–472
J. Moody: Prediction Risk and Architecture Selection for Neural Networks. In V. Cherkassky et al. (eds.), From Statistics to Neural Networks: Theory and Pattern Recognition Applications, Berlin, Germany: Springer-Verlag Series F (1994)
J. Moody and T. Rögnvaldsson: Smoothing Regularizers for Projective Basis Function Networks. In Advances in Neural Information Processing Systems 9, Proceedings of the 1996 Conference, Cambridge, Massachusetts: MIT Press (1997)
N. Murata, S. Yoshizawa and S. Amari: Network Information Criterion: Determining the Number of Hidden Units for an Artificial Neural Network Model. IEEE Transactions on Neural Networks 5(6) (1994) 865–872
S. Nowlan and G. Hinton: Simplifying Neural Networks by Soft Weight Sharing. Neural Computation 4(4) (1992) 473–493
M. With Pedersen: Training Recurrent Networks. In Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VII, Piscataway, New Jersey: IEEE, (1997)
G.E. Peterson and H.L. Barney: Control Methods Used in a Study of the Vowels. JASA (1952) 175–184
R.S. Shadafan and M. Niranjan: A Dynamic Neural Network Architecture by Sequential Partitioning of the Input Space. Neural Computation 6(6) (1994) 1202–1222
J. Sjöberg: Non-Linear System Identification with Neural Networks, Ph.D. Thesis no. 381, Department of Electrical Engineering, Linköping University, Sweden (1995)
M. Stone: Cross-validatory Choice and Assessment of Statistical Predictors. Journal of the Royal Statistical Society B 36(2) (1974) 111–147
C. Svarer, L.K. Hansen, J. Larsen and C. E. Rasmussen: Designer Networks for Time Series Processing. In C.A. Kamm et al. (eds.), Proceedings of the IEEE Workshop on Neural Networks for Signal Processing 3, Piscataway, New Jersey: IEEE (1993) 78–87
R.L. Watrous: Current Status of Peterson-Barney Vowel Formant Data. JASA (1991) 2459–2460
A.S. Weigend, B.A. Huberman and D.E. Rumelhart: Predicting the Future: A Connectionist Approach. International Journal of Neural Systems 1(3) (1990) 193–209
P.M. Williams: Bayesian Regularization and Pruning using a Laplace Prior. Neural Computation 7(1) (1995) 117–143
D.H. Wolpert and W.G. Macready: The Mathematics of Search. Technical Report SFI-TR-95-02-010, Santa Fe Institute (1995)
L. Wu and J. Moody: A Smoothing Regularizer for Feedforward and Recurrent Neural Networks. Neural Computation 8(3) (1996)
H. Zhu and R. Rohwer: No Free Lunch for Cross Validation. Neural Computation 8(7) (1996) 1421–1426
© 1998 Springer-Verlag Berlin Heidelberg
Larsen, J., Svarer, C., Andersen, L.N., Hansen, L.K. (1998). Adaptive Regularization in Neural Network Modeling. In: Orr, G.B., Müller, KR. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 1524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49430-8_6
Print ISBN: 978-3-540-65311-0
Online ISBN: 978-3-540-49430-0