Abstract
In this paper we investigate the application of stochastic complexity theory to classification problems. In particular, we define the notion of admissible models as a function of problem complexity, the number of data points N, and prior belief. This allows us to derive general bounds relating classifier complexity to data-dependent parameters such as sample size, class entropy, and the optimal Bayes error rate. We discuss the application of these results to a variety of problems, including decision tree classifiers, Markov models for image segmentation, and feedforward multilayer neural network classifiers.
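The stochastic complexity framework referred to above compares classifiers by their total description length: the bits needed to describe the model itself plus the bits needed to encode the observed class labels given the model's predicted probabilities. The sketch below is an illustrative two-part code-length comparison, not the paper's actual derivation; the model-cost values and toy probabilities are invented for the example.

```python
import math

def description_length(model_bits, class_probs, labels):
    # Two-part code length: bits to describe the model, plus the
    # code length of the labels under the model's class probabilities
    # (the negative log-likelihood in bits).
    data_bits = -sum(math.log2(p[y]) for p, y in zip(class_probs, labels))
    return model_bits + data_bits

# Toy comparison on four samples: a cheap-to-describe model with blunt
# probabilities versus a costlier model with sharper probabilities.
labels = [0, 1, 1, 0]
simple_probs = [[0.6, 0.4]] * 4
complex_probs = [[0.9, 0.1], [0.1, 0.9], [0.2, 0.8], [0.7, 0.3]]

dl_simple = description_length(2.0, simple_probs, labels)
dl_complex = description_length(10.0, complex_probs, labels)

# The admissible-model idea: a model whose extra description cost is not
# repaid by shorter data encoding is rejected in favour of the simpler one.
best = "simple" if dl_simple < dl_complex else "complex"
```

With so few data points the simple model wins here; as N grows, the fixed model cost is amortized and the sharper model's shorter data code can dominate, which is exactly the sample-size dependence the bounds in the paper formalize.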
References
Barron, A. and Barron, R. (1988) Statistical learning networks: a unifying view, presented at the 1988 Symposium on the Interface: Statistics and Computing Science, Virginia.
Barron, A. R. and Cover, T. M. (1991) Minimum complexity density estimation. IEEE Transactions on Information Theory, 37, 1034–1054.
Buntine, W. L. (1992) Learning classification trees. Statistics and Computing, 2, 63–73.
Buntine, W. L. and Weigend, A. S. (1991) Bayesian back-propagation. Technical Report FIA-91-22, RIACS and NASA Ames Research Center, Moffett Field, CA.
Cybenko, G. (1990) Complexity theory of neural networks and classification problems, in Neural Networks, L. Almeida (ed.), Springer Lecture Notes in Computer Science, pp. 26–45.
Gish, H. (1993) Maximum likelihood training of neural networks, in Artificial Intelligence Frontiers in Statistics, Hand, D. J. (ed.), Chapman & Hall, London.
Goodman, R. M., Smyth, P., Higgins, C. and Miller, J. W. (1992) Rule-based networks for classification and probability estimation, to appear in Neural Computation.
Kovalevsky, V. A. (1980) Image Pattern Recognition, translated by A. Brown, Springer-Verlag, New York, p. 79.
MacKay, D. (1992) A practical Bayesian framework for backprop networks, to appear in Neural Computation.
Mehra, P., Rendell, L. A. and Wah, B. W. (1989) Principled constructive induction, in Proceedings of IJCAI 1989, Morgan Kaufmann, San Mateo, CA, pp. 651–656.
Quinlan, J. R. and Rivest, R. (1989) Inferring decision trees using the minimum description length principle. Information and Computation, 80, 227–248.
Rissanen, J. (1984) Universal coding, information, prediction, and estimation. IEEE Transactions on Information Theory, 30, 629–636.
Rissanen, J. (1989) Stochastic Complexity in Statistical Inquiry, World Scientific, Teaneck, NJ.
Shrager, J. and Langley, P. (eds) (1990) Computational Models of Scientific Discovery and Theory Formation, Morgan Kaufmann, San Mateo, CA.
Smith, K. R. and Miller, M. I. (1990) A Bayesian approach incorporating Rissanen complexity for learning Markov random field texture models, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE Press, New York.
Smyth, P. (1991) On stochastic complexity and admissible models for neural network classifiers, in Advances in Neural Information Processing Systems 3, Touretzky, D., Lippmann, R. and Moody, J. (eds), Morgan Kaufmann, San Mateo, CA, pp. 818–824.
Wallace, C. S. and Freeman, P. R. (1987) Estimation and inference by compact coding. Journal of the Royal Statistical Society B, 49, 240–251.
Wallace, C. S. and Patrick, J. D. (1990) Coding decision trees. Preprint.
White, H. (1989) Learning in artificial neural networks: a statistical perspective. Neural Computation, 1, 425–464.
Wolberg, W. H. and Mangasarian, O. L. (1990) Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences, U.S.A., 87, 9193–9196.
Smyth, P. Admissible stochastic complexity models for classification problems. Stat Comput 2, 97–104 (1992). https://doi.org/10.1007/BF01889588