Part of the book series: Intelligent Systems Reference Library (ISRL, volume 49)

Abstract

Unsupervised learning of representations has proven useful in many applications. It is especially advantageous when there are many unlabeled examples and few labeled ones (semi-supervised learning), or when the unlabeled or labeled examples come from a distribution different from, but related to, the one of interest (self-taught learning, multi-task learning, and domain adaptation). Some of these algorithms have been used successfully to learn a hierarchy of features, i.e., to build a deep architecture, either as initialization for a supervised predictor or as a generative model. Deep learning algorithms can yield representations that are more abstract and that better disentangle the hidden factors of variation underlying the unknown generating distribution, i.e., representations that capture invariances and discover non-local structure in that distribution. This chapter reviews the main motivations and ideas behind deep learning algorithms and their representation-learning components, as well as recent results in this area, and proposes a vision of the challenges and hopes on the road ahead, focusing on the questions of invariance and disentangling.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bengio, Y., Courville, A. (2013). Deep Learning of Representations. In: Bianchini, M., Maggini, M., Jain, L. (eds) Handbook on Neural Information Processing. Intelligent Systems Reference Library, vol 49. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36657-4_1

  • DOI: https://doi.org/10.1007/978-3-642-36657-4_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36656-7

  • Online ISBN: 978-3-642-36657-4

  • eBook Packages: Engineering, Engineering (R0)