Variational approximations for categorical causal modeling with latent variables

  • Article
  • Journal: Psychometrika

Abstract

Latent class models in the social and behavioral sciences have remained structurally simple. One reason for this is that inference in statistical models can be computationally difficult. Methods for approximate inference, known as variational approximations, which have been developed in the machine learning, graphical modeling and statistical physics literatures, can be used to alleviate the computational difficulties of inference for latent variable models. The aim of the present article is to set these methods alongside some social and behavioral science literature to which they are relevant, and in particular to consider their potential for “categorical causal modeling”, using latent class analysis. We have collated a number of popular categorical-data models with latent variables and causal structure, typically incorporating a Markovian structure. The efficacy of the approximation methods has been demonstrated through simulations related to an important behavioral science model.

Author information

Corresponding author

Correspondence to K. Humphreys.

Additional information

Research was supported by a grant from the UK Engineering and Physical Sciences Research Council. The authors would like to thank anonymous reviewers and the Associate Editor for their very helpful comments on earlier versions of the manuscript.

About this article

Cite this article

Humphreys, K., Titterington, D.M. Variational approximations for categorical causal modeling with latent variables. Psychometrika 68, 391–412 (2003). https://doi.org/10.1007/BF02294734

  • DOI: https://doi.org/10.1007/BF02294734
