Abstract
Latent class models in the social and behavioral sciences have remained structurally simple. One reason for this is that inference in statistical models with latent variables can be computationally difficult. Variational approximations, methods for approximate inference developed in the machine learning, graphical modeling and statistical physics literatures, can be used to alleviate these difficulties. The aim of the present article is to set these methods alongside the social and behavioral science literature to which they are relevant, and in particular to consider their potential for “categorical causal modeling” (Hagenaars, 1998) using latent class analysis. We have collated a number of popular categorical-data models with latent variables and causal structure, typically incorporating a Markovian structure. The efficacy of the approximation methods is demonstrated through simulations related to an important behavioral science model.
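The computational difficulty mentioned above, and the way a variational approximation sidesteps it, can be illustrated with a small example. The sketch below is hypothetical and is not the authors' model or code: it assumes two binary latent class variables `z1` and `z2` with independent priors, three binary items whose response probabilities `THETA[j][a][b]` depend on both, and made-up parameter values. Exact inference sums over every joint latent state, and the number of such states grows exponentially with the number of latent variables. The fully factorized (naive mean field) approximation q(z1, z2) = q1(z1) q2(z2), as described by Jordan et al. (1999), instead maximizes a lower bound on log p(x) by coordinate ascent. With only four joint states the exact value is cheap to compute, so the sketch can check that the bound never exceeds it.

```python
import itertools
import math

# Hypothetical toy model (all numbers invented for illustration only):
# two binary latent class variables z1, z2 with independent priors, and
# three binary items whose response probability depends on both.
PRIOR_Z1 = [0.6, 0.4]  # P(z1 = 0), P(z1 = 1)
PRIOR_Z2 = [0.7, 0.3]  # P(z2 = 0), P(z2 = 1)
# THETA[j][a][b] = P(x_j = 1 | z1 = a, z2 = b)
THETA = [
    [[0.1, 0.5], [0.6, 0.9]],
    [[0.2, 0.4], [0.7, 0.8]],
    [[0.3, 0.8], [0.5, 0.95]],
]


def log_joint(x, a, b):
    """log p(x, z1=a, z2=b) for one binary response pattern x."""
    total = math.log(PRIOR_Z1[a]) + math.log(PRIOR_Z2[b])
    for j, xj in enumerate(x):
        p = THETA[j][a][b]
        total += math.log(p if xj == 1 else 1.0 - p)
    return total


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def normalize_log(log_weights):
    """Turn unnormalized log weights into a probability vector."""
    z = log_sum_exp(log_weights)
    return [math.exp(w - z) for w in log_weights]


def exact_log_likelihood(x):
    # Sums over all 2 * 2 joint latent states; this enumeration is what
    # becomes infeasible when there are many latent variables.
    states = itertools.product((0, 1), repeat=2)
    return log_sum_exp([log_joint(x, a, b) for a, b in states])


def mean_field(x, n_iter=50):
    """Coordinate ascent for the factorized q(z1, z2) = q1(z1) q2(z2).

    Returns (q1, q2, bound), where bound is the variational lower bound
    sum_z q(z) log p(x, z) - sum_z q(z) log q(z), which is <= log p(x).
    """
    lj = [[log_joint(x, a, b) for b in (0, 1)] for a in (0, 1)]
    q1, q2 = [0.5, 0.5], [0.5, 0.5]  # uniform starting values
    for _ in range(n_iter):
        # log q1(a) = E_{q2}[log p(x, a, z2)] + const, then likewise for q2.
        q1 = normalize_log([sum(q2[b] * lj[a][b] for b in (0, 1)) for a in (0, 1)])
        q2 = normalize_log([sum(q1[a] * lj[a][b] for a in (0, 1)) for b in (0, 1)])
    expected = sum(q1[a] * q2[b] * lj[a][b] for a in (0, 1) for b in (0, 1))
    # Entropy of a product distribution is the sum of the factor entropies.
    entropy = -sum(p * math.log(p) for p in q1 + q2 if p > 0)
    return q1, q2, expected + entropy


if __name__ == "__main__":
    x = [1, 0, 1]
    q1, q2, bound = mean_field(x)
    print("q(z1=1) =", round(q1[1], 4), " q(z2=1) =", round(q2[1], 4))
    print("lower bound =", round(bound, 4),
          " exact log p(x) =", round(exact_log_likelihood(x), 4))
```

Each update sets log q1(z1) equal to the expectation of log p(x, z1, z2) under q2, up to a constant, and vice versa, so the bound cannot decrease from one sweep to the next. Structured variants, such as those of Saul and Jordan (1996) and Ghahramani and Jordan (1997), keep tractable substructures (for example whole Markov chains) intact inside q rather than factorizing over single variables.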
References
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50, 179–211.
Amari, S. (1995). Information geometry of the EM and em algorithms for neural networks. Neural Networks, 8, 1379–1408.
Bahadur, R.R. (1961). A representation of the joint distribution of responses to n dichotomous items. In H. Solomon (Ed.), Studies in item analysis and prediction (pp. 158–168). Stanford, CA: Stanford University Press.
Barber, D., & Wiegerinck, W. (1998). Tractable undirected approximations for graphical models. In L. Niklasson, T. Bodén & M. Ziemke (Eds.), Proceedings of the Eighth International Conference on Artificial Neural Networks (pp. 93–98). Skövde, Sweden: Springer.
Barber, D., & Wiegerinck, W. (1999). Tractable variational structures for approximating graphical models. In M.S. Kearns, S.A. Solla & D.A. Cohn (Eds.), Advances in Neural Information Processing Systems (Vol. 11, pp. 183–189). Cambridge, MA: MIT Press.
Baum, L.E., Petrie, T., Soules, G., & Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics, 41, 164–171.
Bentler, P.M. (1989). EQS Structural Equations Program Manual. Los Angeles, CA: BMDP Statistical Software.
Bishop, C.M., Lawrence, N., Jaakkola, T., & Jordan, M.I. (1998). Approximating posterior distributions in belief networks using mixtures. In M.I. Jordan, M.J. Kearns & S.A. Solla (Eds.), Advances in Neural Information Processing Systems (Vol. 10, pp. 416–422). Cambridge, MA: MIT Press.
Bollen, K.A. (1989). Structural equations with latent variables. New York, NY: John Wiley & Sons.
Browne, M.W. (1984). Asymptotically distribution free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62–83.
Byrne, B.M. (1995). One application of structural equation modeling from two perspectives: Exploring the EQS and LISREL strategies. In R. Hoyle (Ed.), Structural equation modeling concepts, issues and applications (pp. 138–161). Thousand Oaks, CA: Sage.
Cannings, C., Thompson, E.A., & Skolnick, M.H. (1978). Probability functions on complex pedigrees. Advances in Applied Probability, 10, 26–91.
Cooper, G.F. (1990). Computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42, 393–405.
Cowell, R. (1999). Introduction to inference for Bayesian networks. In M.I. Jordan (Ed.), Learning in graphical models (pp. 6–26). Dordrecht, The Netherlands: Kluwer.
Dayan, P., Hinton, G.E., Neal, R.M., & Zemel, R.S. (1995). The Helmholtz machine. Neural Computation, 7, 889–904.
Dempster, A.P., Laird, N.M., & Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39, 1–38.
Dunmur, A.P., & Titterington, D.M. (1999). Analysis of latent structure models with multidimensional latent variables. In J.W. Kay & D.M. Titterington (Eds.), Statistics and neural networks: Advances at the interface (pp. 165–194). Oxford, U.K.: Oxford University Press.
Gershenfeld, N.A. (1999). The nature of mathematical modeling. Cambridge, U.K.: Cambridge University Press.
Ghahramani, Z. (1996). Factorial learning and the EM algorithm. In G. Tesauro, D.S. Touretzky & T.K. Leen (Eds.), Advances in neural information processing systems (Vol. 7, pp. 617–624). Cambridge, MA: MIT Press.
Ghahramani, Z., & Jordan, M.I. (1997). Factorial hidden Markov models. Machine Learning, 29, 245–273.
Goodman, L. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215–231.
Hagenaars, J.A. (1993). Loglinear models with latent variables (Sage university paper series on quantitative applications in the social sciences, No. 07-094). Newbury Park, CA: Sage.
Hagenaars, J.A. (1998). Categorical causal modeling: Latent class analysis and directed log-linear models with latent variables. Sociological Methods and Research, 26, 436–486.
Hall, P., Humphreys, K., & Titterington, D.M. (2002). On the adequacy of variational lower bound functions for likelihood-based inference in Markovian models with missing values. Journal of the Royal Statistical Society, Series B, 64, 549–564.
Humphreys, K., & Titterington, D.M. (1999). The exploration of new methods for learning in binary Boltzmann machines. In D. Heckerman & J. Whittaker (Eds.), Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics (pp. 209–214). San Francisco, CA: Morgan Kaufmann.
Humphreys, K., & Titterington, D.M. (2000). Improving the mean field approximation in belief networks using Bahadur's reparameterization of the multivariate binary distribution. Neural Processing Letters, 12, 183–197.
Jensen, F. (1996). An introduction to Bayesian networks. London, U.K.: UCL Press.
Jordan, M.I., Ghahramani, Z., Jaakkola, T.S., & Saul, L.K. (1999). An introduction to variational methods for graphical models. In M.I. Jordan (Ed.), Learning in graphical models (pp. 105–161). Dordrecht, The Netherlands: Kluwer.
Jöreskog, K.G. (1979). Statistical estimation of structural models in longitudinal-development investigations. In J.R. Nesselroade & P.B. Baltes (Eds.), Longitudinal research in the study of behavior and development (pp. 303–351). New York, NY: Academic Press.
Jöreskog, K.G., & Sörbom, D. (1984). LISREL VI: Analysis of Linear Structural Relationships by the Method of Maximum Likelihood. Chicago, IL: Scientific Software.
Lange, K., & Elston, R.C. (1975). Extension to pedigree analysis: Likelihood computations for simple and complex pedigrees. Human Heredity, 25, 95–105.
Langeheine, R. (1994). Latent variables Markov models. In A. von Eye & C.C. Clogg (Eds.), Latent variables analysis: Applications for developmental research (pp. 373–395). Beverly Hills, CA: Sage.
Lauritzen, S.L. (1995). The EM algorithm for graphical association models. Computational Statistics and Data Analysis, 10, 191–200.
Lauritzen, S.L. (1996). Graphical models. Oxford, U.K.: Clarendon Press.
Lauritzen, S.L., & Spiegelhalter, D.J. (1988). Local computations with probabilities on graphical structures and their applications to expert systems (with discussion). Journal of the Royal Statistical Society, Series B, 50, 157–224.
Lazarsfeld, P.F., & Henry, N.W. (1968). Latent structure analysis. Boston, MA: Houghton-Mifflin.
MacDonald, I.L., & Zucchini, W. (1997). Hidden Markov and other models for discrete-valued time series (Monographs on statistics and applied probability, No. 70). London, U.K.: Chapman and Hall.
McArdle, J.J., & Aber, M.S. (1990). Patterns of change within latent structure equation models. In A. von Eye (Ed.), Statistical methods in longitudinal research: Volume 1, Principles and structuring change (pp. 151–224). Boston, MA: Academic Press.
McHugh, R.B. (1956). Efficient estimation and local identification in latent class analysis. Psychometrika, 21, 331–347.
Neal, R.M., & Hinton, G.E. (1999). A view of the EM algorithm that justifies incremental, sparse, and other variants. In M.I. Jordan (Ed.), Learning in graphical models (pp. 355–368). Cambridge, MA: MIT Press.
Ng, A.Y., & Jordan, M.I. (2000). Approximate inference algorithms for two-layer Bayesian networks. In S.A. Solla, T.K. Leen & K.-R. Müller (Eds.), Advances in neural information processing systems (Vol. 12, pp. 533–539). Cambridge, MA: MIT Press.
Olsson, U., & Bergman, L.R. (1977). A longitudinal factor model for studying change in ability structure. Multivariate Behavioral Research, 12, 221–241.
Opper, M., & Saad, D. (Eds.). (2001). Advanced mean field methods: Theory and practice. Cambridge, MA: MIT Press.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Francisco, CA: Morgan Kaufmann.
Pearl, J. (1998). Graphs, causality and structural equation models. Sociological Methods and Research, 27, 226–284.
Pearl, J. (2000). Causality. Cambridge, U.K.: Cambridge University Press.
Peterson, C., & Anderson, J.R. (1987). A mean field theory learning algorithm for neural networks. Complex Systems, 1, 995–1019.
Pfeffermann, D., Skinner, C.J., & Humphreys, K. (1998). The estimation of gross flows in the presence of measurement error using auxiliary variables. Journal of the Royal Statistical Society, Series A, 161, 13–32.
Rabiner, L.R., & Juang, B.H. (1986). An introduction to hidden Markov models. IEEE ASSP Magazine, 3, 4–16.
Reinecke, J. (1997). Testing the theory of planned behavior with latent Markov models. In J. Rost & R. Langeheine (Eds.), Applications of latent trait and latent class models in the social sciences (pp. 398–411). Münster, Germany: Waxmann.
Reinecke, J., Schmidt, P., & Ajzen, I. (1996). Application of the theory of planned behavior to adolescents' condom use: A panel study. Journal of Applied Social Psychology, 26, 749–772.
Saul, L.K., Jaakkola, T., & Jordan, M.I. (1996). Mean field theory for sigmoid belief networks. Journal of Artificial Intelligence Research, 4, 61–76.
Saul, L.K., & Jordan, M.I. (1995). Boltzmann chains and hidden Markov models. In G. Tesauro, D.S. Touretzky & T.K. Leen (Eds.), Advances in neural information processing systems (Vol. 7, pp. 435–442). Cambridge, MA: MIT Press.
Saul, L.K., & Jordan, M.I. (1996). Exploiting tractable substructures in intractable networks. In D.S. Touretzky, M.C. Mozer & M.E. Hasselmo (Eds.), Advances in neural information processing systems (Vol. 8, pp. 486–492). Cambridge, MA: MIT Press.
Seung, H. (1995). Annealed theories of learning. In J.-H. Oh, C. Kwon & S. Cho (Eds.), Neural networks: The statistical mechanics perspective, Proceedings of the CTP-PRSRI Joint workshop on theoretical physics. Singapore: World Scientific.
Smyth, P. (1997). Clustering sequences with hidden Markov models. In M.C. Mozer, M.I. Jordan & T. Petsche (Eds.), Advances in neural information processing systems (Vol. 9, pp. 648–654). Cambridge, MA: MIT Press.
Smyth, P., Heckerman, D., & Jordan, M.I. (1997). Probabilistic independence networks for hidden Markov probability models. Neural Computation, 9, 227–269.
Tisak, J., & Meredith, W. (1990). Longitudinal factor analysis. In A. von Eye (Ed.), Statistical methods in longitudinal research: Volume 1, Principles and structuring change (pp. 125–150). Boston, MA: Academic Press.
van de Pol, F., & Langeheine, R. (1990). Mixed Markov latent class models. In C.C. Clogg (Ed.), Sociological methodology (pp. 213–247). Oxford, U.K.: Blackwell.
West, S.G., Finch, J.F., & Curran, P.J. (1995). Structural equation models with nonnormal variables. In R. Hoyle (Ed.), Structural equation modeling concepts, issues and applications. Thousand Oaks, CA: Sage.
Whittaker, J. (1990). Graphical models in applied multivariate statistics. New York, NY: John Wiley & Sons.
Wiegerinck, W., & Barber, D. (1999). Variational belief networks for approximate inference. In La Poutre & van den Herik (Eds.), Proceedings of the Tenth Netherlands/Belgium Conference on Artificial Intelligence (pp. 177–183). Amsterdam, The Netherlands: CWI.
Wiggins, L.M. (1955). Mathematical models for the analysis of multi-wave panels. Unpublished doctoral dissertation, Columbia University, New York City, NY.
Wiggins, L.M. (1973). Panel analysis: Latent probability models for attitude and behavioral processes. San Francisco, CA: Jossey-Bass/Elsevier.
Zhang, J. (1996). The application of the Gibbs-Bogoliubov-Feynman inequality in mean field calculations for Markov random fields. IEEE Transactions on Image Processing, 5, 1208–1214.
Additional information
Research was supported by a grant from the UK Engineering and Physical Sciences Research Council. The authors would like to thank the anonymous reviewers and the Associate Editor for their very helpful comments on earlier versions of the manuscript.
Cite this article
Humphreys, K., Titterington, D.M. Variational approximations for categorical causal modeling with latent variables. Psychometrika 68, 391–412 (2003). https://doi.org/10.1007/BF02294734