Semi-supervised learning by disagreement

  • Regular Paper
  • Knowledge and Information Systems

Abstract

In many real-world tasks, unlabeled examples are abundant but labeled training examples are scarce, because labeling requires human effort and expertise. Semi-supervised learning, which tries to exploit unlabeled examples to improve learning performance, has therefore become a hot topic. Disagreement-based semi-supervised learning is an interesting paradigm in which multiple learners are trained for the task and the disagreements among them are exploited during the semi-supervised learning process. This survey article provides an introduction to research advances in this paradigm.
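
As a rough illustration of the paradigm, the following minimal sketch shows a co-training-style loop, one well-known instance of disagreement-based learning: two learners see different views of each example, and each passes its most confident pseudo-labeled example to the other. Everything here is an assumption made for illustration, not the article's own algorithm: the toy `CentroidLearner`, the `co_train` function, the `rounds` parameter, and the one-dimensional two-view data are all hypothetical.

```python
from statistics import mean


class CentroidLearner:
    """Toy 1-D nearest-centroid classifier for labels 0 and 1 (illustrative only)."""

    def fit(self, xs, ys):
        self.c0 = mean([x for x, y in zip(xs, ys) if y == 0])
        self.c1 = mean([x for x, y in zip(xs, ys) if y == 1])
        return self

    def predict(self, x):
        return 0 if abs(x - self.c0) <= abs(x - self.c1) else 1

    def confidence(self, x):
        # A larger gap between the two centroid distances means more confidence.
        return abs(abs(x - self.c0) - abs(x - self.c1))


def co_train(labeled, unlabeled, rounds=3):
    """labeled: list of ((view0, view1), label); unlabeled: list of (view0, view1)."""
    train = [list(labeled), list(labeled)]  # one training set per learner
    pool = list(unlabeled)
    learners = [CentroidLearner(), CentroidLearner()]

    def refit():
        for v in (0, 1):
            learners[v].fit([x[v] for x, _ in train[v]], [y for _, y in train[v]])

    refit()
    for _ in range(rounds):
        for v in (0, 1):
            if not pool:
                break
            # Learner v pseudo-labels the example it is most confident about
            # and hands it to the OTHER learner's training set.
            best = max(pool, key=lambda x: learners[v].confidence(x[v]))
            pool.remove(best)
            train[1 - v].append((best, learners[v].predict(best[v])))
        refit()
    return learners


# Hypothetical data: two labeled examples, four unlabeled ones.
labeled = [((0.0, 0.1), 0), ((1.0, 0.9), 1)]
unlabeled = [(0.1, 0.0), (0.9, 1.0), (0.2, 0.3), (0.8, 0.7)]
learners = co_train(labeled, unlabeled)
print(learners[0].predict(0.05), learners[1].predict(0.95))
```

The sketch keeps a separate training set per learner so that the two learners can stay different from each other; if both were trained on identical data and views, they would have nothing to teach one another.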

Author information

Corresponding author

Correspondence to Zhi-Hua Zhou.

About this article

Cite this article

Zhou, ZH., Li, M. Semi-supervised learning by disagreement. Knowl Inf Syst 24, 415–439 (2010). https://doi.org/10.1007/s10115-009-0209-z

  • DOI: https://doi.org/10.1007/s10115-009-0209-z
