Abstract
In nonstationary environments, high-dimensional data streams are generated continuously, and the underlying distribution of the training and target data may change over time. Such changes are termed concept drift in the literature. Learning from evolving data streams demands adaptive or evolving approaches to handle concept drift, which is a relatively new research area. In this work, a wide-ranging comparative analysis of concept drift is presented to highlight state-of-the-art approaches, covering the last four decades, from 1980 to 2020. Given the scope and discipline, the Web of Science Core Collection is taken as the basis of this study, and 1,564 publications related to concept drift are retrieved. Through classification and feature analysis of the valid literature data, bibliometric indicators are reported at the levels of countries/regions, institutions, and authors. The overall analyses of publications, citations, and cooperation networks reveal not only the most authoritative publications but also the most prolific institutions, influential authors, and dynamic networks. Furthermore, in-depth analyses including text mining, namely burst detection analysis, co-occurrence analysis, timeline view analysis, and bibliographic coupling analysis, are conducted to disclose the current challenges and future research directions. This paper serves as a reference for further research on concept drift, highlighting emerging and trending topics and possible research directions with several graphs visualized using the VOSviewer and CiteSpace software.
Funding
The authors did not receive support from any organization for the submitted work.
Ethics declarations
Conflict of interest
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
About this article
Cite this article
Babüroğlu, E.S., Durmuşoğlu, A. & Dereli, T. Concept drift from 1980 to 2020: a comprehensive bibliometric analysis with future research insight. Evolving Systems (2023). https://doi.org/10.1007/s12530-023-09503-2
DOI: https://doi.org/10.1007/s12530-023-09503-2