ABSTRACT
Many real life sequence databases grow incrementally. It is undesirable to mine sequential patterns from scratch each time when a small set of sequences grow, or when some new sequences are added into the database. Incremental algorithm should be developed for sequential pattern mining so that mining can be adapted to incremental database updates. However, it is nontrivial to mine sequential patterns incrementally, especially when the existing sequences grow incrementally because such growth may lead to the generation of many new patterns due to the interactions of the growing subsequences with the original ones. In this study, we develop an efficient algorithm, IncSpan, for incremental mining of sequential patterns, by exploring some interesting properties. Our performance study shows that IncSpan outperforms some previously proposed incremental algorithms as well as a non-incremental one with a wide margin.
- R. Agrawal and R. Srikant. Mining sequential patterns. In Proc. 1995 Int. Conf. Data Engineering (ICDE '95), pages 3--14, March 1995. Google ScholarDigital Library
- J. Ayres, J.E. Gehrke, T. Yiu, and J. Flannik. Sequential pattern mining using bitmaps. In Proc. 2002 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases (KDD '02), July 2002. Google ScholarDigital Library
- D. Cheung, J. Han, V. Ng, and C. Wong. Maintenance of discovered association rules in large databases: An incremental update technique. In Proc. of the 12th Int. Conf. on Data Engineering (ICDE '96), March 1996. Google ScholarDigital Library
- M. Garofalakis, R. Rastogi, and K. Shim. SPIRIT: Sequential pattern mining with regular expression constraints. In Proc. 1999 Int. Conf. Very Large Data Bases (VLDB '99), pages 223--234, Sept 1999. Google ScholarDigital Library
- H. Mannila, H. Toivonen, and A.I. Verkamo. Discovering frequent episodes in sequences. In Proc. 1995 Int. Conf. Knowledge Discovery and Data Mining (KDD '95), pages 210--215, Aug 1995.Google Scholar
- F. Masseglia, P. Poncelet, and M. Teisseire. Incremental mining of sequential patterns in large databases. Data Knowl. Eng., 46(1): 97--121, 2003. Google ScholarDigital Library
- S. Parthasarathy, M. Zaki, M. Ogihara, and S. Dwarkadas. Incremental and interactive sequence mining. In Proc. of the 8th Int. Conf. on Information and Knowledge Management (CIKM '99), Nov 1999. Google ScholarDigital Library
- J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proc. 2001 Int. Conf. Data Engineering (ICDE '01), pages 215--224, April 2001. Google ScholarDigital Library
- A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. In Proc. 1995 Int. Conf. Very Large Data Bases (VLDB '95), Sept 1995. Google ScholarDigital Library
- R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In Proc. of the 5th Int. Conf. on Extending Database Technology (EDBT '96), Mar 1996. Google ScholarDigital Library
- J. Wang and J. Han. Bide: Efficient mining of frequent losed sequences. In Proc. of 2004 Int. Conf. on Data Engineering (ICDE '04), March 2004. Google ScholarDigital Library
- X. Yan, J. Han,and R. Afshar. CloSpan: Mining closed sequential patterns in large datasets. In Proc. 2003 SIAM Int. Conf. on Data Mining (SDM '03), May 2003. Google ScholarDigital Library
- M. Zaki. SPADE: An efficient algorithm for mining frequent sequences. Machine Learning, 40:31--60, 2001. Google ScholarDigital Library
- M. Zhang, B. Kao, D. Cheung, and C. Yip. Efficient algorithms for incremental updates of frequent sequences. In Proc. of Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD '02), May 2002. Google ScholarDigital Library
Index Terms
- IncSpan: incremental mining of sequential patterns in large database
Recommendations
On Incremental High Utility Sequential Pattern Mining
Research Survey and Regular PapersHigh utility sequential pattern (HUSP) mining is an emerging topic in pattern mining, and only a few algorithms have been proposed to address it. In practice, most sequence databases usually grow over time, and it is inefficient for existing algorithms ...
Improvements of incspan: incremental mining of sequential patterns in large database
PAKDD'05: Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data MiningIn reality, sequence databases are updated incrementally. The changes on the database may invalidate some existing sequential patterns and introduce new ones. Instead of recomputing the database each time, the incremental mining algorithms target ...
CanTree: a canonical-order tree for incremental frequent-pattern mining
Since its introduction, frequent-pattern mining has been the subject of numerous studies, including incremental updating. Many existing incremental mining algorithms are Apriori-based, which are not easily adoptable to FP-tree-based frequent-pattern ...
Comments