Abstract
One of the basic problems in knowledge discovery in databases (KDD) is the following: given a data set r, a class L of sentences for defining subgroups of r, and a selection predicate, find all sentences of L deemed interesting by the selection predicate. We analyze the simple levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. For this, we introduce the concept of the border of a theory, a notion that turns out to be surprisingly powerful in analyzing the algorithm. We also consider the verification problem of a KDD process: given r and a set of sentences S \( \subseteq \) L determine whether S is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD.
Similar content being viewed by others
References
Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining association rules between sets of items in large databases. In Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD '93).Washington, D.C., pp. 207–216.
Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., and Verkamo, A.I. 1996. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining, U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, (Eds.), Menlo Park, CA: AAAI Press. pp. 307–328.
Bell, S. 1995. Discovery and maintenance of functional dependencies by independencies. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD '95). Montréal, Canada, pp. 27–32.
Berge, C. 1973. Hypergraphs. Combinatorics of Finite Sets (third edition). Amsterdam: North-Holland.
Brin, S., Motwani, R., and Silverstein, S. 1997. Beyond market baskets: Generalizing association rules to correlations. In Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD '97). Tucson, AZ., pp. 265–276.
Chang, C.C. and Keisler, H.J. 1973. Model Theory (third edition, 1990). Amsterdam: North-Holland.
De Raedt, L. and Bruynooghe, M. 1993. A theory of clausal discovery. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence (IJCAI-93). Chambéry, France, pp. 1058–1053.
De Raedt, L. and Džeroski, S. 1994. First-order jk-clausal theories are PAC-learnable. Artificial Intelligence, 70:375–392.
Eiter, T. and Gottlob, G. 1995. Identifying the minimal transversals of a hypergraph and related problems. SIAM Journal on Computing, 24(6):1278–1304.
Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. (Eds.), 1996. Advances in Knowledge Discovery and Data Mining. Menlo Park, CA: AAAI Press.
Fredman, M.L. and Khachiyan, L. 1996. On the complexity of dualization of monotone disjunctive normal forms. Journal of Algorithms, 21(3):618–628.
Fukuda, T., Morimoto, Y., Morishita, S., and Tokuyama, T. 1996. Mining optimized association rules for numeric attributes. In Proceedings of the Fifteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS '96). Montréal, Canada, pp. 182–191.
Gunopulos, D., Khardon, R., Mannila, H., and Toivonen, H. 1997. Data mining, hypergraph transversals, and machine learning. In Proceedings of the SixteenthACMSIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS '97). Tucson, AZ., pp. 209–216.
Gurvich, V. and Khachiyan, L. 1995. On generating the irredundant conjunctive and disjunctive normal forms of monotone boolean functions. Technical Report LCSR-TR-251, Rutgers University.
Han, J. and Fu, Y. 1995. Discovery of multiple-level association rules from large databases. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95). Zurich, Swizerland, pp. 420–431.
Holsheimer, M., Kersten, M., Mannila, H., and Toivonen, H. 1995. A perspective on databases and data mining. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD '95). Montréal, Canada, pp. 150–155.
Houtsma, M. and Swami, A. 1993. Set-oriented mining of association rules. Research Report RJ 9567, IBM Almaden Research Center, San Jose, CA.
Imielinski, T. and Mannila, H. 1996. A database perspective on knowledge discovery. Communications of the ACM, 39(11):58–64.
Kantola, M., Mannila, H., Räihä, K.-J., and Siirtola, H. 1992. Discovering functional and inclusion dependencies in relational databases. International Journal of Intelligent Systems, 7(7):591–607.
Kietz, J.-U. and Wrobel, S. 1992. Controlling the complexity of learning in logic through syntactic and task-oriented models. In S. Muggleton ( Ed.), Inductive Logic Programming. London: Academic Press, pp. 335–359.
Klemettinen, M., Mannila, H., Ronkainen, P., Toivonen, H., and Verkamo, A.I. 1994. Finding interesting rules from large sets of discovered association rules. In Proceedings of the Third International Conference on Information and Knowledge Management (CIKM '94). Gaithersburg, MD, pp. 401–407.
Kloesgen, W. 1995. Efficient discovery of interesting statements in databases. Journal of Intelligent Information Systems, 4(1):53–69.
Knobbe, A.J. and Adriaans, P.W. 1996. Discovering foreign key relations in relational databases. In Cybernetics and Systems, Volume II, The Thirteenth European Meeting on Cybernetics and Systems Research. Vienna, Austria, pp. 961–966.
Langley, P. 1996. Elements of Machine Learning. San Mateo, CA: Morgan Kaufmann.
Mannila, H. and Räihä, K.-J. 1986. Design by example: An application of Armstrong relations. Journal of Computer and System Sciences, 33(2):126–141.
Mannila, H. and Räihä, K.-J. 1992a. Design of Relational Databases. Wokingham, UK: Addison-Wesley.
Mannila, H. and Räihä, K.-J. 1992b. On the complexity of dependency inference. Discrete Applied Mathematics, 40:237–243.
Mannila, H. and Räihä, K.-J. 1994. Algorithms for inferring functional dependencies. Data and Knowledge Engineering, 12(1):83–99.
Mannila, H., Toivonen, H., and Verkamo, A.I. 1995. Discovering frequent episodes in sequences. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD '95). Montréal, Canada, pp. 210–215.
Mannila, H., Toivonen, H., and Verkamo, A.I. 1997. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1(3):259–289.
Mishra, N. and Pitt, L. 1995. On bounded-degree hypergraph transversal. Manuscript.
Mitchell, T.M. 1982. Generalization as search. Artificial Intelligence, 18:203–226.
Park, J.S., Chen, M.-S., and Yu, P.S. 1995. An effective hash-based algorithm for mining association rules. In Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD '95). San Jose, CA, pp. 175–186.
Pfahringer, B. and Kramer, S. 1995. Compression-based evaluation of partial determinations. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD '95). Montréal, Canada, pp. 234–239.
Piatetsky-Shapiro, G. 1991. Discovery, analysis, and presentation of strong rules. In Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W.J. Frawley (Eds.), Menlo Park, CA: AAAI Press, pp. 229–248.
Piatetsky-Shapiro, G. and Frawley, W.J. (Eds.), 1991. Knowledge Discovery in Databases. Menlo Park, CA: AAAI Press.
Savasere, A., Omiecinski, E., and Navathe, S. 1995. An efficient algorithm for mining association rules in large databases. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95). Zurich, Swizerland, pp. 432–444.
Srikant, R. and Agrawal, R. 1995. Mining generalized association rules. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95). Zurich, Swizerland, pp. 407–419.
Srikant, R. and Agrawal, R. 1996. Mining quantitative association rules in large relational tables. In Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD '96). Montréal, Canada, pp. 1–12.
Toivonen, H. 1996. Sampling large databases for association rules. In Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB '96). Mumbay, India, pp. 134–145.
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Mannila, H., Toivonen, H. Levelwise Search and Borders of Theories in Knowledge Discovery. Data Mining and Knowledge Discovery 1, 241–258 (1997). https://doi.org/10.1023/A:1009796218281
Issue Date:
DOI: https://doi.org/10.1023/A:1009796218281