Researchers in machine learning and statistical analysis had developed many data analysis techniques well before the data mining fever caught on. In the second phase of data mining research, the focus shifted to making data mining algorithms scalable, i.e., able to deal with large volumes of data. These algorithms assume that the data reside in files. However, for data mining to be widely applicable, its tools and techniques must be well integrated with the data-warehousing infrastructure. It is therefore important to study the implementation of data mining tools in the context of relational backends. This paper by Sarawagi, Thomas and Agrawal is one of the first to carry out such a study. It compares several alternative implementations of association rule mining on traditional as well as Object-Relational SQL engines. The suite of alternatives is fairly comprehensive: it covers implementations that exploit SQL queries, implementations based on stored procedures or user-defined functions, and "extract and mine" strategies, in which the data are first extracted from the database and then mined outside it. The paper evaluates both the performance and the ease of implementation of each alternative. The experiments were performed on DB2 UDB Server 5.0 and quantify the relative trade-offs among the alternatives. However, the paper considers association rule mining as the only knowledge discovery technique. Similar studies of other data mining techniques would give us a broader understanding of the systems issues involved in integrating data mining with relational database management systems.
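To make the contrast between two of these alternatives concrete, the following is a minimal sketch and not the paper's implementation. It counts the support of item pairs, one step of association rule mining, first with a SQL self-join evaluated inside the engine and then with an "extract and mine" loop that pulls the rows out and counts in application code. Everything specific here is an assumption made for illustration: SQLite (through Python's `sqlite3` module) stands in for DB2, and the `sales` table, its `tid` and `item` columns, the toy rows, and the support threshold are hypothetical. The sketch also assumes that no (tid, item) row is duplicated.

```python
import sqlite3
from collections import Counter
from itertools import combinations

# Hypothetical toy data: one (transaction id, item) row per purchased item.
ROWS = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "milk"), (2, "eggs"),
    (3, "milk"), (3, "eggs"),
    (4, "bread"), (4, "milk"),
]
MIN_SUPPORT = 2  # hypothetical threshold: minimum number of transactions


def make_db():
    """Load the toy rows into an in-memory SQLite table (stand-in for DB2)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", ROWS)
    return conn


def frequent_pairs_sql(conn, min_support):
    """SQL-based alternative: count pair support with a self-join."""
    query = """
        SELECT a.item, b.item, COUNT(*) AS support
        FROM sales a JOIN sales b
          ON a.tid = b.tid AND a.item < b.item
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY a.item, b.item
    """
    return [tuple(row) for row in conn.execute(query, (min_support,))]


def frequent_pairs_extract(conn, min_support):
    """'Extract and mine' alternative: fetch rows, count outside the engine."""
    baskets = {}
    for tid, item in conn.execute("SELECT tid, item FROM sales"):
        baskets.setdefault(tid, set()).add(item)
    counts = Counter()
    for items in baskets.values():
        # Sorting gives each pair a canonical (smaller, larger) order.
        counts.update(combinations(sorted(items), 2))
    return sorted((a, b, n) for (a, b), n in counts.items() if n >= min_support)


if __name__ == "__main__":
    conn = make_db()
    print(frequent_pairs_sql(conn, MIN_SUPPORT))
    print(frequent_pairs_extract(conn, MIN_SUPPORT))
```

Both functions return the same frequent pairs on the toy data. The difference the review points to lies in where the work happens: the first leaves the join and aggregation to the SQL engine, while the second moves every row across the database boundary before any counting starts.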