dblp.uni-trier.de www.dagstuhl.de www.uni-trier.de

Algorithms for Mining Association Rules for Binary Segmentations of Huge Categorical Databases.

Yasuhiko Morimoto, Takeshi Fukuda, Hirofumi Matsuzawa, Takeshi Tokuyama, Kunikazu Yoda: Algorithms for Mining Association Rules for Binary Segmentations of Huge Categorical Databases. VLDB 1998: 380-391
@inproceedings{DBLP:conf/vldb/MorimotoFMTY98,
  author    = {Yasuhiko Morimoto and
               Takeshi Fukuda and
               Hirofumi Matsuzawa and
               Takeshi Tokuyama and
               Kunikazu Yoda},
  editor    = {Ashish Gupta and
               Oded Shmueli and
               Jennifer Widom},
  title     = {Algorithms for Mining Association Rules for Binary Segmentations
               of Huge Categorical Databases},
  booktitle = {VLDB'98, Proceedings of 24rd International Conference on Very
               Large Data Bases, August 24-27, 1998, New York City, New York,
               USA},
  publisher = {Morgan Kaufmann},
  year      = {1998},
  isbn      = {1-55860-566-5},
  pages     = {380-391},
  ee        = {db/conf/vldb/MorimotoFMTY98.html},
  crossref  = {DBLP:conf/vldb/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

We consider the problem of finding association rules that make nearly optimal binary segmentations of huge categorical databases. The optimality of segmentation is defined by an objective function suitable for the user's objective. An objective function is usually defined in terms of the distribution of agiven target attribute. Our goal is to find association rules that split databases into two subsets, optimizing the value of an objective function.

The problem is intractable for general objective functions, because letting N be the number of records of a given database, there are 2N possible binary segmentations, and we may have to exhaustively examine all of them. However, when the objective function is convex, there are feasible algorithms for finding nearly optimal binary segmentations, and we prove that typical criteria, such as "entropy (mutual information)," "x2 (correlation)," and "gini index (mean squared error)," are actually convex.

We propose practical algorithms that use computational geometry techniquesto handle cases where a target attribute is not binary, in which conventional approaches cannot be used directly.

Copyright © 1998 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD DiSC

CDROM Version: Load the CDROM "DiSC, Volume 1 Number 1" and ...

ACM SIGMOD Anthology

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Ashish Gupta, Oded Shmueli, Jennifer Widom (Eds.): VLDB'98, Proceedings of 24rd International Conference on Very Large Data Bases, August 24-27, 1998, New York City, New York, USA. Morgan Kaufmann 1998, ISBN 1-55860-566-5
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

References

[AIS93]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[AS94]
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[AT94]
...
[BEHW89]
Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, Manfred K. Warmuth: Learnability and the Vapnik-Chervonenkis dimension. J. ACM 36(4): 929-965(1989) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[BFOS84]
Leo Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone: Classification and Regression Trees. Wadsworth 1984, ISBN 0-534-98053-8
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Cen92]
...
[DEY86]
David P. Dobkin, Herbert Edelsbrunner, Chee-Keng Yap: Probing Convex Polytopes. STOC 1986: 424-432 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[FMMT96a]
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Constructing Efficient Decision Trees by Using Optimized Numeric Association Rules. VLDB 1996: 146-155 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[FMMT96b]
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Data Mining Using Two-Dimensional Optimized Accociation Rules: Scheme, Algorithms, and Visualization. SIGMOD Conference 1996: 13-23 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[FMMT96c]
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Mining Optimized Association Rules for Numeric Attributes. PODS 1996: 182-191 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[HF95]
Jiawei Han, Yongjian Fu: Discovery of Multiple-Level Association Rules from Large Databases. VLDB 1995: 420-431 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[HII95]
Susumu Hasegawa, Hiroshi Imai, Masaki Ishiguro: epsilon-Approximations of k-Label Spaces. Theor. Comput. Sci. 137(1): 145-157(1995) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[HW87]
...
[MFMT97]
...
[MP91]
...
[PCY95]
Jong Soo Park, Ming-Syan Chen, Philip S. Yu: An Effective Hash Based Algorithm for Mining Association Rules. SIGMOD Conference 1995: 175-186 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[PS85]
Franco P. Preparata, Michael Ian Shamos: Computational Geometry - An Introduction. Springer 1985, ISBN 3-540-96131-3
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[PS91]
Gregory Piatetsky-Shapiro: Discovery, Analysis, and Presentation of Strong Rules. Knowledge Discovery in Databases 1991: 229-248 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Qui86]
J. Ross Quinlan: Induction of Decision Trees. Machine Learning 1(1): 81-106(1986) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Qui93]
J. Ross Quinlan: C4.5: Programs for Machine Learning. Morgan Kaufmann 1993, ISBN 1-55860-238-0
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Referenced by

  1. Shinichi Morishita, Jun Sese: Traversing Itemset Lattice with Statistical Metric Pruning. PODS 2000: 226-236
  2. Johannes Gehrke, Venkatesh Ganti, Raghu Ramakrishnan, Wei-Yin Loh: BOAT-Optimistic Decision Tree Construction. SIGMOD Conference 1999: 169-180
  3. Johannes Gehrke, Raghu Ramakrishnan, Venkatesh Ganti: RainForest - A Framework for Fast Decision Tree Construction of Large Datasets. VLDB 1998: 416-427

Last update Fri Sep 14 17:38:23 2012 CET by the DBLP TeamThis material is Open Data Data released under the ODC-BY 1.0 license — See also our legal information page