Review

Data mining is fashionable nowadays (or is it already "was"?). Yet I did not know much about it. I felt left out of most of those fascinating coffee-break conversations. This had to stop. This often-cited article seemed like a good place to start. I was not disappointed. Thus, today's recommendation: if you want to understand something about association rules, read it!

The paper gets off to a good start by motivating the need for association rules through examples involving sausages, mustard and bagels. For instance, an association rule may tell me that 40% of the customers who bought sausages and bagels also bought mustard. Then, it presents a formal model for representing these rules and explains how to discover them from a given database. There are two parts to the discovery process. The first, and most difficult, consists of extracting so-called "large itemsets", i.e., (roughly) collections of facts that are found together in one transaction often enough. The second part splits these sets so as to obtain rules with one or more antecedent facts (e.g., sausages and bagels) and one consequent fact (e.g., mustard), together with their confidence, i.e., how often the consequent accompanies the antecedent in the database (e.g., 40%). Since the generation of large itemsets is the most difficult part, the remainder of the paper is devoted to finding appropriate solutions to this problem. But I am afraid this is the end of my review and that you will have to read the paper to know more. I recommend that you do so: you'll find nontrivial answers very neatly explained. But I have to rush now, I need to read something about "portals".
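
For the impatient, the two-part process described above can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions, not the algorithm from the paper: it finds large itemsets by brute-force enumeration, which is exactly the expensive step the paper works to avoid. The transaction data, the function names (`support`, `large_itemsets`, `rules`) and the thresholds are all made up for the example. Here "support" is the fraction of transactions containing an itemset, and "confidence" is the fraction of transactions containing the antecedent that also contain the consequent (the 40% figure above).

```python
from itertools import combinations

# Hypothetical toy database: each transaction is the set of items bought together.
transactions = [
    {"sausages", "bagels", "mustard"},
    {"sausages", "bagels", "mustard"},
    {"sausages", "bagels"},
    {"sausages", "bagels"},
    {"sausages", "bagels"},
    {"bagels"},
    {"mustard"},
    {"sausages", "mustard"},
]


def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)


def large_itemsets(db, min_support):
    """Part 1 (naive): enumerate every itemset and keep the frequent ones.

    Brute force is exponential in the number of distinct items; it is used
    here only to keep the sketch short.
    """
    items = sorted(set().union(*db))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            s = support(candidate, db)
            if s >= min_support:
                result[candidate] = s
    return result


def rules(large, min_confidence):
    """Part 2: split each large itemset into antecedent => single consequent."""
    out = []
    for itemset, s in large.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            # Every subset of a large itemset is itself large, so this lookup is safe.
            confidence = s / large[antecedent]
            if confidence >= min_confidence:
                out.append((antecedent, consequent, s, confidence))
    return out


large = large_itemsets(transactions, min_support=0.25)
found = rules(large, min_confidence=0.4)
for antecedent, consequent, s, c in found:
    print(f"{sorted(antecedent)} => {consequent} (support {s:.2f}, confidence {c:.2f})")
```

With this made-up data, five of the eight transactions contain both sausages and bagels, and two of those five also contain mustard, so the rule "sausages and bagels => mustard" comes out with the 40% from the example. How to do part 1 without enumerating everything is what you will have to read the paper for.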

