## Market-basket problem

October 18, 2014

This is the general market-basket problem: finding which items are frequently bought together across many shoppers' baskets. An item (or a set of items) is considered frequent if it appears in at least a threshold number of baskets. This minimum count is called the support threshold.

Frequent itemsets can be singletons, pairs of items (doubletons), triples (tripletons), and so on.
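To make "support" and "frequent" concrete, here is a minimal R sketch that counts how many baskets contain an itemset and keeps the singletons and pairs that meet a threshold. The basket contents, the threshold of 2, and the helper name `support` are hypothetical, chosen only for illustration.

```r
# Support of an itemset: the number of baskets that contain all of its items.
support <- function(itemset, baskets) {
  sum(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Hypothetical toy baskets, for illustration only.
baskets <- list(
  c("milk", "bread", "eggs"),
  c("milk", "bread"),
  c("bread", "eggs"),
  c("milk", "bread", "butter")
)
threshold <- 2  # hypothetical support threshold

items <- sort(unique(unlist(baskets)))

# Frequent singletons: items appearing in at least `threshold` baskets.
singles <- items[vapply(items, function(x) support(x, baskets) >= threshold,
                        logical(1))]

# Frequent pairs: every pair of items, kept if its support meets the threshold.
pairs <- combn(items, 2, simplify = FALSE)
freq_pairs <- Filter(function(p) support(p, baskets) >= threshold, pairs)

print(singles)     # "bread" "eggs" "milk"
print(freq_pairs)  # c("bread", "eggs") and c("bread", "milk")
```

Checking every pair is fine for four items, but the number of pairs grows quadratically with the number of items, which is why larger problems need something smarter.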

> Imagine there are 100 baskets, numbered 1,2,...,100, and 100 items, similarly numbered. Item i is in basket j if and only if i divides j evenly. For example, basket 24 is the set of items {1,2,3,4,6,8,12,24}. Describe all the association rules that have 100% confidence. Which of the following rules has 100% confidence?
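Because item i is in basket j exactly when i divides j, rules in this exercise can also be checked by hand. A short derivation (my own reasoning, not part of the exercise), writing $B_j$ for basket $j$, $I$ for the left-hand side of a rule, $k$ for its right-hand side, and $\operatorname{lcm}(I)$ for the least common multiple of the items in $I$:

$$
\operatorname{conf}(I \to k) = \frac{\operatorname{supp}(I \cup \{k\})}{\operatorname{supp}(I)},
\qquad
I \subseteq B_j \iff \operatorname{lcm}(I) \mid j .
$$

$$
\operatorname{lcm}(I) \le 100 \;\Longrightarrow\; \Bigl( \operatorname{conf}(I \to k) = 1 \iff k \mid \operatorname{lcm}(I) \Bigr)
$$

If $k$ divides $\operatorname{lcm}(I)$, every multiple of $\operatorname{lcm}(I)$ is also a multiple of $k$. Conversely, basket $\operatorname{lcm}(I)$ itself contains $I$, so 100% confidence forces $k$ into that basket. For example, $\operatorname{lcm}(8, 10) = 40$ and $20 \mid 40$, so {8,10} -> 20 holds. If $\operatorname{lcm}(I) > 100$, no basket contains $I$ and the confidence is undefined.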

Below is a brute-force *R* approach to this problem. Brute force works here only because there are just 100 baskets and 100 items. Real data-mining algorithms must handle large quantities of data with a fixed amount of main memory. One such algorithm is the A-priori algorithm.
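A-priori saves memory through monotonicity: a pair can only be frequent if both of its items are frequent. The sketch below is a simplified, hypothetical `apriori_pairs` helper run on made-up toy baskets, not a memory-tuned implementation. It makes one pass to count single items and a second pass that counts only pairs built from frequent items.

```r
# Two-pass A-priori sketch for frequent pairs (illustrative only).
apriori_pairs <- function(baskets, threshold) {
  # Pass 1: count every single item once per basket.
  item_counts <- table(unlist(lapply(baskets, unique)))
  frequent <- names(item_counts)[item_counts >= threshold]

  # Pass 2: count only pairs whose two items are both frequent.
  pair_counts <- list()
  for (b in baskets) {
    keep <- sort(intersect(unique(b), frequent))
    if (length(keep) < 2) next
    for (p in combn(keep, 2, simplify = FALSE)) {
      key <- paste(p, collapse = ",")
      pair_counts[[key]] <- if (is.null(pair_counts[[key]])) 1 else pair_counts[[key]] + 1
    }
  }
  counts <- unlist(pair_counts)
  names(counts)[counts >= threshold]
}

# Hypothetical toy baskets, for illustration only.
toy_baskets <- list(
  c("milk", "bread", "eggs"),
  c("milk", "bread"),
  c("bread", "eggs"),
  c("milk", "bread", "butter")
)
result <- apriori_pairs(toy_baskets, threshold = 2)
print(result)  # "bread,eggs" "bread,milk"
```

Here "butter" is dropped after the first pass, so no pair containing it is ever counted. A real implementation would keep pair counts in a compact array or hash table; the named list only keeps the example short.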

Each *if* statement in the loop tests one candidate rule, such as this one:

`{8,10} -> 20`

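This checks whether item *20* appears in every basket that contains both items *8* and *10*. A basket that contains 8 and 10 but not 20 is a counterexample, so a rule has 100% confidence exactly when its *if* never fires.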
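The *if* checks only report violations. A minimal sketch that instead computes the confidence of each rule directly, as (baskets containing both sides) / (baskets containing the left-hand side); the function name `confidence` and the list `baskets` are my own, hypothetical names:

```r
# Baskets from the exercise: basket j contains every item i (1..100) that divides j.
baskets <- lapply(1:100, function(j) which(j %% (1:100) == 0))

# Confidence of the rule lhs -> rhs:
# (baskets containing lhs and rhs) / (baskets containing lhs).
confidence <- function(lhs, rhs, baskets) {
  has_lhs  <- vapply(baskets, function(b) all(lhs %in% b), logical(1))
  has_both <- vapply(baskets, function(b) all(c(lhs, rhs) %in% b), logical(1))
  sum(has_both) / sum(has_lhs)
}

confidence(c(8, 10), 20, baskets)    # 1    (baskets 40 and 80 both contain 20)
confidence(c(1, 3, 6), 12, baskets)  # 0.5  (8 of the 16 multiples of 6 contain 12)
confidence(c(8, 12), 96, baskets)    # 0.25 (only basket 96 of 24, 48, 72, 96)
confidence(c(3, 5), 1, baskets)      # 1    (item 1 is in every basket)
```

Of these four rules, only {8,10} -> 20 and {3,5} -> 1 reach 100% confidence: 20 divides 40, the smallest basket holding both 8 and 10, and item 1 divides every basket number.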

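```r
# Brute-force check of four candidate rules on the 100 divisor baskets.
# Basket i holds every item j in 1..100 that divides i evenly.
# A basket that contains the left-hand side of a rule but not its right-hand
# side is a counterexample; a rule with 100% confidence produces no output.
# (!(x %in% a) replaces Hmisc's %nin%, so no package is needed.)
for (i in 1:100) {
  a <- c()                        # items in basket i
  for (j in 1:100) {
    if (i %% j == 0) {
      a <- append(a, j)
    }
  }
  # print(paste(i, paste(a, collapse = ",")))

  if (8 %in% a && 10 %in% a && !(20 %in% a)) {              # {8,10} -> 20
    cat("{8,10} -> 20 fails in basket", i, "\n")
  }
  if (3 %in% a && 1 %in% a && 6 %in% a && !(12 %in% a)) {   # {1,3,6} -> 12
    cat("{1,3,6} -> 12 fails in basket", i, "\n")
  }
  if (8 %in% a && 12 %in% a && !(96 %in% a)) {              # {8,12} -> 96
    cat("{8,12} -> 96 fails in basket", i, "\n")
  }
  if (3 %in% a && 5 %in% a && !(1 %in% a)) {                # {3,5} -> 1
    cat("{3,5} -> 1 fails in basket", i, "\n")
  }
}
```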