Market-basket problem

This is the general market-basket problem: finding which items occur frequently across many shoppers’ baskets. Frequency is judged against a threshold, a minimum number of occurrences of a particular item. Items that are bought at least that many times are considered frequent.

The frequent sets can be single items (singletons), pairs of items (doubletons), triples (tripletons) and so on.
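As a minimal sketch of the threshold idea for singletons, the R snippet below counts how many baskets contain each item and keeps those that reach the threshold. The basket contents and the threshold value of 3 are made up purely for illustration.

```r
# Hypothetical toy data: each list element is one shopper's basket.
baskets <- list(
  c("milk", "bread"),
  c("milk", "beer"),
  c("bread", "butter", "milk"),
  c("beer", "bread")
)
threshold <- 3  # assumed minimum number of baskets an item must appear in

# Count the number of baskets that contain each item.
# unique() makes sure an item is counted once per basket.
item_counts <- table(unlist(lapply(baskets, unique)))

# Keep only the items whose count reaches the threshold.
frequent_items <- names(item_counts)[item_counts >= threshold]
print(frequent_items)
```

Here milk and bread each appear in three baskets, so they are the frequent singletons; beer and butter fall below the threshold. The same counting idea extends to pairs and larger sets, at a much higher cost.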

Imagine there are 100 baskets, numbered 1,2,...,100, and 100 items, similarly numbered. Item i is in basket j if and only if i divides j evenly. For example, basket 24 is the set of items {1,2,3,4,6,8,12,24}. Describe all the association rules that have 100% confidence. Which of the following rules has 100% confidence?

Below is a brute-force R approach to this problem. It is workable only because the number of items here is small. Real data mining algorithms have to deal with large quantities of data in a fixed amount of memory. One such algorithm is the A-priori algorithm.

Each of the if statements checks a rule like this one.

 {8,10} -> 20

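This checks whether item 20 is found in every basket that has items 8 and 10. The code does this by looking for counterexamples: a basket that has 8 and 10 but not 20. If no such basket exists, the rule has 100% confidence.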

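library(Hmisc)  # provides the %nin% ("not in") operator
for (i in 1:100) {
  # Basket i holds every item j that divides i evenly.
  a <- c()
  for (j in 1:100) {
    if (i %% j == 0) {
      a <- append(a, j)
    }
  }
  #print(paste( i, a ))
  # Each test below looks for a counterexample: a basket that holds the
  # left-hand items but lacks the right-hand item. Uncomment a print to
  # list them; a rule that prints nothing has 100% confidence.
  if (8 %in% a && 10 %in% a && 20 %nin% a) {  # {8,10} -> 20
    #print (a)
  }
  if (3 %in% a && 1 %in% a && 6 %in% a && 12 %nin% a) {  # {1,3,6} -> 12
    print (a)
  }
  if (8 %in% a && 12 %in% a && 96 %nin% a) {  # {8,12} -> 96
    #print (a)
  }
  if (3 %in% a && 5 %in% a && 1 %nin% a) {  # {3,5} -> 1
    #print (a)
  }
}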
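The same question can also be answered by computing the confidence of a rule directly, rather than hunting for counterexamples. The sketch below uses base R only. It builds all 100 baskets once and defines a helper, rule_confidence, which is a name invented here for illustration. It relies on the usual definition: the confidence of I -> j is the number of baskets containing both I and j divided by the number of baskets containing I.

```r
# Basket j contains item i exactly when i divides j evenly.
baskets <- lapply(1:100, function(j) which(j %% (1:100) == 0))

# Hypothetical helper: confidence of the rule lhs -> rhs, i.e. the share of
# baskets containing every item of lhs that also contain rhs.
rule_confidence <- function(lhs, rhs, baskets) {
  has_lhs  <- vapply(baskets, function(b) all(lhs %in% b), logical(1))
  has_both <- vapply(baskets, function(b) all(c(lhs, rhs) %in% b), logical(1))
  sum(has_both) / sum(has_lhs)
}

print(baskets[[24]])                             # 1 2 3 4 6 8 12 24
print(rule_confidence(c(8, 10), 20, baskets))    # 1
print(rule_confidence(c(1, 3, 6), 12, baskets))  # 0.5
print(rule_confidence(c(8, 12), 96, baskets))    # 0.25
print(rule_confidence(c(3, 5), 1, baskets))      # 1
```

A confidence of 1 means the rule holds in every basket that contains its left-hand side, so {8,10} -> 20 and {3,5} -> 1 have 100% confidence, while the other two rules do not. Basket 24, for example, contains 8 and 12 but not 96.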
