Sigmoid function

What is a sigmoid function?

It is an S-shaped curve. MathWorld defines it at http://mathworld.wolfram.com/SigmoidFunction.html
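
For reference, the logistic form of the sigmoid, which is what the Octave code below evaluates, can be written out with a few of its basic values. This is a small summary sketch, not a full treatment:

\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma(0) = \frac{1}{2}, \qquad \lim_{x \to -\infty} \sigma(x) = 0, \qquad \lim_{x \to \infty} \sigma(x) = 1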

This simple code is the standard way to plot it. I am using Octave.

x = -10:0.1:10;
a = 1.0 ./ (1.0 + exp(-x));
figure;
plot(x,a,'-','linewidth',3)

[Figure: plot of the sigmoid curve]

R wordcloud

A piece of R code to beat the rut of my day job.

I have a document in the directory that the code reads.

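 library(tm)            # Corpus, DirSource, tm_map
 library(SnowballC)     # stemmer used by stemDocument
 library(wordcloud)
 library(RColorBrewer)  # brewer.pal
 transcript<-Corpus(DirSource("~/Documents/Algorithms/rcloud/"))
 transcript<-tm_map(transcript,stripWhitespace)
 # content_transformer keeps the corpus structure intact (tm 0.6 and later)
 transcript<-tm_map(transcript,content_transformer(tolower))
 transcript<-tm_map(transcript,stemDocument)
 wordcloud(transcript,scale=c(5,0.5),max.words=100,random.order=FALSE,rot.per=0.35,use.r.layout=FALSE,colors=brewer.pal(8,"Dark2"))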

[Figure: the resulting word cloud]

Cosine distance

I came across this calculation when I was reading about recommender systems. In the matrix below, the last column is the rating given by a particular user for a movie. The other columns denote whether a particular actor appeared in the movie or not.

\begin{pmatrix}  1& 0& 1& 0& 1& 2\\  1& 1& 0 & 0& 1& 6\\  0& 1& 0 & 1 & 0 & 2\\  \end{pmatrix}


The first five attributes are Boolean, and the last is an integer "rating." Assume that the scale factor for the rating is α. Compute, as a function of α, the cosine distances between each pair of profiles. For each of α = 0, 0.5, 1, and 2, determine the cosine of the angle between each pair of vectors.
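
As a sketch of the "function of α" part of the question: scaling the rating column by α turns the three rows into A = (1,0,1,0,1,2α), B = (1,1,0,0,1,6α) and C = (0,1,0,1,0,2α), where A, B and C are the labels the R code below gives to rows 1 to 3. Dividing each dot product by the product of the two lengths gives:

\cos(A,B) = \frac{2 + 12\alpha^2}{\sqrt{(3 + 4\alpha^2)(3 + 36\alpha^2)}}

\cos(B,C) = \frac{1 + 12\alpha^2}{\sqrt{(3 + 36\alpha^2)(2 + 4\alpha^2)}}

\cos(A,C) = \frac{4\alpha^2}{\sqrt{(3 + 4\alpha^2)(2 + 4\alpha^2)}}

For example, at α = 1 the last expression is 4/\sqrt{42}, which is roughly 0.617.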

My R code to calculate it is this.

# Cosine of the angle between each pair of profiles, with the rating column
# scaled by 0, 0.5, 1 and 2.
# Author: radhakrishnan
###############################################################################


A = matrix(c(1,0,1,0,1,2,
			 1,1,0,0,1,6,
			 0,1,0,1,0,2),nrow=3,ncol=6,byrow=TRUE)


rownames(A) <- c("A","B","C")

scale1 <- A
scale1[,6] <- A[,6] * 0
print( paste( "A and B is ", ( sum(scale1[1,] * scale1[2,]) )/( sqrt( sum(scale1[1,]^2) ) * sqrt( sum(scale1[2,]^2) ) )) )
print( paste( "B and C is ", ( sum(scale1[2,] * scale1[3,]) )/( sqrt( sum(scale1[2,]^2) ) * sqrt( sum(scale1[3,]^2) ) )) )
print( paste( "A and C is ", ( sum(scale1[1,] * scale1[3,]) )/( sqrt( sum(scale1[1,]^2) ) * sqrt( sum(scale1[3,]^2) ) )) )



scale2 <- A
scale2[,6] <- A[,6] * 0.5
print( paste( "A and B is ", ( sum(scale2[1,] * scale2[2,]) )/( sqrt( sum(scale2[1,]^2) ) * sqrt( sum(scale2[2,]^2) ) )) )
print( paste( "B and C is ", ( sum(scale2[2,] * scale2[3,]) )/( sqrt( sum(scale2[2,]^2) ) * sqrt( sum(scale2[3,]^2) ) )) )
print( paste( "A and C is ", ( sum(scale2[1,] * scale2[3,]) )/( sqrt( sum(scale2[1,]^2) ) * sqrt( sum(scale2[3,]^2) ) )) )

scale3 <- A
scale3[,6] <- A[,6] * 1
print( paste( "A and B is ", ( sum(scale3[1,] * scale3[2,]) )/( sqrt( sum(scale3[1,]^2) ) * sqrt( sum(scale3[2,]^2) ) )) )
print( paste( "B and C is ", ( sum(scale3[2,] * scale3[3,]) )/( sqrt( sum(scale3[2,]^2) ) * sqrt( sum(scale3[3,]^2) ) )) )
print( paste( "A and C is ", ( sum(scale3[1,] * scale3[3,]) )/( sqrt( sum(scale3[1,]^2) ) * sqrt( sum(scale3[3,]^2) ) )) )

scale4 <- A
scale4[,6] <- A[,6] * 2
print( paste( "A and B is ", ( sum(scale4[1,] * scale4[2,]) )/( sqrt( sum(scale4[1,]^2) ) * sqrt( sum(scale4[2,]^2) ) )) )
print( paste( "B and C is ", ( sum(scale4[2,] * scale4[3,]) )/( sqrt( sum(scale4[2,]^2) ) * sqrt( sum(scale4[3,]^2) ) )) )
print( paste( "A and C is ", ( sum(scale4[1,] * scale4[3,]) )/( sqrt( sum(scale4[1,]^2) ) * sqrt( sum(scale4[3,]^2) ) )) )

> source("/Users/radhakrishnan/Documents/eclipse/workspace/MMDS/cosinedistance.R", echo=FALSE, encoding="UTF-8")
[1] "A and B is 0.666666666666667"
[1] "B and C is 0.408248290463863"
[1] "A and C is 0"
[1] "A and B is 0.721687836487032"
[1] "B and C is 0.666666666666667"
[1] "A and C is 0.288675134594813"
[1] "A and B is 0.847318545736323"
[1] "B and C is 0.849836585598797"
[1] "A and C is 0.617213399848368"
[1] "A and B is 0.946094540760746"
[1] "B and C is 0.95257934441568"
[1] "A and C is 0.8651809126974"
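
The four near-identical blocks above could also be folded into one helper and a loop, which leaves less room for index slips. This is only a sketch: the function names cosine and pairwise_cosines are my own, not from any package, and the matrix is the same one used above.

```r
# Cosine of the angle between two numeric vectors.
cosine <- function(u, v) {
  sum(u * v) / (sqrt(sum(u^2)) * sqrt(sum(v^2)))
}

# Same profiles as above; column 6 is the rating.
A <- matrix(c(1, 0, 1, 0, 1, 2,
              1, 1, 0, 0, 1, 6,
              0, 1, 0, 1, 0, 2), nrow = 3, ncol = 6, byrow = TRUE)
rownames(A) <- c("A", "B", "C")

# Scale the rating column by alpha and return the three pairwise cosines.
# (pairwise_cosines is a hypothetical helper name.)
pairwise_cosines <- function(profiles, alpha) {
  scaled <- profiles
  scaled[, 6] <- profiles[, 6] * alpha
  c(AB = cosine(scaled["A", ], scaled["B", ]),
    BC = cosine(scaled["B", ], scaled["C", ]),
    AC = cosine(scaled["A", ], scaled["C", ]))
}

for (alpha in c(0, 0.5, 1, 2)) {
  cat("alpha =", alpha, "\n")
  print(round(pairwise_cosines(A, alpha), 4))
}
```

Indexing rows by name rather than by number makes it harder to mix up which profile is being compared.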

Gram-Schmidt


Matrix M has three rows and three columns, and the columns form an orthonormal basis. One of the columns is [2/7,3/7,6/7], and another is [6/7, 2/7, -3/7]. Let the third column be [x,y,z]. Since the length of the vector [x,y,z] must be 1, there is a constraint that x^2+y^2+z^2 = 1. However, there are other constraints, and these other constraints can be used to deduce facts about the ratios among x, y, and z. Compute these ratios, and then identify one of them in the list below.
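
The "other constraints" are that [x,y,z] must be orthogonal to each of the two given columns. A short worked version, as I understand it (the common factor 1/7 is dropped from both equations):

2x + 3y + 6z = 0, \qquad 6x + 2y - 3z = 0

Adding twice the second equation to the first eliminates z:

14x + 7y = 0 \;\Rightarrow\; y = -2x

Substituting back into the first equation:

2x - 6x + 6z = 0 \;\Rightarrow\; z = \frac{2x}{3}

So x : y : z = 3 : -6 : 2, and in particular y = -3z. The overall sign is not fixed by these constraints.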

I viewed the Khan Academy course.

But the credit for the MATLAB code goes to Vladd. I didn’t follow his explanation, but the Khan Academy course helped. I used the MATLAB online compiler to test Vladd’s code and ported it to R.

I can’t believe the for loops and if statements in the R code took a full day to debug.


# Classical Gram-Schmidt orthonormalization of the columns of A.
#
# Author: radhakrishnan
###############################################################################


# Columns 1 and 2 are the given vectors; column 3 is an arbitrary vector
# that is linearly independent of them.
A = matrix(c(2/7,3/7,6/7,6/7,2/7,-3/7,1,2,3),ncol=3,nrow=3)
print(A)

Q = matrix(0,ncol=ncol(A),nrow=nrow(A))
for (j in 1:ncol(A)){

	u = A[,j]

	# 1:(j - 1) would count down as 1:0 when j is 1, so guard the inner loop.
	if( j > 1 ){
		for(i in 1:(j - 1)){
			e = Q[,i]
			# Subtract the projection of A[,j] onto the earlier column e.
			# sum() gives plain scalars, which a 1x1 matrix product would not.
			p = sum(e * A[,j]) / sum(e * e) * e
			u = u - p
		}
	}
	# normalize it to length of 1 and store it
	Q[,j] = u / sqrt(sum(u^2))
	print(Q)
}

The result is this. The last column is what I want, and it satisfies all the constraints.

Orthonormality

\begin{pmatrix}  0.2857143& 0.8571429& -0.4285714\\  0.4285714& 0.2857143& 0.8571429\\  0.8571429& -0.4285714& -0.2857143\\  \end{pmatrix}
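
One way to double-check a result like this is to confirm that the transpose of the matrix times the matrix is the identity. A minimal sketch, assuming the two given columns from the problem and the third column [-3/7, 6/7, -2/7] found above:

```r
# Columns: the two given vectors and the third column found above.
Q <- cbind(c(2, 3, 6), c(6, 2, -3), c(-3, 6, -2)) / 7

# For an orthonormal basis, t(Q) %*% Q is the identity matrix.
gram <- crossprod(Q)
print(round(gram, 10))

# all.equal compares with a small numerical tolerance.
print(isTRUE(all.equal(gram, diag(3))))
```

crossprod(Q) is base R shorthand for t(Q) %*% Q.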

I will check the R code in to my Git repository.

Market-basket problem

This is the general market-basket problem. The task is to find the itemsets that occur frequently across many shoppers’ baskets. An itemset is considered frequent if it appears in at least a threshold number of baskets.

These itemsets can be singletons, pairs of items (doubletons), tripletons, and so on.

Imagine there are 100 baskets, numbered 1,2,...,100, and 100 items, similarly numbered. Item i is in basket j if and only if i divides j evenly. For example, basket 24 is the set of items {1,2,3,4,6,8,12,24}. Describe all the association rules that have 100% confidence. Which of the following rules has 100% confidence?

Here is a brute-force R approach to this problem, which works because the number of items is small. In practice, such data-mining algorithms deal with large quantities of data and a fixed amount of memory. One such algorithm is the A-priori algorithm.

Each if statement checks a rule like this.

 {8,10} -> 20

This checks if item 20 is always found in a basket that has items 8 and 10 or not.

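library(Hmisc)  # provides %nin%, the negation of %in%
for( i in 1:100){
  # Items in basket i: every j that divides i evenly.
  a <- c()
  for( j in 1:100){
	if( i %% j == 0 ){
		a <- append(a,j)
	}
  }
  #print(paste( i, a ))
  # Each test below prints a basket only when it is a counterexample to the rule.
  if( 8 %in% a &&  10 %in% a && 20 %nin% a ){ # {8,10} -> 20
	#print (a)
  }
  if( 3 %in% a &&  1 %in% a && 6 %in% a && 12 %nin% a ){ # {1,3,6} -> 12
	print (a)
  }
  if( 8 %in% a &&  12 %in% a &&  96 %nin% a ){ # {8,12} -> 96
	#print (a)
  }
  if( 3 %in% a &&  5 %in% a &&  1 %nin% a ){ # {3,5} -> 1
	#print (a)
  }
}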
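
Instead of hunting for counterexamples, the confidence of a rule can also be computed directly as the fraction of baskets containing the left-hand side that also contain the right-hand side. The following is a sketch of that idea in base R; baskets and rule_confidence are names I made up for illustration.

```r
# Basket b holds every item i that divides b evenly.
baskets <- lapply(1:100, function(b) which(b %% 1:100 == 0))

# Confidence of the rule lhs -> rhs: among baskets containing every item in
# lhs, the fraction that also contain rhs. (Hypothetical helper; it returns
# NaN if no basket contains lhs.)
rule_confidence <- function(lhs, rhs) {
  has_lhs <- sapply(baskets, function(items) all(lhs %in% items))
  has_rhs <- sapply(baskets, function(items) rhs %in% items)
  sum(has_lhs & has_rhs) / sum(has_lhs)
}

print(rule_confidence(c(8, 10), 20))
print(rule_confidence(c(1, 3, 6), 12))
print(rule_confidence(c(8, 12), 96))
print(rule_confidence(c(3, 5), 1))
```

A rule has 100% confidence exactly when this value is 1. For instance, any basket containing both 8 and 10 has a number divisible by 40, so it always contains 20.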

PageRank

[Figure: screenshot of the link graph for pages A, B, and C]

Suppose we compute PageRank with a β of 0.7, and we introduce the additional constraint that the sum of the PageRanks of the three pages must be 3, to handle the problem that otherwise any multiple of a solution will also be a solution. Compute the PageRanks a, b, and c of the three pages A, B, and C, respectively.

r_j = \sum_{i\rightarrow{j}} \beta \frac{r_i}{d_i} + (1 - \beta ) \frac{1}{n}

My R code is this.

M = matrix(c(0,1/2,1/2,0,0,1,0,0,1),ncol=3)
e = matrix(c(1,1,1),ncol=1)
v1 = matrix(c(1,1,1),ncol=1)
v1 = v1 / 3
for( i in 1:5){
  v1 =  ((0.7 * M ) %*% v1 ) + (((1 - 0.7 ) * e ) /3 )
}
v1 = v1 * 3
print(v1)

      [,1]
[1,] 0.300
[2,] 0.405
[3,] 2.295
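
As a cross-check, the same numbers can be reached by hand. This sketch assumes the link structure that the columns of M encode (A links to B and C, B links to C, and C links to itself). With the PageRanks summing to 3, the teleport share of each page is (1 - β) · 3/3 = 0.3:

a = 0.3

b = 0.7 \cdot \frac{a}{2} + 0.3 = 0.405

c = 0.7 \left( \frac{a}{2} + b + c \right) + 0.3 \;\Rightarrow\; 0.3\,c = 0.6885 \;\Rightarrow\; c = 2.295

The three values add up to 3, as the constraint requires.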

Apache Mahout

I followed this tutorial. Mahout seems to be an easy way to test Machine Learning algorithms using the Java API.

But I would use this R code instead of the one shown in the tutorial to convert the MovieLens dataset to CSV format.

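r<-file("u.data","r")
w<-file("u1.csv","w")

# Replace each run of whitespace (the tabs in u.data) with a comma.
while( length(data <- readLines(r)) > 0 ){
	writeLines(gsub("\\s+",",",data),w)
}

# Close both connections so the output is flushed to disk.
close(r)
close(w)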
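
Another possible route is to let read.table and write.table do the parsing and writing. The sketch below is self-contained, so it builds a tiny stand-in file rather than touching the real u.data. The three rows are made-up values, and the tab-separated layout (user, item, rating, timestamp) is my assumption about the file format.

```r
# Build a tiny stand-in for u.data (assumed tab-separated: user, item, rating,
# timestamp). These rows are made-up sample values, not real MovieLens records.
input <- tempfile(fileext = ".data")
output <- tempfile(fileext = ".csv")
writeLines(c("1\t10\t3\t100", "2\t20\t5\t200", "3\t30\t1\t300"), input)

# Read the tab-separated file and write it back out comma-separated,
# without a header row, row names or quotes.
ratings <- read.table(input, sep = "\t", header = FALSE)
write.table(ratings, output, sep = ",", row.names = FALSE,
            col.names = FALSE, quote = FALSE)

print(readLines(output))
```

For the real dataset, input and output would be replaced by "u.data" and "u1.csv".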