November | 2014 | MindSpace

Cosine distance

November 24, 2014 Leave a comment

I came across this calculation when I was reading about Recommender systems. The last column is the rating given by a particular user for a movie. The other columns of this matrix denote whether a particular actor appeared in the movie or not.

$\begin{pmatrix} 1& 0& 1& 0& 1& 2\\ 1& 1& 0 & 0& 1& 6\\ 0& 1& 0 & 1 & 0 & 2\\ \end{pmatrix}$

The first five attributes are Boolean, and the last is an integer "rating." Assume that the scale factor for the rating is α. Compute, as a function of α, the cosine distances between each pair of profiles. For each of α = 0, 0.5, 1, and 2, determine the cosine of the angle between each pair of vectors.

My R code to calculate is this.

# TODO: Add comment
# 
# Author: radhakrishnan
###############################################################################


A = matrix(c(1,0,1,0,1,2,
			 1,1,0,0,1,6,
			 0,1,0,1,0,2),nrow=3,ncol=6,byrow=TRUE)


rownames(A) <- c("A","B","C")

scale1 <- A
scale1[,6] <- A[,6] * 0
print( paste( "A and B is ", ( sum(scale1[1,] * scale1[2,]) )/( sqrt( sum(scale1[1,]^2) ) * sqrt( sum(scale1[2,]^2) ) )) )
print( paste( "B and C is ", ( sum(scale1[2,] * scale1[3,]) )/( sqrt( sum(scale1[2,]^2) ) * sqrt( sum(scale1[3,]^2) ) )) )
print( paste( "A and C is ", ( sum(scale1[1,] * scale1[3,]) )/( sqrt( sum(scale1[1,]^2) ) * sqrt( sum(scale1[3,]^2) ) )) )



scale2 <- A
scale2[,6] <- A[,6] * 0.5
print( paste( "A and B is ", ( sum(scale2[1,] * scale2[2,]) )/( sqrt( sum(scale2[1,]^2) ) * sqrt( sum(scale2[2,]^2) ) )) )
print( paste( "B and C is ", ( sum(scale2[2,] * scale2[3,]) )/( sqrt( sum(scale2[2,]^2) ) * sqrt( sum(scale2[3,]^2) ) )) )
print( paste( "A and C is ", ( sum(scale2[1,] * scale2[3,]) )/( sqrt( sum(scale2[1,]^2) ) * sqrt( sum(scale2[3,]^2) ) )) )

scale3 <- A
scale3[,6] <- A[,6] * 1
print( paste( "A and B is ", ( sum(scale3[1,] * scale3[2,]) )/( sqrt( sum(scale3[1,]^2) ) * sqrt( sum(scale3[2,]^2) ) )) )
print( paste( "B and C is ", ( sum(scale3[2,] * scale3[3,]) )/( sqrt( sum(scale3[2,]^2) ) * sqrt( sum(scale3[3,]^2) ) )) )
print( paste( "A and C is ", ( sum(scale3[1,] * scale1[3,]) )/( sqrt( sum(scale3[1,]^2) ) * sqrt( sum(scale3[3,]^2) ) )) )

scale4 <- A
scale4[,6] <- A[,6] * 2
print( paste( "A and B is ", ( sum(scale4[1,] * scale4[2,]) )/( sqrt( sum(scale4[1,]^2) ) * sqrt( sum(scale4[2,]^2) ) )) )
print( paste( "B and C is ", ( sum(scale4[2,] * scale4[3,]) )/( sqrt( sum(scale4[2,]^2) ) * sqrt( sum(scale4[3,]^2) ) )) )
print( paste( "A and C is ", ( sum(scale4[1,] * scale4[3,]) )/( sqrt( sum(scale4[1,]^2) ) * sqrt( sum(scale4[3,]^2) ) )) )

> source(“/Users/radhakrishnan/Documents/eclipse/workspace/MMDS/cosinedistance.R”, echo=FALSE, encoding=”UTF-8″)
[1] “A and B is 0.666666666666667”
[1] “B and C is 0.408248290463863”
[1] “A and C is 0”
[1] “A and B is 0.721687836487032”
[1] “B and C is 0.666666666666667”
[1] “A and C is 0.288675134594813”
[1] “A and B is 0.847318545736323”
[1] “B and C is 0.849836585598797”
[1] “A and C is 0”
[1] “A and B is 0.946094540760746”
[1] “B and C is 0.95257934441568”
[1] “A and C is 0.8651809126974”

Filed under Data Mining Algorithms, R

Gram-schmidt

November 4, 2014 Leave a comment

Matrix M has three rows and three columns, and the columns form an orthonormal basis. One of the columns is [2/7,3/7,6/7], and another is [6/7, 2/7, -3/7]. Let the third column be [x,y,z]. Since the length of the vector [x,y,z] must be 1, there is a constraint that x2+y2+z2 = 1. However, there are other constraints, and these other constraints can be used to deduce facts about the ratios among x, y, and z. Compute these ratios, and then identify one of them in the list below.

I viewed the Khan academy course.

But the credit for the Matlab code goes to Vladd. I didn’t follow his explanation but the Khan academy course helped.I used the Matlab online compiler to test Vladd’s code and ported it to R.

I can’t believe the for and if loops in the R code took a full day to debug.


# TODO: Add comment
# 
# Author: radhakrishnan
###############################################################################


        A =  matrix(c(2/7,3/7,6/7,6/7,2/7,-3/7,1,2,3),ncol=3,nrow=3)
		r = dim( A)[[1]];
		c = dim( A)[[2]];
		print(A)
		
		Q = matrix(c(0,0,0,0,0,0,0,0,0),ncol=3,nrow=3)
		for (j in 1:3){

			u = matrix(A[ ,j  ]);
	
			if( j - 1 != 0 ){
				for(i in 1:(j - 1)){
				    e = Q[,i]
					a = as.matrix(A[,j])
					p = (t(e) %*% a) / (t(e) %*% e) * e;
					u = u - p
		        }
			}
           # normalize it to length of 1 and store it
			Q[,j] = u / sqrt(u[1,1]^2 + u[2,1]^2 + u[3,1]^2);
			print(Q)
		}

The result is this. The last column is what I want and that satisfies all constraints.

Orthonormality

$\begin{pmatrix} 0.2857143& 0.8571429& -0.4285714\\ 0.4285714& -0.2857143& 0.8571429\\ 0.8571429& -0.4285714& -0.2857143\\ \end{pmatrix}$

I will check-in the R code into my Git repository.

Filed under Data Mining Algorithms, R

MindSpace

Cosine distance

Gram-schmidt

Orthonormality

Blogroll