MindSpace

Prevent Vagrant from resuming a download

December 6, 2014

Have you come across this nagging error when Vagrant attempts to resume a failed box download?

==> default: Box 'udacity/ud381' could not be found. Attempting to find and install...
    default: Box Provider: virtualbox
    default: Box Version: >= 0
==> default: Loading metadata for box 'udacity/ud381'
    default: URL: https://vagrantcloud.com/udacity/ud381
==> default: Adding box 'udacity/ud381' (v0.0.5) for provider: virtualbox
    default: Downloading: https://vagrantcloud.com/udacity/boxes/ud381/versions/0.0.5/providers/virtualbox.box
==> default: Box download is resuming from prior download progress
An error occurred while downloading the remote file. The error
message, if any, is reproduced below. Please fix this error and try
again.

HTTP server doesn't seem to support byte ranges. Cannot resume.

The fix is to delete the partially downloaded box file from Vagrant's temporary directory, so that the next attempt starts the download from scratch.

Mohans-MacBook-Pro:ud381 radhakrishnan$ rm ~/.vagrant.d/tmp/*

Filed under Virtualization

Deployment on Heroku

December 5, 2014

I recently pushed my AngularJS/Spring Boot/REST application to Heroku. This is the build.gradle I used.

buildscript {
    repositories {
        maven { url "http://repo.spring.io/libs-release" }
        mavenLocal()
        mavenCentral()
    }
    dependencies {
        classpath("org.springframework.boot:spring-boot-gradle-plugin:1.1.8.RELEASE")
    }
}

apply plugin: 'java'
apply plugin: 'eclipse'
apply plugin: 'idea'
apply plugin: 'spring-boot'
mainClassName = "rest.controller.Application"

jar {
    baseName = 'Angular-Boot-Rest'
    version =  '0.1.0'
}

repositories {
    mavenLocal()
    mavenCentral()
    maven { url "http://repo.spring.io/libs-release" }
}



tasks.withType(Copy) {
        eachFile { println it.file }
}

dependencies {
    compile("org.springframework.boot:spring-boot-starter-web")
    testCompile("junit:junit")
}

task wrapper(type: Wrapper) {
    gradleVersion = '1.11'
}
task stage(dependsOn: ["build"]){}

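I added a new task, stage, and mainClassName. Heroku runs the stage task to build the application.

Heroku allots a free port, passed in the PORT environment variable, to which Tomcat must bind. If the application specifies its own port instead, it does not bind to the allotted one within 60 seconds, which is the time limit allowed.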

Heroku also needs this file, which tells it how to start the web process.

Procfile

web: java $JAVA_OPTS -Dserver.port=$PORT -jar build/libs/Angular-Boot-Rest-0.1.0.jar

This is the screenshot. Note the URL which is allotted too.

Filed under Cloud, Java

Cosine distance

I came across this calculation when I was reading about recommender systems. Each row of the matrix is the profile of one movie. The last column is the rating given by a particular user for that movie. The other columns denote whether a particular actor appeared in the movie or not.

$\begin{pmatrix} 1& 0& 1& 0& 1& 2\\ 1& 1& 0 & 0& 1& 6\\ 0& 1& 0 & 1 & 0 & 2\\ \end{pmatrix}$

The first five attributes are Boolean, and the last is an integer "rating." Assume that the scale factor for the rating is α. Compute, as a function of α, the cosine distances between each pair of profiles. For each of α = 0, 0.5, 1, and 2, determine the cosine of the angle between each pair of vectors.

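My R code for the calculation is this.

# Cosine of the angle between each pair of profiles (rows of A)
# after scaling the rating column by alpha = 0, 0.5, 1 and 2.
#
# Author: radhakrishnan
###############################################################################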


A = matrix(c(1,0,1,0,1,2,
			 1,1,0,0,1,6,
			 0,1,0,1,0,2),nrow=3,ncol=6,byrow=TRUE)


rownames(A) <- c("A","B","C")

scale1 <- A
scale1[,6] <- A[,6] * 0
print( paste( "A and B is ", ( sum(scale1[1,] * scale1[2,]) )/( sqrt( sum(scale1[1,]^2) ) * sqrt( sum(scale1[2,]^2) ) )) )
print( paste( "B and C is ", ( sum(scale1[2,] * scale1[3,]) )/( sqrt( sum(scale1[2,]^2) ) * sqrt( sum(scale1[3,]^2) ) )) )
print( paste( "A and C is ", ( sum(scale1[1,] * scale1[3,]) )/( sqrt( sum(scale1[1,]^2) ) * sqrt( sum(scale1[3,]^2) ) )) )



scale2 <- A
scale2[,6] <- A[,6] * 0.5
print( paste( "A and B is ", ( sum(scale2[1,] * scale2[2,]) )/( sqrt( sum(scale2[1,]^2) ) * sqrt( sum(scale2[2,]^2) ) )) )
print( paste( "B and C is ", ( sum(scale2[2,] * scale2[3,]) )/( sqrt( sum(scale2[2,]^2) ) * sqrt( sum(scale2[3,]^2) ) )) )
print( paste( "A and C is ", ( sum(scale2[1,] * scale2[3,]) )/( sqrt( sum(scale2[1,]^2) ) * sqrt( sum(scale2[3,]^2) ) )) )

scale3 <- A
scale3[,6] <- A[,6] * 1
print( paste( "A and B is ", ( sum(scale3[1,] * scale3[2,]) )/( sqrt( sum(scale3[1,]^2) ) * sqrt( sum(scale3[2,]^2) ) )) )
print( paste( "B and C is ", ( sum(scale3[2,] * scale3[3,]) )/( sqrt( sum(scale3[2,]^2) ) * sqrt( sum(scale3[3,]^2) ) )) )
print( paste( "A and C is ", ( sum(scale3[1,] * scale3[3,]) )/( sqrt( sum(scale3[1,]^2) ) * sqrt( sum(scale3[3,]^2) ) )) )

scale4 <- A
scale4[,6] <- A[,6] * 2
print( paste( "A and B is ", ( sum(scale4[1,] * scale4[2,]) )/( sqrt( sum(scale4[1,]^2) ) * sqrt( sum(scale4[2,]^2) ) )) )
print( paste( "B and C is ", ( sum(scale4[2,] * scale4[3,]) )/( sqrt( sum(scale4[2,]^2) ) * sqrt( sum(scale4[3,]^2) ) )) )
print( paste( "A and C is ", ( sum(scale4[1,] * scale4[3,]) )/( sqrt( sum(scale4[1,]^2) ) * sqrt( sum(scale4[3,]^2) ) )) )

> source("/Users/radhakrishnan/Documents/eclipse/workspace/MMDS/cosinedistance.R", echo=FALSE, encoding="UTF-8")
[1] "A and B is 0.666666666666667"
[1] "B and C is 0.408248290463863"
[1] "A and C is 0"
[1] "A and B is 0.721687836487032"
[1] "B and C is 0.666666666666667"
[1] "A and C is 0.288675134594813"
[1] "A and B is 0.847318545736323"
[1] "B and C is 0.849836585598797"
[1] "A and C is 0.617213399848368"
[1] "A and B is 0.946094540760746"
[1] "B and C is 0.95257934441568"
[1] "A and C is 0.8651809126974"
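
The four blocks differ only in the scale factor, so the whole calculation can be cross-checked with one small function. This is a sketch in Python rather than R, so that it can be run independently of the script above. The names profiles, scaled, cosine and all_pairs are made up for this illustration and are not from the exercise.

```python
import math
from itertools import combinations

# Rows of the matrix above: five Boolean actor columns, then the rating.
profiles = {
    "A": [1, 0, 1, 0, 1, 2],
    "B": [1, 1, 0, 0, 1, 6],
    "C": [0, 1, 0, 1, 0, 2],
}


def scaled(row, alpha):
    """Multiply only the last (rating) component by alpha."""
    return row[:-1] + [row[-1] * alpha]


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def all_pairs(alpha):
    """Cosine for every pair of profiles at the given scale factor."""
    rows = {name: scaled(row, alpha) for name, row in profiles.items()}
    return {(p, q): cosine(rows[p], rows[q]) for p, q in combinations(rows, 2)}


if __name__ == "__main__":
    for alpha in (0, 0.5, 1, 2):
        for (p, q), value in all_pairs(alpha).items():
            print(f"alpha={alpha}: {p} and {q} is {value:.6f}")
```

Every pair is computed from the same scaled copy of the matrix, which avoids mixing rows from two different scale factors.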

Filed under Data Mining Algorithms, R

Gram-Schmidt

November 4, 2014

Matrix M has three rows and three columns, and the columns form an orthonormal basis. One of the columns is [2/7,3/7,6/7], and another is [6/7, 2/7, -3/7]. Let the third column be [x,y,z]. Since the length of the vector [x,y,z] must be 1, there is a constraint that $x^2+y^2+z^2 = 1$. However, there are other constraints, and these other constraints can be used to deduce facts about the ratios among x, y, and z. Compute these ratios, and then identify one of them in the list below.

I viewed the Khan Academy course.

But the credit for the Matlab code goes to Vladd. I didn't follow his explanation, but the Khan Academy course helped. I used the Matlab online compiler to test Vladd's code and ported it to R.

I can't believe the for loops and the if condition in the R code took a full day to debug.


# Gram-Schmidt orthonormalization of the columns of A.
#
# Author: radhakrishnan
###############################################################################

# Columns 1 and 2 are the given vectors; column 3 (1,2,3) is an arbitrary
# vector outside their span that gets orthogonalized against them.
A <- matrix(c(2/7, 3/7, 6/7, 6/7, 2/7, -3/7, 1, 2, 3), ncol = 3, nrow = 3)
print(A)

Q <- matrix(0, ncol = 3, nrow = 3)
for (j in 1:3) {
    u <- matrix(A[, j])

    # 1:(j - 1) would count down as 1:0 when j is 1, hence the guard
    if (j - 1 != 0) {
        for (i in 1:(j - 1)) {
            e <- Q[, i]
            a <- as.matrix(A[, j])
            # projection of column j onto the i-th orthonormal vector
            p <- as.numeric((t(e) %*% a) / (t(e) %*% e)) * e
            u <- u - p
        }
    }
    # normalize it to length of 1 and store it
    Q[, j] <- u / sqrt(sum(u^2))
    print(Q)
}

The result is this. The last column is what I want and that satisfies all constraints.

Orthonormality

$\begin{pmatrix} 0.2857143& 0.8571429& -0.4285714\\ 0.4285714& 0.2857143& 0.8571429\\ 0.8571429& -0.4285714& -0.2857143\\ \end{pmatrix}$
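
Because M is 3 by 3 with orthonormal columns, the third column must also be, up to sign, the cross product of the two given columns. That gives a quick way to check the ratios without running Gram-Schmidt. A small Python sketch using exact fractions; the helper names dot and cross are made up for this illustration.

```python
from fractions import Fraction as F

# The two given orthonormal columns.
c1 = [F(2, 7), F(3, 7), F(6, 7)]
c2 = [F(6, 7), F(2, 7), F(-3, 7)]


def dot(u, v):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Cross product of two 3-vectors; the result is orthogonal to both."""
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]


c3 = cross(c1, c2)

if __name__ == "__main__":
    print([str(x) for x in c3])      # components as exact fractions
    print(dot(c3, c1), dot(c3, c2))  # both should be 0
    print(dot(c3, c3))               # squared length, should be 1
```

The components come out in the ratio x : y : z = -3 : 6 : -2, which agrees with the last column of the result above. The vector with every sign flipped is an equally valid answer.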

I will check the R code in to my Git repository.

Filed under Data Mining Algorithms, R

Spectral Clustering

October 27, 2014

   2 ----6
 /  \    |
1    4   |
 \  /  \ |
  3      5

The goal is to find two clusters in this graph using Spectral Clustering on the Laplacian matrix. Compute the Laplacian of this graph. Then compute the second eigenvector of the Laplacian (the one corresponding to the second smallest eigenvalue).

A = matrix(c(0,1,1,0,0,0,
			 1,0,0,1,0,1,
			 1,0,0,1,0,0,
			 0,1,1,0,1,0,
			 0,0,0,1,0,1,
			 0,1,0,0,1,0),nrow=6,ncol=6,byrow=TRUE)
colnames(A) <- c("1","2","3","4","5","6")
rownames(A) <- c("1","2","3","4","5","6")

B = matrix(c(2,0,0,0,0,0,
			 0,3,0,0,0,0,
			 0,0,2,0,0,0,
			 0,0,0,3,0,0,
			 0,0,0,0,2,0,
			 0,0,0,0,0,2),nrow=6,ncol=6,byrow=TRUE)
L = B - A
print(L)
e <- eigen(L)

R's eigen() can be used to get the eigenvalues and eigenvectors, but I used Wolfram, which gives these values.

Eigenvalues

$\lambda_1 = 5\\ \lambda_2 = 3\\ \lambda_3 = 3\\ \lambda_4 = 2\\ \lambda_5 = 1\\ \lambda_6 = 0$

Eigenvectors

$\begin{pmatrix} 1& -2& -1& 2& -1& 1\\ 0& -1& 1& -1& 0& 1\\ 1& -1& 0& -1& 1& 0\\ 1& 1& -1& -1& -1& 1\\ -1& 0& -1& 0& 1& 1\\ 1& 1& 1& 1& 1& 1\\ \end{pmatrix}$

The second smallest eigenvalue is $\lambda_5 = 1$

So the 5th row of the eigenvector matrix is

$\begin{pmatrix} -1& 0& -1& 0& 1& 1\\ \end{pmatrix}$

This means that the 1st and 3rd nodes are part of one cluster and the 5th and 6th nodes are part of the other cluster. The 2nd and 4th nodes, whose components are 0, can be part of either cluster.
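
The eigenvector and the split by sign can be verified by hand without an eigen-solver. Below is a minimal Python sketch: it rebuilds the Laplacian from the edges in the adjacency matrix, checks that L v = 1 · v for the vector quoted above, and groups the nodes by the sign of their component. The function names are made up for this illustration.

```python
# Edges read off the adjacency matrix above (nodes numbered 1..6).
edges = [(1, 2), (1, 3), (2, 4), (2, 6), (3, 4), (4, 5), (5, 6)]
n = 6


def laplacian(edges, n):
    """Build L = D - A as a list of lists."""
    L = [[0] * n for _ in range(n)]
    for a, b in edges:
        i, j = a - 1, b - 1
        L[i][j] -= 1
        L[j][i] -= 1
        L[i][i] += 1
        L[j][j] += 1
    return L


def mat_vec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def clusters(v):
    """Group node numbers by the sign of their eigenvector component."""
    neg = [i + 1 for i, x in enumerate(v) if x < 0]
    pos = [i + 1 for i, x in enumerate(v) if x > 0]
    zero = [i + 1 for i, x in enumerate(v) if x == 0]
    return neg, pos, zero


L = laplacian(edges, n)
v = [-1, 0, -1, 0, 1, 1]  # eigenvector quoted above for eigenvalue 1

if __name__ == "__main__":
    print(mat_vec(L, v) == v)  # True when L v = 1 * v
    print(clusters(v))         # (negative, positive, zero) node groups
```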

Filed under Data Mining Algorithms

DGIM Algorithm

October 25, 2014

I think I understood the basic Datar-Gionis-Indyk-Motwani (DGIM) algorithm, which is explained in the book “Mining of Massive Datasets” by Jure Leskovec (Stanford Univ.), Anand Rajaraman (Milliway Labs) and Jeffrey D. Ullman (Stanford Univ.).

I will add more details later, but the diagram below explains it. I used TikZ to draw this picture. I will check the TikZ code in to my GitHub and post the link.

Update: https://github.com/mohanr/tikz/blob/master/dgim

Suppose we are using the DGIM algorithm of Section 4.6.2 to estimate the number of 1's in suffixes of a sliding window of length 40. The current timestamp is 100. Note: we are showing timestamps as absolute values, rather than modulo the window size, as DGIM would do. Suppose that at times 101 through 105, 1's appear in the stream. Compute the set of buckets that would exist in the system at time 105.
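
The bucket bookkeeping can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation. The buckets that exist at time 100 in the exercise are not reproduced in this post, so the example starts from an empty window. That is an assumption, and the result is therefore not the answer to the exercise. The class and method names are made up for this illustration.

```python
class DGIM:
    """Minimal DGIM bucket bookkeeping (sketch).

    Each bucket is (timestamp of its most recent 1, size). Sizes are powers
    of two and at most two buckets of any one size are kept.
    """

    def __init__(self, window):
        self.window = window
        self.buckets = []  # newest bucket first

    def add(self, timestamp, bit):
        # Drop the oldest bucket once it has slid out of the window.
        if self.buckets and self.buckets[-1][0] <= timestamp - self.window:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.insert(0, (timestamp, 1))
        # Whenever three buckets share a size, merge the two oldest of them.
        size = 1
        while True:
            same = [i for i, b in enumerate(self.buckets) if b[1] == size]
            if len(same) < 3:
                break
            newer, older = same[-2], same[-1]
            # The merged bucket keeps the newer of the two timestamps.
            merged = (self.buckets[newer][0], size * 2)
            del self.buckets[older]
            self.buckets[newer] = merged
            size *= 2


if __name__ == "__main__":
    d = DGIM(window=40)
    for t in range(101, 106):  # 1's arrive at times 101 through 105
        d.add(t, 1)
        print(t, d.buckets)
```

With the real starting buckets from the exercise loaded into d.buckets first, the same five calls to add would apply the merges the exercise asks about.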

Filed under Data Mining Algorithms

Market-basket problem

October 18, 2014

This is the general market-basket problem: find the items that occur frequently across many shoppers' baskets. The threshold is a minimum number of occurrences of a particular item. Items that are bought at least that number of times are considered frequent.

Frequent itemsets can be singletons, pairs of items (doubletons), tripletons and so on.

Imagine there are 100 baskets, numbered 1,2,...,100, and 100 items, similarly numbered. Item i is in basket j if and only if i divides j evenly. For example, basket 24 is the set of items {1,2,3,4,6,8,12,24}. Describe all the association rules that have 100% confidence. Which of the following rules has 100% confidence?

Below is a brute-force R approach to this problem, which is feasible because the number of items is small. In practice, such data mining algorithms deal with large quantities of data and a fixed amount of memory. One such algorithm is the A-priori algorithm.

Each if statement checks a rule like this.

 {8,10} -> 20

This checks whether item 20 is always found in a basket that has items 8 and 10.

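library(Hmisc)   # provides %nin%

for (i in 1:100) {
  # a holds the items in basket i, i.e. the divisors of i
  a <- c()
  for (j in 1:100) {
    if (i %% j == 0) {
      a <- append(a, j)
    }
  }
  #print(paste( i, a ))
  # Each if looks for a counterexample: a basket that has the left-hand
  # items but not the right-hand item. A rule has 100% confidence when
  # its if never fires.
  if (8 %in% a && 10 %in% a && 20 %nin% a) {              # {8,10} -> 20
    #print (a)
  }
  if (3 %in% a && 1 %in% a && 6 %in% a && 12 %nin% a) {   # {1,3,6} -> 12
    print (a)
  }
  if (8 %in% a && 12 %in% a && 96 %nin% a) {              # {8,12} -> 96
    #print (a)
  }
  if (3 %in% a && 5 %in% a && 1 %nin% a) {                # {3,5} -> 1
    #print (a)
  }
}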
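
The same rules can be checked without enumerating divisors: a basket j contains every item of a set exactly when the least common multiple of those items divides j. The Python sketch below uses that observation to compute the confidence of each rule tested above. The function names are made up for this illustration.

```python
from functools import reduce
from math import gcd

N = 100  # baskets and items are both numbered 1..N


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def baskets_containing(items):
    """Baskets holding every item in items: the multiples of their lcm."""
    step = reduce(lcm, items)
    return list(range(step, N + 1, step))


def confidence(lhs, rhs):
    """Fraction of baskets containing all of lhs that also contain rhs."""
    support = baskets_containing(lhs)
    if not support:
        raise ValueError("no basket contains all of the left-hand items")
    hits = [b for b in support if b % rhs == 0]
    return len(hits) / len(support)


# The four rules tested in the R code above.
rules = [((8, 10), 20), ((1, 3, 6), 12), ((8, 12), 96), ((3, 5), 1)]

if __name__ == "__main__":
    for lhs, rhs in rules:
        print(set(lhs), "->", rhs, "confidence", confidence(lhs, rhs))
```

A rule has 100% confidence exactly when the right-hand item divides the least common multiple of the left-hand items.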

Filed under Data Mining Algorithms, R

The Caltech-JPL Summer School on Big Data Analytics

October 13, 2014

This treasure trove of videos teaches many Machine Learning subjects. This is not intended to be a typical Coursera course because there are no deadlines or tests.

There is so much to write about what I learn from these videos, but for now these measures for assessing the costs and benefits of a classification model are intended as a reference.

Filed under Machine Learning

Frequent Itemsets

October 6, 2014

I am reading Chapter 6 on Frequent Itemsets. I hope to understand the A-priori algorithm.

Filed under Machine Learning

PageRank

October 4, 2014

Suppose we compute PageRank with a β of 0.7, and we introduce the additional constraint that the sum of the PageRanks of the three pages must be 3, to handle the problem that otherwise any multiple of a solution will also be a solution. Compute the PageRanks a, b, and c of the three pages A, B, and C, respectively.

$r_j = \sum_{i\rightarrow{j}} \beta \frac{r_i}{d_i} + \frac{1 - \beta}{n}$

My R code is this.

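# Column-stochastic transition matrix: column i holds the out-links of page i
M = matrix(c(0,1/2,1/2,0,0,1,0,0,1),ncol=3)
e = matrix(c(1,1,1),ncol=1)
v1 = matrix(c(1,1,1),ncol=1)
v1 = v1 / 3
# power iteration with beta = 0.7
for( i in 1:5){
  v1 =  ((0.7 * M ) %*% v1 ) + (((1 - 0.7 ) * e ) /3 )
}
# rescale so that the three PageRanks sum to 3
v1 = v1 * 3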

      [,1]
[1,] 0.300
[2,] 0.405
[3,] 2.295
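
As a cross-check, the same iteration can be written per page instead of as a matrix product. This Python sketch assumes the link structure implied by the columns of M (A links to B and C, B links to C, C links to itself), since the graph itself is not shown here. The function name and the iteration count are choices made for this illustration.

```python
BETA = 0.7

# Out-links implied by the columns of M above: A -> B, C; B -> C; C -> C.
links = {"A": ["B", "C"], "B": ["C"], "C": ["C"]}


def pagerank(links, beta=BETA, iterations=50):
    """Power iteration with taxation; returns ranks scaled to sum to n."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the teleport share (1 - beta) / n.
        new = {p: (1.0 - beta) / n for p in pages}
        for src, outs in links.items():
            share = beta * rank[src] / len(outs)
            for dst in outs:
                new[dst] += share
        rank = new
    return {p: r * n for p, r in rank.items()}


if __name__ == "__main__":
    for page, value in pagerank(links).items():
        print(page, round(value, 3))
```

For this small graph the values stop changing after a few rounds, which is why five iterations are enough in the R code.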

Filed under R
