A randomized divide-and-conquer algorithm

There is nothing new here. My Java code to find the k^{th} smallest element in an array(Algorithms book by S. Dasgupta, C. H. Papadimitriou, and U. V. Vazirani). The code is written based on the pseudo-code.

I am using Google guava here.

Ints.toArray( less )

This is the algorithm definition.

selection(S,k) = \begin{cases} selection(S_L,k), & \mbox{if } k\le |S_L|\\ v, & \mbox{if } |S_L| < k \le |S_L| + |S_v|\\ selection(S_R,k-|S_L|-|S_v|, & \mbox{if } k\ge |S_L|+|S_v| \end{cases}

import com.google.common.primitives.Ints;

import java.util.ArrayList;
import java.util.Random;

import static java.util.Arrays.sort;

public class KThLargestSelection {

    private Random r = new Random();

    static int[] a1 = new int[]{ 0,2,7,3,4,9};

        public int findKThLargest(  int[] a, int k ){

                int pivot = getPivot(a);

                ArrayList<Integer> less = new ArrayList<>();
                ArrayList<Integer> equal = new ArrayList<>();
                ArrayList<Integer> great = new ArrayList<>();

                for( int value : a ){
                        if ( value < pivot){
                            less.add( value );
                        } else if ( value == pivot){
                            equal.add( value );
                        }else if ( value > pivot){
                            great.add( value );
                        }
                }
                        if (k <= less.size()) {
                            return findKThLargest (Ints.toArray( less ), k);
                        }else if (k <= less.size() + equal.size()){
                            return  pivot;
                        }else
                            return   findKThLargest (Ints.toArray( great  ), k - less.size() - equal.size());

        }

    private int getPivot( int[] data ){
        int[] highLow = new int[0];

            System.out.println("getPivot [" + data.length + "] " + toStringArray(data));
            highLow = new int[] { data[0],data[data.length - 1] };

        if( data.length >= 3){
            sort(new int[]{data[r.nextInt(2)],
                           data[r.nextInt(2)],
                           data[r.nextInt(2)]});
        }
        return highLow[ 1 ];
    }

    private String toStringArray(int[] data){
        StringBuilder sb = new StringBuilder();
        for( int i = 0 ; i < data.length ; i ++ ){
            sb.append( data[i] + " ");
        }
        return sb.toString();
    }
    
    public static void main( String... s){
        KThLargestSelection l = new KThLargestSelection();
        System.out.println( l.findKThLargest(a1, 1));
    }
}

Hackerrank

I know it is embarassing to post code like this without asymptotic analysis. After taking a strenuous 10 week Machine learning course I am again coding java. This is the Utopia problem which can be solved in one line.

package com.hackerrank;

import java.io.*;
import java.util.ArrayList;


public class Solution {

    private int noOfTestCases;

    private ArrayList<Integer> tests = new ArrayList<>();

    OutputStream outputStream = System.out;
    PrintWriter pw = new PrintWriter(new BufferedWriter(new OutputStreamWriter(outputStream)));

    protected void parseStdin(String[] data){

        int test = Integer.parseInt( data[0] );
        assert( test >= 1 && test <= 10);
        noOfTestCases = test;
            for( int i = 1 ; i < data.length ; i ++ ){
                int d = Integer.parseInt( data[i] );
                assert( d >= 0 && d <= 60);
                tests.add( d );
            }
        assert( tests.size() == noOfTestCases );
    }


    public ArrayList<String>  getInput(BufferedReader br) throws IOException {
        String line = null;
        ArrayList<String> arguments = new ArrayList<>();
        while( (line = br.readLine()) != null ){
            arguments.add(line);
        }
        br.close();
        return arguments;
    }

    public static void main( String... argv ) throws IOException {

        Solution utopia = new Solution();
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        ArrayList<String> arguments = utopia.getInput( br );
        utopia.parseStdin( arguments.toArray(new String[arguments.size()]));
        utopia.test();


    }

    protected void test() {
        assert( 0 != noOfTestCases );
        assert( tests.size() == noOfTestCases );
        for( Integer test : tests)   {
            if( test != 0 ){
               if(test % 2  == 0){
                   growEvenly(test);
               }else{
                   growOddly(test);

               }
            }else{
                pw.println(1);
            }
        }
        pw.close();
    }

    private void growOddly(int test) {
        int odd = 1;
        if(test == 1 ){
            pw.println(test + 1);
        }else{
            for(int j = 0,k = test - 1 ; j < k; j = j + 2 ){
                odd = odd + odd + 1;
            }
            pw.println(odd * 2); pw.flush();
        }

    }

    private void growEvenly(int test) {
        int even = 1;
        for(int j = 0,k = test / 2 ; j < k ; j ++){
            even = even * 2 + 1;

        }
        pw.println(even);  pw.flush();
    }
}

Easymock

I have to expect 4 calls to readLine to accomodate null.

package com.hackerrank;

import org.junit.Assert;
import org.junit.*;

import java.io.BufferedReader;
import java.io.IOException;

import static org.easymock.EasyMock.createMock;
import static org.easymock.EasyMock.expect;
import static org.easymock.EasyMock.replay;

public class UtopiaTest {

    String[] argv;

    Solution solution;

    @Before
    public void sendInput() throws IOException {
        BufferedReader mockInputStream = createMock(BufferedReader.class);
        expect(mockInputStream.readLine()).andReturn(String.valueOf(2));
        expect(mockInputStream.readLine()).andReturn(String.valueOf(0));
        expect(mockInputStream.readLine()).andReturn(String.valueOf(1));
        expect(mockInputStream.readLine()).andReturn(null);
        replay(mockInputStream);
        solution = new Solution();
        argv = solution.getInput(mockInputStream).toArray(new String[3]);
        Assert.assertTrue(argv.length == 3);
    }

    @Test
    public void parse(){
        solution = new Solution();
        solution.parseStdin( argv );
    }

    @Test
    public void test(){
        solution = new Solution();
        solution.parseStdin( argv );
        solution.test();
    }
}

Octave code example

This is one of the simpler code tasks I worked on. I use ‘R’ but since this is Octave my first attempt wasn’t generic. The for loop assumes that there are 2 columns in the matrix shown below. There is a way to modify it to be more generic but this an example to show my Octave skills !!

I have a matrix with 2 columns and a vector which has repeating sets of the numbers 3,2 and 1. The task is to calculate the mean of all values in each column of the matrix rows that are grouped by the numbers 3, 2 and 1. So I decided to add the vector as the 3rd column of the matrix.

Matrix

\begin{pmatrix}      0.5403023 & -0.9576595 \\    -0.4161468 & -0.2751633 \\    -0.9899925 &  0.6603167 \\    -0.6536436 &  0.9887046 \\     0.2836622 &  0.4080821 \\     0.9601703 & -0.5477293 \\     0.7539023 & -0.9999608 \\    -0.1455000 & -0.5328330 \\    -0.9111303 &  0.4241790 \\    -0.8390715 &  0.9912028 \\     0.0044257 &  0.6469193 \\     0.8438540 & -0.2921388 \\     0.9074468 & -0.9626059 \\     0.1367372 & -0.7480575 \\    -0.7596879 &  0.1542514  \end{pmatrix}

Vector

\begin{pmatrix}      2 \\     3 \\     1 \\     2 \\     3 \\     1 \\     2 \\     3 \\     1 \\     2 \\     3 \\     1 \\     2 \\     3 \\     1  \end{pmatrix}

Add the vector as the 3rd column.

centroids = zeros(K, n);%Declare matrix to store the result
X(:,end + 1 ) = idx;

\begin{pmatrix}        0.5403023 & -0.9576595 &  2.0000000 \\    -0.4161468 & -0.2751633 &  3.0000000 \\    -0.9899925 &  0.6603167 &  1.0000000 \\    -0.6536436 &  0.9887046 &  2.0000000 \\     0.2836622 &  0.4080821 &  3.0000000 \\     0.9601703 & -0.5477293 &  1.0000000 \\     0.7539023 & -0.9999608 &  2.0000000 \\    -0.1455000 & -0.5328330 &  3.0000000 \\    -0.9111303 &  0.4241790 &  1.0000000 \\    -0.8390715 &  0.9912028 &  2.0000000 \\     0.0044257 &  0.6469193 &  3.0000000 \\     0.8438540 & -0.2921388 &  1.0000000 \\     0.9074468 & -0.9626059 &  2.0000000 \\     0.1367372 & -0.7480575 &  3.0000000 \\    -0.7596879 &  0.1542514 &  1.0000000 \\  \end{pmatrix}

Find the unique values of the 3rd column. The decimals do not matter here.

uidx = unique(X(:,3),"columns");

Vector with unique elements

\begin{pmatrix}   1 \\  2 \\  3  \end{pmatrix}

I use find to get all the row numbers(j) of the values in the matrix where the 3rd column matches each of the unique values(uidx)

for i = 1:size(uidx,1)
 [j,k]=find(X(:,3) == uidx(i) );
 s = X(j,1);
 t = X(j,2);
 centroids(i,:) = [mean(s);mean(t)]
end

Mean of each column for all the grouped rows

\begin{pmatrix}    -0.171357 &  0.079776 \\     0.141787 & -0.188064 \\    -0.027364 & -0.100211    \end{pmatrix}

Hadoop

Screen Shot 2014-04-27 at 12.04.31 AMRecruiters are spellbound by this incantation – Hadoop. So I took the plunge and started working with it. I haven’t made much progress but now I use this Sandbox VM and follow the tutorial steps.

Screen Shot 2014-04-27 at 1.24.09 AM

Machine learning on Big data

I just viewed the webinar conducted by SkyTree. This particular slide has some information about the evolution of Machine Learning.

Screen Shot 2014-04-15 at 9.14.24 PM

Gradient checking quiz

Let J(\theta) =  \theta^3. Furthermore, let \theta = 1 and \epsilon=0.01. You use the formula

(J(\theta+\epsilon )-   J(\theta-\epsilon))/2\epsilon

to approximate the derivative. What value do you get using this approximation ?(When \theta=1,the true, exact derivative is
d/d\theta J(\theta) = 3 ).

The Octave code that I used to solve this is

((1+0.01)^3-(1-0.01)^3)/(2*0.10)

3.0001

Logistic Regression

I just finished the code for a Logistic Regression classification model. I worked on a Card transaction processing system and there was a requirement for identifying whether a card transaction is fradulent or not. We did not use any classification model but if we had had a training set of historical data we could have used this learning algorithm.
Most of the time it is the lack of skills that affects software projects that I have been involved with.

Cost function for Logistic Regression

J(\theta) = 1/m \sum_{i=1}^{m}[-y^{(i)} log( h_\theta(x^{(i)}) - (1-y^{(i)})log( 1 - h_\theta(x^{(i)}))] ;

Gradient of the cost

(\partial J(\theta)/\partial \theta_j ) = 1/m \sum_{i=1}^{m} ( h_\theta(x^{(i)}) - y^{(i)}) x^{(i)}_j

This is the plot generated by the course material based on my code. The line is a decision boundary that shows which exam scores ensure admittance. We can use historical card transaction data in place of this.

Screen Shot 2014-04-06 at 9.12.09 PM

This classificatio model can predict which card transactions could be fradulent based on a probability calculated by the algorithm. So here instead of plotting exam scores on the axes we can plot card transaction data.

Varargs

Is this confusing code ? I was stumped for a few seconds.

public class PassThreeValues {
     static void calculateSomeThing(int one, int... values){}

     public static void main( String... argv ){                 
         calculateSomeThing(1,2,3);
    }

Machine Learning

Machine LearningThe most difficult part for me was the code required to compute the partial derivatives.

Eventually using Octave I was able to vectorize the code and compress into a line.

Several years of Java programming experience didn’t prepare me for this type of code dealing with matrix transpositions and calculations.

Cost function I was asked to minimize is

J(\theta_0,\theta_1) = (\partial/\partial \theta_0 )1/m \sum_{i=1}^{m} ( h_\theta(x^{(i)}) - y^{(i)}) x^{(i)}

Plots showing the result of computing the gradient descent

3D Plot

3D

Contour Plot

Contour

I also learnt single and multi-variate Linear Regression which will help me with my Capacity Planning calculations.

Selenium WebDriver and Play

big-logo
I have advocated tests using Selenium WebDriver in the past many times. Will it become a de facto standard now that frameworks like Play make it very easy ?

It is not even evident below that Selenium WebDriver is being used. It is HTMLUNIT that is imported statically.

A test server is started, a test application is setup and a browser is simulated, all in a few lines of code. Very economical.


import static org.fest.assertions.Assertions.assertThat;
import static play.test.Helpers.*;

import org.junit.Test;

import controllers.routes;
import play.Logger;
import play.libs.F.Callback;
import play.test.TestBrowser;
import static helpers.TestSetup.*;

public class ProposalSubmissionTest {

	@Test
	public void proposalTest(){
		running(testServer(3333,fakeApplication(testDbSettings())),HTMLUNIT, new Callback<TestBrowser>(){

			@Override
			public void invoke(TestBrowser browser) throws Throwable {
				browser.goTo("http://localhost:3333" + routes.MainController.newProposal().url());
				assertThat(browser.title()).isEqualTo("New Proposal");
				Logger.debug(browser.title());
			}
			
		});
	}
}