NIO.2

This post is not exactly about NIO.2, even though I use SeekableByteChannel. It is about a pestering question that two interviewers have asked me in the past. I believe a coding task is a good and reliable first step in any interview process, but some people ask questions in a patronising way.

This is one such question: how do you identify duplicate rows in a file?
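The usual textbook answer relies on `Set.add` returning `false` when the element is already present. Below is a minimal sketch, assuming the whole file fits in memory and that a "duplicate row" means an identical line. The class `DuplicateRows` and the method `findDuplicates` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DuplicateRows {

    // Returns every row that occurs more than once, in the order of its first repeat
    static Set<String> findDuplicates(List<String> rows) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String row : rows) {
            // Set.add returns false if the row was already seen
            if (!seen.add(row)) {
                duplicates.add(row);
            }
        }
        return duplicates;
    }

    public static void main(String... args) throws IOException {
        // Hypothetical usage: the file path is passed as the first argument
        Path file = Paths.get(args[0]);
        System.out.println(findDuplicates(Files.readAllLines(file)));
    }
}
```

Both sets hold at most one copy of each distinct row, so memory grows with the number of distinct lines, not the file size.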

So I decided to do what any programmer worth her salt would do and wrote some code. It is not the most efficient way, but I wanted to try NIO.2’s SeekableByteChannel, because none of the firms I have worked for has needed the newer Java APIs. They still wallow in legacy Java applications, so I don’t get a chance to use them on any project.

Sample file to parse

I wanted to filter out duplicates in column 3 (label, the name of the page requested) of a sample log.


timeStamp,elapsed,label,responseCode,responseMessage,threadName,dataType,success,Latency
1346999466187,32,Home page - anon,200,OK,Anonymous Browsing 1-2,text,true,31
1346999466182,37,Login form,200,OK,Node save 3-1,text,true,36
1346999466184,35,Home page - anon,200,OK,Anonymous Browsing 1-11,text,true,32
1346999466182,37,Home page - anon,200,OK,Anonymous Browsing 1-1,text,true,34
1346999466189,30,Home page - anon,200,OK,Anonymous Browsing 1-4,text,true,27
1346999466185,46,Home page - anon,200,OK,Anonymous Browsing 1-5,text,true,34
1346999466185,44,Search,200,OK,Search 4-1,text,true,35
1346999466188,28,Home page - anon,200,OK,Anonymous Browsing 1-3,text,true,26
1346999466182,33,Home page - anon,200,OK,Anonymous Browsing 1-7,text,true,32
1346999466182,36,Login Form,200,OK,Perform Login/View Account 5-1,text,true,35
1346999466182,35,Home page - anon,200,OK,Anonymous Browsing 1-10,text,true,33
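To report which labels are duplicated, rather than only the distinct set, the column-3 values can be counted. Below is a minimal sketch using streams, assuming a plain `split(",")` is enough (no quoted commas inside fields, as in the sample above). `LabelCounts`, `countLabels` and `duplicateLabels` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class LabelCounts {

    // Counts how often each value of column 3 (index 2, "label") occurs, sorted by label
    static Map<String, Long> countLabels(List<String> dataRows) {
        return dataRows.stream()
                .map(row -> row.split(",")[2])
                .collect(Collectors.groupingBy(label -> label, TreeMap::new, Collectors.counting()));
    }

    // Keeps only the labels that occur more than once
    static Map<String, Long> duplicateLabels(List<String> dataRows) {
        Map<String, Long> duplicates = new TreeMap<>();
        countLabels(dataRows).forEach((label, count) -> {
            if (count > 1) {
                duplicates.put(label, count);
            }
        });
        return duplicates;
    }

    public static void main(String... args) {
        // A few data rows from the sample log (header excluded)
        List<String> rows = Arrays.asList(
                "1346999466187,32,Home page - anon,200,OK,Anonymous Browsing 1-2,text,true,31",
                "1346999466182,37,Login form,200,OK,Node save 3-1,text,true,36",
                "1346999466184,35,Home page - anon,200,OK,Anonymous Browsing 1-11,text,true,32",
                "1346999466185,44,Search,200,OK,Search 4-1,text,true,35",
                "1346999466182,36,Login Form,200,OK,Perform Login/View Account 5-1,text,true,35");
        System.out.println(countLabels(rows));     // {Home page - anon=2, Login Form=1, Login form=1, Search=1}
        System.out.println(duplicateLabels(rows)); // {Home page - anon=2}
    }
}
```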

Sample Java code that is not efficient


package com.test;

import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Set;
import java.util.TreeSet;

public class DuplicateTest {

    // Sorted set of column-3 values; adding a value that is already present has no effect
    private final Set<String> values = new TreeSet<>();

    public static void main(String... argv) {

        DuplicateTest dt = new DuplicateTest();

        dt.analyze();
    }

    private void analyze() {

        Path p = Paths.get(File.separator + "Users"
                + File.separator + "radhakrishnan"
                + File.separator + "Lambdas", "duplicate.txt");

        // Read one byte at a time (simple, deliberately not efficient)
        ByteBuffer b = ByteBuffer.allocate(1);

        // Look the charset up once instead of for every byte. Decoding a single
        // byte only yields the right character for single-byte (e.g. ASCII) text
        Charset charset = Charset.forName(System.getProperty("file.encoding"));

        StringBuilder sb = new StringBuilder();

        try (SeekableByteChannel skbc = Files.newByteChannel(p, StandardOpenOption.READ)) {
            // Skip the header: 88 characters plus '\n' is 89 bytes.
            // With "\r\n" endings this lands on the '\n', which the loop ignores
            skbc.position(89);

            while (skbc.read(b) > 0) {

                b.flip();
                char c = charset.decode(b).get();
                b.clear();

                if (c == '\n' || c == '\r') {
                    // End of a line. With "\r\n" endings the second terminator
                    // finds sb empty, so no empty line is parsed
                    if (sb.length() > 0) {
                        extractAndStore(sb.toString());
                        sb.setLength(0);
                    }
                } else {
                    sb.append(c);
                }
            }

            // The last line may not end with a newline
            if (sb.length() > 0) {
                extractAndStore(sb.toString());
            }

        } catch (IOException e) {

            e.printStackTrace();
        }

        System.out.println(values);
    }

    private void extractAndStore(String s) {
        String[] columns = s.split(",");
        // Column 3 (index 2) is the label; skip lines that are too short
        if (columns.length > 2) {
            values.add(columns[2]);
        }
    }
}
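A sketch of a likely faster variant, assuming UTF-8 text (it has not been benchmarked). It still opens a SeekableByteChannel, but wraps it in `Channels.newReader` and a `BufferedReader`, so bytes are decoded in buffered chunks and lines are split by `readLine()`. This also removes the hard-coded header offset. `BufferedDuplicateTest` and `distinctLabels` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Set;
import java.util.TreeSet;

public class BufferedDuplicateTest {

    // Collects the distinct values of column 3 (index 2) after skipping the header line
    static Set<String> distinctLabels(Path file, String charsetName) throws IOException {
        Set<String> values = new TreeSet<>();
        try (SeekableByteChannel channel = Files.newByteChannel(file, StandardOpenOption.READ);
             // Channels.newReader decodes buffered chunks, so multi-byte characters survive
             BufferedReader reader = new BufferedReader(Channels.newReader(channel, charsetName))) {
            reader.readLine(); // skip the header without a hard-coded byte offset
            String line;
            while ((line = reader.readLine()) != null) {
                String[] columns = line.split(",");
                if (columns.length > 2) {
                    values.add(columns[2]);
                }
            }
        }
        return values;
    }

    public static void main(String... argv) throws IOException {
        // Same file as the example above
        Path p = Paths.get("/Users/radhakrishnan/Lambdas", "duplicate.txt");
        System.out.println(distinctLabels(p, "UTF-8"));
    }
}
```

Because `readLine()` treats `\n`, `\r` and `\r\n` all as line terminators, the same code reads files written on Windows or Unix.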

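Result

The program was run against the full log file, so the output includes labels that do not appear in the excerpt above.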

[Home page - anon, Home page - auth, Login, Login Form, Login form, Logout, Node edit form, Random node - anon, Search, User profile page, node edit post]
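The result lists both "Login Form" and "Login form" because a TreeSet built without a comparator compares strings case-sensitively. If those should count as the same page (an assumption about what "duplicate" means here), a TreeSet built with `String.CASE_INSENSITIVE_ORDER` merges them and keeps the first spelling it sees. `CaseInsensitiveLabels` and `distinctIgnoringCase` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class CaseInsensitiveLabels {

    // Labels that differ only in letter case are treated as one value;
    // the spelling added first is the one that stays in the set
    static Set<String> distinctIgnoringCase(Iterable<String> labels) {
        Set<String> values = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (String label : labels) {
            values.add(label);
        }
        return values;
    }

    public static void main(String... args) {
        System.out.println(distinctIgnoringCase(
                Arrays.asList("Home page - anon", "Login form", "Search", "Login Form")));
        // prints [Home page - anon, Login form, Search]
    }
}
```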
