NIO.2
February 10, 2014
This post is not exactly about NIO.2, even though I use SeekableByteChannel. It is about a question that two interviewers have pestered me with in the past. I believe a coding task is a good and reliable first step in any interview process, but some people ask questions in a patronising way.
This is one such question: how do you identify duplicate rows in a file?
So I decided to do what any programmer worth her salt would do. This is not the most efficient way, but I wanted to try NIO.2’s SeekableByteChannel because none of the firms I have worked for has had any need for the newer Java APIs. They still wallow in legacy Java applications, so I never get a chance to use it on a project.
Sample file to parse
I wanted to filter out the duplicate values in column 3 (label), the name of the page requested, from a sample log.
timeStamp,elapsed,label,responseCode,responseMessage,threadName,dataType,success,Latency
1346999466187,32,Home page - anon,200,OK,Anonymous Browsing 1-2,text,true,31
1346999466182,37,Login form,200,OK,Node save 3-1,text,true,36
1346999466184,35,Home page - anon,200,OK,Anonymous Browsing 1-11,text,true,32
1346999466182,37,Home page - anon,200,OK,Anonymous Browsing 1-1,text,true,34
1346999466189,30,Home page - anon,200,OK,Anonymous Browsing 1-4,text,true,27
1346999466185,46,Home page - anon,200,OK,Anonymous Browsing 1-5,text,true,34
1346999466185,44,Search,200,OK,Search 4-1,text,true,35
1346999466188,28,Home page - anon,200,OK,Anonymous Browsing 1-3,text,true,26
1346999466182,33,Home page - anon,200,OK,Anonymous Browsing 1-7,text,true,32
1346999466182,36,Login Form,200,OK,Perform Login/View Account 5-1,text,true,35
1346999466182,35,Home page - anon,200,OK,Anonymous Browsing 1-10,text,true,33
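Before touching any file API, the core of the interview question can be sketched on a few of the rows above held in memory. This is only an illustration: the class name DuplicateLabels and the method findDuplicates are hypothetical, and it assumes that no field contains a quoted comma, so a plain split on "," is enough.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DuplicateLabels {

    // Index of the "label" column (the third field) in the sample log.
    static final int LABEL_COLUMN = 2;

    // Returns the labels that occur more than once, in the order they first repeat.
    static Set<String> findDuplicates(List<String> rows) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String row : rows) {
            String[] fields = row.split(",");
            if (fields.length <= LABEL_COLUMN) {
                continue; // skip blank or malformed rows
            }
            String label = fields[LABEL_COLUMN];
            // Set.add returns false when the value is already present.
            if (!seen.add(label)) {
                duplicates.add(label);
            }
        }
        return duplicates;
    }

    public static void main(String... args) {
        List<String> rows = Arrays.asList(
            "1346999466187,32,Home page - anon,200,OK,Anonymous Browsing 1-2,text,true,31",
            "1346999466182,37,Login form,200,OK,Node save 3-1,text,true,36",
            "1346999466184,35,Home page - anon,200,OK,Anonymous Browsing 1-11,text,true,32",
            "1346999466185,44,Search,200,OK,Search 4-1,text,true,35");
        System.out.println(findDuplicates(rows)); // prints [Home page - anon]
    }
}
```

The second set is what separates "which labels are duplicated" from "which labels exist"; a single TreeSet, as used later in this post, answers only the second question.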
Sample Java code that is not efficient
package com.test;

import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.EnumSet;
import java.util.Set;
import java.util.TreeSet;

public class DuplicateTest {

    // A TreeSet keeps one copy of each label, in sorted order.
    Set<String> values = new TreeSet<>();

    public static void main( String... argv ){
        DuplicateTest dt = new DuplicateTest();
        dt.analyze();
    }

    private void analyze() {
        Path p = Paths.get(File.separator + "Users" + File.separator + "radhakrishnan" + File.separator + "Lambdas", "duplicate.txt");
        // Read one byte at a time (simple, but not efficient).
        ByteBuffer b = ByteBuffer.allocate(1);
        // Each byte is decoded on its own, so this assumes a single-byte encoding.
        Charset charset = Charset.forName(System.getProperty("file.encoding"));
        char c;
        StringBuilder sb = new StringBuilder();
        try( SeekableByteChannel skbc = Files.newByteChannel(p, EnumSet.of(StandardOpenOption.READ))) {
            // Move the position past the first line (heading):
            // 88 characters plus a one-byte line break.
            skbc.position(89);
            while( skbc.read( b ) > 0){
                b.flip();
                c = charset.decode(b).get();
                if( c == '\n' || c == '\r'){
                    // End of line: store the label and start collecting the next line.
                    extractAndStore(sb.toString());
                    sb = new StringBuilder();
                } else {
                    sb.append(c);
                }
                b.clear();
            }
            // Store the last line if the file does not end with a line break.
            extractAndStore(sb.toString());
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println(values);
    }

    private void extractAndStore(String s) {
        String[] fields = s.split("[,]");
        // Skip empty or short lines, such as the gap between '\r' and '\n'.
        if( fields.length > 2 ){
            values.add(fields[2]);
        }
    }
}
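For comparison, here is a rough sketch of the more conventional route, which lets a BufferedReader handle buffering, line breaks and decoding instead of reading one byte per call. It is not the code I ran for this post. The class name BufferedLabelReader and the method distinctLabels are hypothetical, the file is assumed to be UTF-8, and the default file name is only a placeholder.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;

class BufferedLabelReader {

    // Collects the distinct values of the third column, in sorted order.
    static Set<String> distinctLabels(Path log) throws IOException {
        Set<String> labels = new TreeSet<>();
        // Assumption: the log is UTF-8 encoded.
        try (BufferedReader reader = Files.newBufferedReader(log, StandardCharsets.UTF_8)) {
            reader.readLine(); // skip the heading, whatever its length
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(",");
                if (fields.length > 2) { // ignore blank or short lines
                    labels.add(fields[2]);
                }
            }
        }
        return labels;
    }

    public static void main(String... args) throws IOException {
        // Placeholder path: pass the real log file as the first argument.
        Path log = Paths.get(args.length > 0 ? args[0] : "duplicate.txt");
        System.out.println(distinctLabels(log));
    }
}
```

Skipping the heading with readLine also removes the need for a hard-coded byte offset, which breaks as soon as the heading or the line-break style changes.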
Result
[Home page - anon, Home page - auth, Login, Login Form, Login form, Logout, Node edit form, Random node - anon, Search, User profile page, node edit post]
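The result lists both Login Form and Login form, because a TreeSet of strings compares labels case-sensitively by default. If the two were meant to be the same page (an assumption; the log itself does not say), a comparator such as String.CASE_INSENSITIVE_ORDER would fold them together. A minimal sketch, with the hypothetical names CaseInsensitiveLabels and distinctIgnoringCase:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class CaseInsensitiveLabels {

    // Collects labels into a sorted set that treats "Login Form" and "Login form" as equal.
    static Set<String> distinctIgnoringCase(Iterable<String> labels) {
        Set<String> distinct = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (String label : labels) {
            // The first spelling seen is kept; later case variants are ignored.
            distinct.add(label);
        }
        return distinct;
    }

    public static void main(String... args) {
        Set<String> labels = distinctIgnoringCase(
            Arrays.asList("Login form", "Login Form", "Search"));
        System.out.println(labels); // prints [Login form, Search]
    }
}
```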