Parsing Java Micro-benchmarking Harness data using dplyr – Part 2

Explanations will follow later, once I have determined whether the statistical measures calculated here are correct. This post builds on the previous one (Part 1).

Update: I think the measures are plotted correctly.

Types of Error bars used to plot the diagram

Error bars                                Type         Description
Standard error (SEM)                      Inferential  A measure of how variable the mean will be if you repeat the whole study many times.
Confidence interval (CI), usually 95% CI  Inferential  A range of values you can be 95% confident contains the true mean.
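
As a rough illustration of the two measures in the table, the sketch below computes the standard error and a 95% confidence interval in base R. The scores are the 20 measurement iterations of the goldmansachscollections benchmark from the JMH output shown in Part 1. The variable names are illustrative, and the t-based interval assumes the iterations are independent and roughly normally distributed.

```r
# Measurement scores (us/op) for goldmansachscollections, copied from the Part 1 output
scores <- c(0.796, 0.542, 0.510, 0.617, 0.482, 0.387, 0.272, 0.536, 0.498, 0.402,
            0.328, 0.542, 0.299, 0.647, 0.291, 0.815, 0.680, 0.363, 0.560, 0.334)

n   <- length(scores)
avg <- mean(scores)

# Standard error of the mean: sample standard deviation divided by sqrt(n)
sem <- sd(scores) / sqrt(n)

# 95% confidence interval for the mean, using the t distribution with n - 1
# degrees of freedom (assumption: independent, roughly normal scores)
half_width <- qt(0.975, df = n - 1) * sem
ci95 <- c(lower = avg - half_width, upper = avg + half_width)

print(sem)
print(ci95)
```

Replacing 0.975 with 0.995 gives the 99% interval used in the plotting code of this post.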

The parsing will not work if JMH changes the default format of the output file.
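
One way to make that failure visible, instead of silently producing an empty result, is to check the line format before extracting the scores. This is only a sketch in base R: parse_jmh_scores is a hypothetical helper, not part of JMH, dplyr or stringr, and the pattern assumes measurement lines of the form "Iteration N: <score> <unit>".

```r
# Hypothetical helper (not part of JMH, dplyr or stringr): extract the
# measurement scores and fail loudly when the expected format is not found.
parse_jmh_scores <- function(lines, expected = NULL) {
  # Measurement lines are assumed to look like "Iteration 1: 0.796 us/op"
  pattern <- "^Iteration\\s+\\d+:\\s+(\\d+\\.\\d+)\\s+\\S+\\s*$"
  hits <- grepl(pattern, lines, perl = TRUE)
  if (!any(hits)) {
    stop("No measurement lines matched; has the JMH output format changed?")
  }
  # Keep only the captured score from each matching line
  scores <- as.numeric(sub(pattern, "\\1", lines[hits], perl = TRUE))
  if (!is.null(expected) && length(scores) != expected) {
    stop(sprintf("Expected %d scores but found %d", expected, length(scores)))
  }
  scores
}

# Usage with a few lines in the format shown in Part 1
sample_lines <- c(
  "# Warmup Iteration 1: 0.443 us/op",
  "Iteration 1: 0.796 us/op",
  "Iteration 2: 0.542 us/op",
  "Statistics: (min, avg, max) = (0.272, 0.495, 0.815), stdev = 0.162"
)
print(parse_jmh_scores(sample_lines, expected = 2))
```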

library(stringr)
library(dplyr)
library(ggplot2)

data <- read.table("D:\\jmh\\jmh.txt",sep="\t")

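# Keep only the measurement lines ("Iteration N: ...") and extract the score.
# Warmup lines start with "#", which read.table skips as comments by default.
final <- data %>%
  select(V1) %>%
  filter(grepl("^Iteration", V1)) %>%
  mutate(V1 = str_extract(V1, "\\d+\\.\\d*"))

# Number the rows so the two benchmarks can be separated by position
final <- mutate(final, IDX = 1:n())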

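# The first 20 measurement rows are treated as the Java collections run and
# the remaining rows as the Goldman Sachs collections run. This depends on the
# order of the benchmarks in jmh.txt; swap the filters if the file lists them
# the other way round.
jc <- final %>%
  filter(IDX < 21)

gc <- final %>%
  filter(IDX > 20)

# Restart the iteration index at 1 for the second benchmark
gc <- mutate(gc, IDX = 1:n())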

jc <- data.frame(sapply(jc, function(x) as.numeric(as.character(x))))
gc <- data.frame(sapply(gc, function(x) as.numeric(as.character(x))))


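print(summary(jc$V1))
# Half-width of the 99% confidence interval for the mean (t distribution, n - 1 df)
error <- qt(0.995, df = length(jc$V1) - 1) * sd(jc$V1) / sqrt(length(jc$V1))
error1 <- mean(jc$V1) - error
error2 <- mean(jc$V1) + error

# qplot version: the error bars show each score +/- one standard deviation,
# the ribbon shows the 99% confidence interval around the mean
q <- qplot(geom = "line", jc$IDX, jc$V1, colour = 'red') +
  geom_errorbar(aes(x = jc$IDX, ymin = jc$V1 - sd(jc$V1), ymax = jc$V1 + sd(jc$V1)), width = 0.25) +
  geom_ribbon(aes(x = jc$IDX, y = jc$V1, ymin = error1, ymax = error2), fill = "ivory2", alpha = 0.4) +
  xlab('Iterations') + ylab("Java Collections") + theme_bw()

# Pass the plot explicitly instead of relying on the last plot created
ggsave("D:\\jmh\\jc.png", plot = q, width = 6, height = 6, dpi = 100)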

#Using error <- qt(0.995,df=length(jc$V1)-1)*sd(jc$V1)/sqrt(length(jc$V1)) 
g <- ggplot(jc, aes(x = IDX, y = V1)) +
		theme_bw() +
		geom_ribbon(aes(ymin = V1 - error, ymax = V1 + error), fill = "gray60",
				alpha = 0.3) +
		geom_line(color = "blue", size = 1) +
		geom_errorbar(aes(ymin = V1 - error, ymax = V1 + error), width = 0.25,
				color = "red") +
		labs(x = "Iterations", y = "Java collections")

ggsave("D:\\jmh\\ggplotjc.png", plot = g, width = 6, height = 6, dpi = 100)


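print(summary(gc$V1))
# Half-width of the 99% confidence interval for the mean (t distribution, n - 1 df)
error <- qt(0.995, df = length(gc$V1) - 1) * sd(gc$V1) / sqrt(length(gc$V1))
error1 <- mean(gc$V1) - error
error2 <- mean(gc$V1) + error

# qplot version: the error bars show each score +/- one standard deviation,
# the ribbon shows the 99% confidence interval around the mean
q1 <- qplot(geom = "line", gc$IDX, gc$V1, colour = 'red') +
  geom_errorbar(aes(x = gc$IDX, ymin = gc$V1 - sd(gc$V1), ymax = gc$V1 + sd(gc$V1)), width = 0.25) +
  geom_ribbon(aes(x = gc$IDX, y = gc$V1, ymin = error1, ymax = error2), fill = "ivory2", alpha = 0.4) +
  xlab('Iterations') + ylab("Goldmansachs Collections") + theme_bw()

# Pass the plot explicitly instead of relying on the last plot created
ggsave("D:\\jmh\\gc.png", plot = q1, width = 6, height = 6, dpi = 100)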

#Using error <- qt(0.995,df=length(gc$V1)-1)*sd(gc$V1)/sqrt(length(gc$V1)) 
g1 <- ggplot(gc, aes(x = IDX, y = V1)) +
		theme_bw() +
		geom_ribbon(aes(ymin = V1 - error, ymax = V1 + error), fill = "gray60",
				alpha = 0.3) +
		geom_line(color = "blue", size = 1) +
		geom_errorbar(aes(ymin = V1 - error, ymax = V1 + error), width = 0.25,
				color = "red") +
		labs(x = "Iterations", y = "Goldmansachs collections")

ggsave("D:\\jmh\\ggplotgc.png", plot = g1, width = 6, height = 6, dpi = 100)

[Figure: jc]

[Figure: gc]

The following code was suggested on the R user forum to improve the aesthetics of the plot. The 99% confidence interval shown in the plots above is not correct, but the curves and error bars are.

g1 <- ggplot(gc, aes(x = IDX, y = V1)) +
		theme_bw() +
		geom_ribbon(aes(ymin = V1 - error, ymax = V1 + error), fill = "gray60",
				alpha = 0.3) +
		geom_line(color = "blue", size = 1) +
		geom_errorbar(aes(ymin = V1 - error, ymax = V1 + error), width = 0.25,
				color = "red") +
		labs(x = "Iterations", y = "Goldmansachs collections")

The ggplot code creates the two graphs below, so we should use ggplot instead of the qplot code.

[Figure: ggplotjc]

[Figure: ggplotgc]
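
In the ggplot code above, the ribbon and the error bars are both drawn at V1 ± error around each individual score. A confidence interval for the mean is more commonly drawn as a single band around the mean itself. The sketch below shows that variant under stated assumptions: it uses the goldmansachscollections scores from Part 1, a 99% t-based interval as in the code above, and illustrative names (bench, ci_low, ci_high). It is one possible way to draw the interval, not the forum's suggestion.

```r
library(ggplot2)

# Scores (us/op) for goldmansachscollections, copied from the Part 1 output
bench <- data.frame(
  IDX = 1:20,
  V1 = c(0.796, 0.542, 0.510, 0.617, 0.482, 0.387, 0.272, 0.536, 0.498, 0.402,
         0.328, 0.542, 0.299, 0.647, 0.291, 0.815, 0.680, 0.363, 0.560, 0.334)
)

# Half-width of the 99% confidence interval for the mean
n <- nrow(bench)
half_width <- qt(0.995, df = n - 1) * sd(bench$V1) / sqrt(n)
ci_low  <- mean(bench$V1) - half_width
ci_high <- mean(bench$V1) + half_width

p <- ggplot(bench, aes(x = IDX, y = V1)) +
  theme_bw() +
  # One horizontal band spanning the plot: the interval around the mean
  annotate("rect", xmin = -Inf, xmax = Inf, ymin = ci_low, ymax = ci_high,
           fill = "gray60", alpha = 0.3) +
  # Dashed line at the mean itself
  geom_hline(yintercept = mean(bench$V1), linetype = "dashed") +
  geom_line(color = "blue") +
  labs(x = "Iterations", y = "Goldmansachs collections (us/op)")

print(p)
```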

Update : See this

Parsing Java Micro-benchmarking Harness data using dplyr – Part 1

This is about the venerable JMH and Hadley Wickham’s dplyr package and its pipes. dplyr lets you have a great deal of fun with data. Its pipes are powerful and make short work of even messy data.

# VM invoker: D:\Java\bin\java.exe
# VM options: -XX:-TieredCompilation -Dbenchmark.n=10000
# Warmup: 5 iterations, 50 ms each
# Measurement: 20 iterations, 50 ms each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: oracle.stream.javaone.CollectionComparison.goldmansachscollections

# Run progress: 0.00% complete, ETA 00:00:02
# Fork: 1 of 1
# Warmup Iteration 1: 0.443 us/op
# Warmup Iteration 2: 0.290 us/op
# Warmup Iteration 3: 0.343 us/op
# Warmup Iteration 4: 0.350 us/op
# Warmup Iteration 5: 0.388 us/op
Iteration 1: 0.796 us/op
Iteration 2: 0.542 us/op
Iteration 3: 0.510 us/op
Iteration 4: 0.617 us/op
Iteration 5: 0.482 us/op
Iteration 6: 0.387 us/op
Iteration 7: 0.272 us/op
Iteration 8: 0.536 us/op
Iteration 9: 0.498 us/op
Iteration 10: 0.402 us/op
Iteration 11: 0.328 us/op
Iteration 12: 0.542 us/op
Iteration 13: 0.299 us/op
Iteration 14: 0.647 us/op
Iteration 15: 0.291 us/op
Iteration 16: 0.815 us/op
Iteration 17: 0.680 us/op
Iteration 18: 0.363 us/op
Iteration 19: 0.560 us/op
Iteration 20: 0.334 us/op

Result: 0.495 ±(99.9%) 0.140 us/op [Average]
Statistics: (min, avg, max) = (0.272, 0.495, 0.815), stdev = 0.162
Confidence interval (99.9%): [0.355, 0.636]

# VM invoker: D:\Java\bin\java.exe
# VM options: -XX:-TieredCompilation -Dbenchmark.n=10000
# Warmup: 5 iterations, 50 ms each
# Measurement: 20 iterations, 50 ms each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: oracle.stream.javaone.CollectionComparison.javacollections

# Run progress: 50.00% complete, ETA 00:00:05
# Fork: 1 of 1
# Warmup Iteration 1: 0.475 us/op
# Warmup Iteration 2: 0.696 us/op
# Warmup Iteration 3: 0.816 us/op
# Warmup Iteration 4: 0.622 us/op
# Warmup Iteration 5: 0.574 us/op
Iteration 1: 0.987 us/op
Iteration 2: 0.585 us/op
Iteration 3: 0.770 us/op
Iteration 4: 0.711 us/op
Iteration 5: 0.546 us/op
Iteration 6: 0.553 us/op
Iteration 7: 1.164 us/op
Iteration 8: 1.096 us/op
Iteration 9: 1.477 us/op
Iteration 10: 0.824 us/op
Iteration 11: 1.002 us/op
Iteration 12: 0.504 us/op
Iteration 13: 1.019 us/op
Iteration 14: 0.834 us/op
Iteration 15: 0.589 us/op
Iteration 16: 0.557 us/op
Iteration 17: 1.338 us/op
Iteration 18: 0.906 us/op
Iteration 19: 0.486 us/op
Iteration 20: 0.587 us/op

Result: 0.827 ±(99.9%) 0.252 us/op [Average]
Statistics: (min, avg, max) = (0.486, 0.827, 1.477), stdev = 0.291
Confidence interval (99.9%): [0.574, 1.079]

# Run complete. Total time: 00:00:10

Benchmark                                           Mode  Samples  Score  Score error  Units
o.s.j.CollectionComparison.goldmansachscollections  avgt       20  0.495        0.140  us/op
o.s.j.CollectionComparison.javacollections          avgt       20  0.827        0.252  us/op

library(stringr)
library(dplyr)

data <- read.table("D:\\jmh\\jmh.txt",sep="\t")

final <-data %>%
	    select(V1) %>%	
		filter(grepl("^Iteration", V1)) %>%  
        mutate(V1 = str_extract(V1, "\\d+\\.\\d*"))

print(final)

V1
1 0.796
2 0.542
3 0.510
4 0.617
5 0.482
6 0.387
7 0.272
8 0.536
9 0.498
10 0.402
11 0.328
12 0.542
13 0.299
14 0.647
15 0.291
16 0.815
17 0.680
18 0.363
19 0.560
20 0.334
21 0.987
22 0.585
23 0.770
24 0.711
25 0.546
26 0.553
27 1.164
28 1.096
29 1.477
30 0.824
31 1.002
32 0.504
33 1.019
34 0.834
35 0.589
36 0.557
37 1.338
38 0.906
39 0.486
40 0.587
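
The extracted scores can be checked against JMH's own summary. The sketch below assumes the first 20 rows belong to goldmansachscollections and the next 20 to javacollections, as in the output above, and that JMH's score error is the half-width of a 99.9% t-based confidence interval. The column names benchmark, samples, avg, stdev and error are illustrative.

```r
library(dplyr)

# Scores copied from the printed data frame above (rows 1 to 40)
scores <- c(0.796, 0.542, 0.510, 0.617, 0.482, 0.387, 0.272, 0.536, 0.498, 0.402,
            0.328, 0.542, 0.299, 0.647, 0.291, 0.815, 0.680, 0.363, 0.560, 0.334,
            0.987, 0.585, 0.770, 0.711, 0.546, 0.553, 1.164, 1.096, 1.477, 0.824,
            1.002, 0.504, 1.019, 0.834, 0.589, 0.557, 1.338, 0.906, 0.486, 0.587)

# Assumption: 20 measurement iterations per benchmark, in the order JMH ran them
final <- data.frame(V1 = scores) %>%
  mutate(benchmark = rep(c("goldmansachscollections", "javacollections"), each = 20))

stats <- final %>%
  group_by(benchmark) %>%
  summarise(
    samples = n(),
    avg = mean(V1),
    stdev = sd(V1),
    # Half-width of a 99.9% confidence interval for the mean (t distribution)
    error = qt(0.9995, df = samples - 1) * stdev / sqrt(samples)
  )

print(stats)
```

Under these assumptions the averages come out near 0.495 and 0.827 and the errors near 0.140 and 0.252, in line with the Result lines JMH printed.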

PlantUML

There was a time when I was studiously reading UML 2 articles like this one about the UML 2 Composition Model.

This also reminds me of The Journal of Object Technology.

Its mission statement is this.

The Journal of Object Technology (JOT) is a peer-reviewed, free and open-access journal dedicated to the timely publication of previously unpublished research articles, surveys, tutorials, and technical notes on all aspects of object technology.

But I have never been able to draw the kind of UML 2 diagrams that Conrad Bock describes in his JOT articles. Visual Paradigm for UML is, I think, the most flexible tool; I have used both the Community Edition and the licensed version. StarUML is good too, but not as versatile as Visual Paradigm.
Later on I started using TikZ, PGF and Graphviz and liked them very much.
UML may be passé, but I still use it.

Recently I started using PlantUML for a code review project and it was like a breath of fresh air. It uses Graphviz and there are IDE plugins. Everything was easy to learn and use. I could open IntelliJ IDEA, browse the code, type PlantUML's domain-specific language alongside it and draw a complex UML 2 diagram.

It makes it easy for developers to design their code. I wish they paid heed to someone.

Composition Model

@startuml

skinparam class {
	BackgroundColor White
	ArrowColor SeaGreen
	BorderColor Black
}
class "<u>MyCar : Car"{
  +myMethods()
}

class "<u>YourBoat : Boat"{
  +myMethods()
}

class "<u>E2 : Engine"{
	String name
}

class "<u>P1 : Propeller"{
	String name
}

class "<u>E1 : Engine"{
	String name
}

class "<u>W1 : Wheel"{
	String name
}

class "<u>W2 : Wheel"{
	String name
}

class "<u>W3 : Wheel"{
	String name
}

class "<u>W4 : Wheel"{
	String name
}
"<u>E1 : Engine" -- "front"  "<u>W1 : Wheel" : "powers"
"<u>E1 : Engine" -- "front" "<u>W2 : Wheel" : "powers"
"<u>E2 : Engine" --  "<u>P1 : Propeller" : "powers"
"<u>YourBoat : Boat" "inBoat" *-- "p" "<u>P1 : Propeller"
"<u>YourBoat : Boat" "inBoat" *-- "e" "<u>E2 : Engine"
"<u>MyCar : Car" "inCarAsBack" *--  "<u>W1 : Wheel"
"<u>MyCar : Car" "inCarAsBack" *--  "<u>W2 : Wheel"
"<u>MyCar : Car" "inCarAsBack" -right--- "back" "<u>W3 : Wheel"
"<u>MyCar : Car" "inCarAsBack" -right--- "back" "<u>W4 : Wheel"
"<u>MyCar : Car" "inCar" *-- "e" "<u>E1 : Engine"

hide members
hide  circle

@enduml

Update 1:

The flow final node is in the UML 2 spec. An activity final node aborts all flows in the activity; if that is not desired, use flow final instead. It is drawn as a circle with a cross inside. But I didn’t see it in the PlantUML doc.

How can I create it?

Flow final

The PlantUML team responded.

Well, it’s not possible today.

However, we propose that we have now 2 keywords :

stop (that will display the actual circle)
end (that will display a circle with a cross inside)

Update 2:

What’s New ?

7 June, 2015: Add end keyword in Activity Diagram Beta. (Thanks to Radhakrishnan Mohan for the suggestion).

Is this ok for you, or do you prefer to stay anonymous ?

PS : You can test the beta version here https://dl.dropboxusercontent.com/u/13064071/plantuml.jar