Parsing Java Micro-benchmarking Harness data using dplyr – Part 2

June 18, 2015

I will add explanations later, once I have determined whether the statistical measures calculated here are correct. This post builds on the previous blog post.

Update: I think the measures are plotted correctly.

Types of Error bars used to plot the diagram

Error bars	Type	Description
Standard error (SEM)	Inferential	A measure of how variable the mean will be, if you repeat the whole study many times.
Confidence interval (CI), usually 95% CI	Inferential	A range of values you can be 95% confident contains the true mean.
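
As a rough illustration of the two measures in the table, the sketch below computes the standard error of the mean and a t-based confidence interval in base R. The `scores` vector is made-up sample data, not JMH output, and `ci_half_width` is a hypothetical helper name. Note that `qt(0.975, ...)` corresponds to a 95% interval, while `qt(0.995, ...)`, which the script in this post uses, corresponds to a 99% interval.

```r
# Made-up scores for illustration only (not real benchmark results).
scores <- c(10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3)

n <- length(scores)
sem <- sd(scores) / sqrt(n)  # standard error of the mean

# Hypothetical helper: half-width of a two-sided t-based confidence
# interval for the mean at the given confidence level.
ci_half_width <- function(x, level = 0.95) {
  n <- length(x)
  qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
}

ci95 <- mean(scores) + c(-1, 1) * ci_half_width(scores, 0.95)
ci99 <- mean(scores) + c(-1, 1) * ci_half_width(scores, 0.99)

print(sem)
print(ci95)
print(ci99)
```

The 99% interval is always wider than the 95% interval for the same data, which is worth keeping in mind when comparing the plots with the table.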

The parsing will not work if JMH changes the default format of the output file.
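
To make that dependency on the output format concrete, here is a minimal sketch in base R of the same filter-and-extract step, with a check that stops if no score can be extracted. The sample lines are invented and only assume the layout this post relies on (measurement lines starting with "Iteration" and containing a decimal score). The pattern `[0-9]+\\.[0-9]*` is meant as the base R counterpart of the `\\d+\\.\\d*` pattern passed to `str_extract`.

```r
# Invented sample lines in the assumed layout; the numbers are not real results.
lines <- c(
  "# Warmup Iteration   1: 101.234 ops/s",
  "Iteration   1: 123.456 ops/s",
  "Iteration   2: 120.001 ops/s",
  "some other line without a score"
)

# Keep measurement lines only, mirroring the filter on "^Iteration".
measured <- lines[grepl("^Iteration", lines)]

# Extract the first number that contains a decimal point (the score).
# The iteration counter ("1:") has no decimal point, so it is skipped.
m <- regexpr("[0-9]+\\.[0-9]*", measured)
scores <- rep(NA_character_, length(measured))
scores[m != -1] <- regmatches(measured, m)
scores <- as.numeric(scores)

# Fail loudly if the format changed and a score could not be extracted.
if (length(scores) == 0 || anyNA(scores)) {
  stop("Unexpected JMH output format: could not extract iteration scores")
}
print(scores)
```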

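library(stringr)
library(dplyr)
library(ggplot2)

# Read the raw JMH output. With a tab separator, each line of the file
# ends up in the first column (V1).
data <- read.table("D:\\jmh\\jmh.txt", sep = "\t")

# Keep only the measurement lines (those starting with "Iteration") and
# extract the first decimal number on each line, which is the score.
final <- data %>%
		select(V1) %>%
		filter(grepl("^Iteration", V1)) %>%
		mutate(V1 = str_extract(V1, "\\d+\\.\\d*"))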

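# Number the measurement rows so that they can be split by benchmark.
final <- mutate(final, IDX = 1:n())

# The first 20 iterations are the Java collections run,
# the remaining ones are the Goldman Sachs collections run.
jc <- final %>%
		filter(IDX < 21)

gc <- final %>%
		filter(IDX > 20)

# Restart the iteration index at 1 for the second run.
gc <- mutate(gc, IDX = 1:n())

# The extracted scores are still text, so convert every column to numeric.
jc <- data.frame(sapply(jc, function(x) as.numeric(as.character(x))))
gc <- data.frame(sapply(gc, function(x) as.numeric(as.character(x))))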


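print(summary(jc$V1))
# Half-width of the 99% confidence interval for the mean (t distribution, n - 1 df).
error <- qt(0.995, df = length(jc$V1) - 1) * sd(jc$V1) / sqrt(length(jc$V1))
error1 <- mean(jc$V1) - error
error2 <- mean(jc$V1) + error

# Error bars: one standard deviation around each score.
# Ribbon: the confidence interval around the mean.
# I("red") sets the line colour directly instead of mapping the string to a colour scale.
q <- qplot(geom = "line", jc$IDX, jc$V1, colour = I("red")) +
		geom_errorbar(aes(x = jc$IDX, ymin = jc$V1 - sd(jc$V1), ymax = jc$V1 + sd(jc$V1)), width = 0.25) +
		geom_ribbon(aes(x = jc$IDX, y = jc$V1, ymin = error1, ymax = error2), fill = "ivory2", alpha = 0.4) +
		xlab("Iterations") + ylab("Java Collections") + theme_bw()

ggsave("D:\\jmh\\jc.png", plot = q, width = 6, height = 6, dpi = 100)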

#Using error <- qt(0.995,df=length(jc$V1)-1)*sd(jc$V1)/sqrt(length(jc$V1)) 
g <- ggplot(jc, aes(x = IDX, y = V1)) +
		theme_bw() +
		geom_ribbon(aes(ymin = V1 - error, ymax = V1 + error), fill = "gray60",
				alpha = 0.3) +
		geom_line(color = "blue", size = 1) +
		geom_errorbar(aes(ymin = V1 - error, ymax = V1 + error), width = 0.25,
				color = "red") +
		labs(x = "Iterations", y = "Java collections")

ggsave("D:\\jmh\\ggplotjc.png", plot = g, width = 6, height = 6, dpi = 100)


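print(summary(gc$V1))
# Half-width of the 99% confidence interval for the mean (t distribution, n - 1 df).
error <- qt(0.995, df = length(gc$V1) - 1) * sd(gc$V1) / sqrt(length(gc$V1))
error1 <- mean(gc$V1) - error
error2 <- mean(gc$V1) + error

# Error bars: one standard deviation around each score.
# Ribbon: the confidence interval around the mean.
# I("red") sets the line colour directly instead of mapping the string to a colour scale.
q1 <- qplot(geom = "line", gc$IDX, gc$V1, colour = I("red")) +
		geom_errorbar(aes(x = gc$IDX, ymin = gc$V1 - sd(gc$V1), ymax = gc$V1 + sd(gc$V1)), width = 0.25) +
		geom_ribbon(aes(x = gc$IDX, y = gc$V1, ymin = error1, ymax = error2), fill = "ivory2", alpha = 0.4) +
		xlab("Iterations") + ylab("Goldmansachs Collections") + theme_bw()

ggsave("D:\\jmh\\gc.png", plot = q1, width = 6, height = 6, dpi = 100)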

#Using error <- qt(0.995,df=length(gc$V1)-1)*sd(gc$V1)/sqrt(length(gc$V1)) 
g1 <- ggplot(gc, aes(x = IDX, y = V1)) +
		theme_bw() +
		geom_ribbon(aes(ymin = V1 - error, ymax = V1 + error), fill = "gray60",
				alpha = 0.3) +
		geom_line(color = "blue", size = 1) +
		geom_errorbar(aes(ymin = V1 - error, ymax = V1 + error), width = 0.25,
				color = "red") +
		labs(x = "Iterations", y = "Goldmansachs collections")

ggsave("D:\\jmh\\ggplotgc.png", plot = g1, width = 6, height = 6, dpi = 100)

The following code was suggested on the R user forum to improve the aesthetics of the plot. The 99% confidence interval shown in the plots above is not correct, but the curves and error bars are.

g1 <- ggplot(gc, aes(x = IDX, y = V1)) +
		theme_bw() +
		geom_ribbon(aes(ymin = V1 - error, ymax = V1 + error), fill = "gray60",
				alpha = 0.3) +
		geom_line(color = "blue", size = 1) +
		geom_errorbar(aes(ymin = V1 - error, ymax = V1 + error), width = 0.25,
				color = "red") +
		labs(x = "Iterations", y = "Goldmansachs collections")

ggplot produces both of these graphs, so we should use ggplot instead of the qplot code.
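
Since the 99% confidence interval in the plots is flagged above as not correct, one possible correction is sketched here. It is my assumption about what was intended, not the forum's answer: the band is drawn around the mean (as the qplot version does with error1 and error2) rather than around each individual score. The data frame `df` is simulated and only stands in for the parsed JMH data.

```r
library(ggplot2)

# Simulated scores standing in for the parsed JMH data (assumption).
set.seed(42)
df <- data.frame(IDX = 1:20, V1 = rnorm(20, mean = 100, sd = 5))

# Half-width of the 99% confidence interval for the mean.
n <- nrow(df)
error <- qt(0.995, df = n - 1) * sd(df$V1) / sqrt(n)
lower <- mean(df$V1) - error
upper <- mean(df$V1) + error

p <- ggplot(df, aes(x = IDX, y = V1)) +
  theme_bw() +
  # Constant band: the interval describes the mean, not single iterations.
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "gray60", alpha = 0.3) +
  # Dashed reference line at the sample mean.
  geom_hline(yintercept = mean(df$V1), linetype = "dashed") +
  geom_line(color = "blue") +
  labs(x = "Iterations", y = "Score")
```

The plot object `p` can be written to disk with `ggsave("some_file.png", plot = p)`, where the file name is a placeholder.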

Update : See this

Filed under Java, R
