Aesthetics of ‘R’ Plots

I submitted a fresh pull request to adoptopenjdk to rectify the graphs in parsing-java-micro-benchmarking-harness-data-using-dplyr-part-2 after the lead developer of gs-collections pointed out that the titles of the two graphs were switched.

It is very important that we do not mislead with data.

g <- ggplot(jc, aes(x = IDX, y = V1)) +
		theme_bw() +
		geom_ribbon(aes(ymin = V1 - error, ymax = V1 + error), fill = "lightseagreen",
				linetype = 2, alpha= 0.1) +
		geom_line(color = "lightblue", size = 0.6) +
		geom_errorbar(aes(ymin = V1 - error, ymax = V1 + error), width = 0.1,
				color = "darkorange3") +
		labs(x = "Iterations", y = "Goldmansachs collections") + geom_text(aes(label = V1), size=4, family="Times", lineheight=.8)

[Plot: ggplotgc.png]

[Plot: ggplotjc.png]

Task duration estimation using betaPERT distribution and Monte Carlo Analysis

I have been researching this seemingly simple topic for many months. After reading many articles, browsing books and asking other experts, I have a basic idea about it. I also find it ludicrous that the famed project managers in the companies I have worked for do not seem to know this simple distribution even after taking the PMP exams. Many managers here do not care a whit for such technical matters.

According to betaPERT

“The Beta-PERT methodology allows to parametrize a generalized Beta distribution based on expert opinion regarding a pessimistic estimate (minimum value), a most likely estimate (mode), and an optimistic estimate (maximum value).”

I will add more explanations later based on what I understand.
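One quick sanity check I find useful (a minimal sketch, using the same mapping of the three estimates to the Beta shape parameters as the simulation below): the mean of a Beta(s1, s2) variable is s1/(s1 + s2), and after rescaling to the optimistic-pessimistic range this analytical mean collapses exactly to the classic PERT formula.

opt <- 50    # Optimistic estimate
lik <- 100   # Likely estimate
pes <- 500   # Pessimistic estimate
lambda <- 4  # PERT weighting for the likely estimate

# Beta shape parameters, as in the simulation code below
s1 <- 1 + lambda * (lik - opt) / (pes - opt)
s2 <- 1 + lambda * (pes - lik) / (pes - opt)

# Mean of the rescaled Beta ...
print(opt + (pes - opt) * s1 / (s1 + s2))        # ~158.33

# ... equals the PERT estimate (opt + lambda*lik + pes)/(lambda + 2)
print((opt + lambda * lik + pes) / (lambda + 2)) # ~158.33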


library(ggplot2)

opt <- 50 # Optimistic estimate
lik <- 100 # Likely estimate
pes <- 500 # Pessimistic estimate

lambda <- 4 # PERT weighting for likely

# PERT estimate is then
print( (opt + lambda*lik + pes)/(2 + lambda) )

# Mapping to the beta distribution
s1 <- 1+lambda*(lik-opt)/(pes-opt)
s2 <- 1+lambda*(pes-lik)/(pes-opt)

# Generate 1000 samples from the beta distribution that is scaled to the PERT parameters

persondays <- opt + (pes-opt) * rbeta(1000, s1, s2)

# Look at the output
png("D:/R/persondayssimulation.png")
hist(persondays)
dev.off()

print( summary(persondays))
# Compare to the PERT estimate
print( mean(persondays) )

gg <- ggplot(data.frame(persondays),
		     aes(x = persondays))

gg <- gg + geom_histogram(aes(y = ..density..),
		                  color = "black",
						  fill = "white", 
		                  binwidth = 15)
				  
gg <- gg + geom_density(fill = "mediumvioletred",
		                alpha = 0.5)
gg <- gg + theme_bw() 

ggsave("D:\\R\\density.png",
		width=6,
		height=6,
		dpi=100)

[Plot: density.png]
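A possible follow-up that is not part of the original analysis (just a sketch): instead of the mean, quantiles of the simulated distribution can serve as planning figures, for example the 90th percentile as a conservative person-days commitment.

# Median and upper quantiles of the simulated person-days
print(quantile(persondays, probs = c(0.5, 0.8, 0.9, 0.95)))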

The book I was asked to read to understand Monte Carlo analysis is Introducing Monte Carlo Methods with R.

Credit should go to Mango for sending me a private mail about this.

Parsing Java Micro-benchmarking Harness data using dplyr – Part 2

I will add explanations later because I still have to determine whether the statistical measures calculated here are correct. This builds on the previous blog post.

Update: I think the measures are plotted correctly.

Types of Error bars used to plot the diagram

Error bars                                 Type         Description
Standard error (SEM)                       Inferential  A measure of how variable the mean will be if you repeat the whole study many times.
Confidence interval (CI), usually 95% CI   Inferential  A range of values you can be 95% confident contains the true mean.
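As a quick reference (a minimal sketch; x stands for any vector of measurements, such as the jc$V1 column computed below), the two quantities are calculated like this:

x <- jc$V1                                # vector of measurements
n <- length(x)
sem <- sd(x) / sqrt(n)                    # standard error of the mean
half <- qt(0.975, df = n - 1) * sem       # half-width of the 95% CI
print(c(mean(x) - half, mean(x) + half))  # 95% CI for the true mean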

The parsing will not work if JMH changes the default format of the output file.

library(stringr)
library(dplyr)
library(ggplot2)

data <- read.table("D:\\jmh\\jmh.txt",sep="\t")

final <- data %>%
		select(V1) %>%
		filter(grepl("^Iteration", V1)) %>%
		mutate(V1 = str_extract(V1, "\\d+\\.\\d*"))

final <- mutate(final,IDX = 1:n())

jc <- final %>%
		filter(IDX < 21)


gc <- final %>%
		filter(IDX > 20)

gc <- mutate(gc,IDX = 1:n())

jc <- data.frame(sapply(jc, function(x) as.numeric(as.character(x))))
gc <- data.frame(sapply(gc, function(x) as.numeric(as.character(x))))


print(summary(jc$V1))
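# Half-width of the 99% confidence interval for the mean:
# t critical value at the 0.995 quantile times the standard error of the mean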
error <- qt(0.995,df=length(jc$V1)-1)*sd(jc$V1)/sqrt(length(jc$V1))
error1 <- mean(jc$V1)-error
error2 <- mean(jc$V1)+error

q <- qplot(geom = "line",jc$IDX,jc$V1, colour='red')+geom_errorbar(aes(x=jc$IDX, ymin=jc$V1-sd(jc$V1), ymax=jc$V1+sd(jc$V1)), width=0.25)+ 
		geom_ribbon(aes(x=jc$IDX, y=jc$V1, ymin=error1, ymax=error2),fill="ivory2",alpha = 0.4)+ 
		xlab('Iterations') + ylab("Java Collections")+theme_bw() 

ggsave("D:\\jmh\\jc.png", width=6, height=6, dpi=100)

#Using error <- qt(0.995,df=length(jc$V1)-1)*sd(jc$V1)/sqrt(length(jc$V1)) 
g <- ggplot(jc, aes(x = IDX, y = V1)) +
		theme_bw() +
		geom_ribbon(aes(ymin = V1 - error, ymax = V1 + error), fill = "gray60",
				alpha = 0.3) +
		geom_line(color = "blue", size = 1) +
		geom_errorbar(aes(ymin = V1 - error, ymax = V1 + error), width = 0.25,
				color = "red") +
		labs(x = "Iterations", y = "Java collections")

ggsave("D:\\jmh\\ggplotjc.png", width=6, height=6, dpi=100)


print(summary(gc$V1))
error <- qt(0.995,df=length(gc$V1)-1)*sd(gc$V1)/sqrt(length(gc$V1))
error1 <- mean(gc$V1)-error
error2 <- mean(gc$V1)+error

q1 <- qplot(geom = "line",gc$IDX,gc$V1, colour='red')+geom_errorbar(aes(x=gc$IDX, ymin=gc$V1-sd(gc$V1), ymax=gc$V1+sd(gc$V1)), width=0.25)+ 
		geom_ribbon(aes(x=gc$IDX, y=gc$V1, ymin=error1, ymax=error2),fill="ivory2",alpha = 0.4)+ 
		xlab('Iterations') + ylab("Goldmansachs Collections")+theme_bw() 


ggsave("D:\\jmh\\gc.png", width=6, height=6, dpi=100)

#Using error <- qt(0.995,df=length(gc$V1)-1)*sd(gc$V1)/sqrt(length(gc$V1)) 
g1 <- ggplot(gc, aes(x = IDX, y = V1)) +
		theme_bw() +
		geom_ribbon(aes(ymin = V1 - error, ymax = V1 + error), fill = "gray60",
				alpha = 0.3) +
		geom_line(color = "blue", size = 1) +
		geom_errorbar(aes(ymin = V1 - error, ymax = V1 + error), width = 0.25,
				color = "red") +
		labs(x = "Iterations", y = "Goldmansachs collections")

ggsave("D:\\jmh\\ggplotgc.png", width=6, height=6, dpi=100)

[Plot: jc.png]

[Plot: gc.png]

The R user forum suggested this to improve the aesthetics of the plot. The 99% confidence interval shown in the plots above is not correct, but the curves and error bars are.

g1 <- ggplot(gc, aes(x = IDX, y = V1)) +
		theme_bw() +
		geom_ribbon(aes(ymin = V1 - error, ymax = V1 + error), fill = "gray60",
				alpha = 0.3) +
		geom_line(color = "blue", size = 1) +
		geom_errorbar(aes(ymin = V1 - error, ymax = V1 + error), width = 0.25,
				color = "red") +
		labs(x = "Iterations", y = "Goldmansachs collections")

ggplot creates these two graphs, so we should use the ggplot code instead of the qplot code.

[Plot: ggplotjc.png]

[Plot: ggplotgc.png]
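One more variant I may try (a sketch of my own, not from the forum thread): if the intent is to show the confidence interval of the mean, the band can be drawn around the mean itself instead of around every point.

ci <- qt(0.995, df = length(gc$V1) - 1) * sd(gc$V1) / sqrt(length(gc$V1))
g2 <- ggplot(gc, aes(x = IDX, y = V1)) +
		theme_bw() +
		annotate("rect", xmin = -Inf, xmax = Inf,
				ymin = mean(gc$V1) - ci, ymax = mean(gc$V1) + ci,
				fill = "gray60", alpha = 0.3) +
		geom_hline(yintercept = mean(gc$V1), linetype = 2) +
		geom_line(color = "blue", size = 1) +
		labs(x = "Iterations", y = "Goldmansachs collections")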

Update: See this.

Parsing Java Micro-benchmarking Harness data using dplyr – Part 1

This is about the venerable JMH and Hadley Wickham’s dplyr package and its pipes. dplyr lets you have a great deal of fun with data. Its pipes are powerful and make short shrift of even messy data.

# VM invoker: D:\Java\bin\java.exe
# VM options: -XX:-TieredCompilation -Dbenchmark.n=10000
# Warmup: 5 iterations, 50 ms each
# Measurement: 20 iterations, 50 ms each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: oracle.stream.javaone.CollectionComparison.goldmansachscollections

# Run progress: 0.00% complete, ETA 00:00:02
# Fork: 1 of 1
# Warmup Iteration 1: 0.443 us/op
# Warmup Iteration 2: 0.290 us/op
# Warmup Iteration 3: 0.343 us/op
# Warmup Iteration 4: 0.350 us/op
# Warmup Iteration 5: 0.388 us/op
Iteration 1: 0.796 us/op
Iteration 2: 0.542 us/op
Iteration 3: 0.510 us/op
Iteration 4: 0.617 us/op
Iteration 5: 0.482 us/op
Iteration 6: 0.387 us/op
Iteration 7: 0.272 us/op
Iteration 8: 0.536 us/op
Iteration 9: 0.498 us/op
Iteration 10: 0.402 us/op
Iteration 11: 0.328 us/op
Iteration 12: 0.542 us/op
Iteration 13: 0.299 us/op
Iteration 14: 0.647 us/op
Iteration 15: 0.291 us/op
Iteration 16: 0.815 us/op
Iteration 17: 0.680 us/op
Iteration 18: 0.363 us/op
Iteration 19: 0.560 us/op
Iteration 20: 0.334 us/op

Result: 0.495 ±(99.9%) 0.140 us/op [Average]
Statistics: (min, avg, max) = (0.272, 0.495, 0.815), stdev = 0.162
Confidence interval (99.9%): [0.355, 0.636]

# VM invoker: D:\Java\bin\java.exe
# VM options: -XX:-TieredCompilation -Dbenchmark.n=10000
# Warmup: 5 iterations, 50 ms each
# Measurement: 20 iterations, 50 ms each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: oracle.stream.javaone.CollectionComparison.javacollections

# Run progress: 50.00% complete, ETA 00:00:05
# Fork: 1 of 1
# Warmup Iteration 1: 0.475 us/op
# Warmup Iteration 2: 0.696 us/op
# Warmup Iteration 3: 0.816 us/op
# Warmup Iteration 4: 0.622 us/op
# Warmup Iteration 5: 0.574 us/op
Iteration 1: 0.987 us/op
Iteration 2: 0.585 us/op
Iteration 3: 0.770 us/op
Iteration 4: 0.711 us/op
Iteration 5: 0.546 us/op
Iteration 6: 0.553 us/op
Iteration 7: 1.164 us/op
Iteration 8: 1.096 us/op
Iteration 9: 1.477 us/op
Iteration 10: 0.824 us/op
Iteration 11: 1.002 us/op
Iteration 12: 0.504 us/op
Iteration 13: 1.019 us/op
Iteration 14: 0.834 us/op
Iteration 15: 0.589 us/op
Iteration 16: 0.557 us/op
Iteration 17: 1.338 us/op
Iteration 18: 0.906 us/op
Iteration 19: 0.486 us/op
Iteration 20: 0.587 us/op

Result: 0.827 ±(99.9%) 0.252 us/op [Average]
Statistics: (min, avg, max) = (0.486, 0.827, 1.477), stdev = 0.291
Confidence interval (99.9%): [0.574, 1.079]

# Run complete. Total time: 00:00:10

Benchmark                                            Mode  Samples  Score  Score error  Units
o.s.j.CollectionComparison.goldmansachscollections   avgt       20  0.495        0.140  us/op
o.s.j.CollectionComparison.javacollections           avgt       20  0.827        0.252  us/op

library(stringr)
library(dplyr)

data <- read.table("D:\\jmh\\jmh.txt",sep="\t")

final <- data %>%
		select(V1) %>%
		filter(grepl("^Iteration", V1)) %>%
		mutate(V1 = str_extract(V1, "\\d+\\.\\d*"))

print(final)

V1
1 0.796
2 0.542
3 0.510
4 0.617
5 0.482
6 0.387
7 0.272
8 0.536
9 0.498
10 0.402
11 0.328
12 0.542
13 0.299
14 0.647
15 0.291
16 0.815
17 0.680
18 0.363
19 0.560
20 0.334
21 0.987
22 0.585
23 0.770
24 0.711
25 0.546
26 0.553
27 1.164
28 1.096
29 1.477
30 0.824
31 1.002
32 0.504
33 1.019
34 0.834
35 0.589
36 0.557
37 1.338
38 0.906
39 0.486
40 0.587
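One caveat worth noting (a short sketch; the Part 2 post does the same conversion with sapply): str_extract returns character strings, so the values have to be converted to numeric before computing any statistics.

# str_extract returns characters; convert before computing statistics
final <- mutate(final, V1 = as.numeric(V1), IDX = 1:n())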

PlantUML

There was a time when I was studiously reading UML 2 articles like this one about the UML 2 Composition Model.

This also reminds me of The Journal of Object Technology.

Its mission statement is this.

The Journal of Object Technology (JOT) is a peer-reviewed, free and open-access journal dedicated to the timely publication of previously unpublished research articles, surveys, tutorials, and technical notes on all aspects of object technology.

But I have never been able to draw the kind of UML 2 diagrams that Conrad Bock describes in his JOT articles. Visual Paradigm for UML is the tool I find most flexible, and I have used both the Community Edition and the licensed version. StarUML is good too, but not as versatile as Visual Paradigm. Later on I started using TikZ, PGF and Graphviz, and liked them very much. UML was passé by then, but I still use it.

Recently I started using PlantUML for a code review project and it was like a breath of fresh air. It uses Graphviz and there are IDE plugins. Everything was easy to learn and use. I could open IntelliJ IDEA, browse the code, type the PlantUML domain-specific language and draw a complex UML 2 diagram.

It makes it easy for developers to design their code. I wish they paid heed to someone.

Composition Model

@startuml

skinparam class {
	BackgroundColor White
	ArrowColor SeaGreen
	BorderColor Black
}
class "<u>MyCar : Car"{
  +myMethods()
}

class "<u>YourBoat : Boat"{
  +myMethods()
}

class "<u>E2 : Engine"{
	String name
}

class "<u>P1 : Propeller"{
	String name
}

class "<u>E1 : Engine"{
	String name
}

class "<u>W1 : Wheel"{
	String name
}

class "<u>W2 : Wheel"{
	String name
}

class "<u>W3 : Wheel"{
	String name
}

class "<u>W4 : Wheel"{
	String name
}
"<u>E1 : Engine" -- "front"  "<u>W1 : Wheel" : "powers"
"<u>E1 : Engine" -- "front" "<u>W2 : Wheel" : "powers"
"<u>E2 : Engine" --  "<u>P1 : Propeller" : "powers"
"<u>YourBoat : Boat" "inBoat" *-- "p" "<u>P1 : Propeller"
"<u>YourBoat : Boat" "inBoat" *-- "e" "<u>E2 : Engine"
"<u>MyCar : Car" "inCarAsBack" *--  "<u>W1 : Wheel"
"<u>MyCar : Car" "inCarAsBack" *--  "<u>W2 : Wheel"
"<u>MyCar : Car" "inCarAsBack" -right--- "back" "<u>W3 : Wheel"
"<u>MyCar : Car" "inCarAsBack" -right--- "back" "<u>W4 : Wheel"
"<u>MyCar : Car" "inCar" *-- "e" "<u>E1 : Engine"

hide members
hide  circle

@enduml

Update 1:

The flow final node is in the UML 2 spec: if you do not want to abort all flows in the activity, you use a flow final instead. It is drawn as a circle with a cross inside. But I didn’t see it in the PlantUML documentation.

How can I create it?

[Image: the flow final node]

The PlantUML team responded.

Well, it’s not possible today.

However, we propose that we have now 2 keywords :

stop (that will display the actual circle)
end (that will display a circle with a cross inside)

Update 2:

What’s New ?

7 June, 2015: Add end keyword in Activity Diagram Beta. (Thanks to Radhakrishnan Mohan for the suggestion).

Is this ok for you, or do you prefer to stay anonymous ?

PS : You can test the beta version here https://dl.dropboxusercontent.com/u/13064071/plantuml.jar

How to install Octave on Mac OS X Yosemite?

I was trying to execute my Octave scripts to recognize digits using a neural network. Octave stopped working after I upgraded to Yosemite. It works now after the following steps. They may look simple, but this wasted several hours.

brew tap homebrew/science

sudo chmod -R 777 /usr/local/share

brew link --overwrite xz

brew install gcc

I believe this error is what prevented the gcc installation from completing:

==> ../configure --build=x86_64-apple-darwin14.3.0 --prefix=/usr/local/Cellar/gcc/5.1.0 --li
Error: Permission denied – /Users/radhakrishnan/Library/Logs/Homebrew/gcc

So I changed the permissions and tried again.

sudo chmod 777 /Users/radhakrishnan/Library/Logs/Homebrew

brew install gcc

sudo chmod -R 777 /usr/local/include/freetype2

brew link freetype

brew install octave --with-x11

Java byte code in practice

I am listening to Rafael on Virtual JUG.

[Screenshot: Virtual JUG session, 2015-05-20]

Deep learning course at the University of Oxford: 2014-2015, and another MIT book


I am viewing these Course materials with a feeling of awe. I hope these resources provide some fodder for this blog and my imagination.

One more.

Deep Learning, an MIT Press book in preparation by Yoshua Bengio, Ian Goodfellow and Aaron Courville.

One more.

The wonderful resources by Andrej Karpathy

‘mvn package’ through our debilitating NTLM proxy

I was morose, grief-stricken and close to tears when our evil corporate proxy stopped me from doing anything. Each tool needs a different set of parameters to get through this proxy. I tried to use cntlm but that did not help. After many hours I realized that this settings.xml builds everything properly.

There are two proxy sections but I have not attempted to remove one. It is working as it is.

<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
                          http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <activeProfiles>
    <!--make the profile active all the time -->
    <activeProfile>securecentral</activeProfile>
  </activeProfiles>
  <profiles>
    <profile>
      <id>securecentral</id>
      <!--Override the repository (and pluginRepository) "central" from the
         Maven Super POM -->
      <repositories>
        <repository>
          <id>central</id>
          <url>http://repo1.maven.org/maven2</url>
          <releases>
            <enabled>true</enabled>
          </releases>
        </repository>
      </repositories>
      <pluginRepositories>
        <pluginRepository>
          <id>central</id>
          <url>http://repo1.maven.org/maven2</url>
          <releases>
            <enabled>true</enabled>
          </releases>
        </pluginRepository>
      </pluginRepositories>
    </profile>
  </profiles>
  <pluginRepositories>
    <pluginRepository>
      <id>central</id>
      <name>Maven Plugin Repository</name>
      <url>http://repo1.maven.org/maven2</url>
      <layout>default</layout>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
      <releases>
        <updatePolicy>never</updatePolicy>
      </releases>
    </pluginRepository>
  </pluginRepositories>
  <proxies>
    <proxy>
      <id>example-proxy</id>
      <active>true</active>
      <protocol>http</protocol>
      <host>proxy</host>
      <port>port</port>
      <username>user</username>
      <password>password</password>
      <nonProxyHosts>www.google.com|*.example.com</nonProxyHosts>
    </proxy>
    <proxy>
      <id>example-proxy</id>
      <active>true</active>
      <protocol>https</protocol>
      <host>proxy</host>
      <port>port</port>
      <username>user</username>
      <password>password</password>
      <nonProxyHosts>www.google.com|*.example.com</nonProxyHosts>
    </proxy>
  </proxies>
</settings>

JDK9 REPL

I have been building and quickly exploring various JDK 9 features during this past weekend.
There is a new REPL now among other gems. I will update this post as I explore it further.

Mohans-MacBook-Pro:openjdk radhakrishnan$ java -version
java version “1.9.0-ea”
Java(TM) SE Runtime Environment (build 1.9.0-ea-b61)
Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b61, mixed mode)
Mohans-MacBook-Pro:openjdk radhakrishnan$ java -jar kulla-0.508-20150510054454.jar
| Welcome to JShell -- Version 0.508
| Type /help for help

->

-> String s = "__mainn__".replaceAll("[^a-z\\s]", "");
| Added variable s of type String with initial value "mainn"

-> System.out.println(s);

-> mainn

-> final Map count = s.chars().map(Character::toLowerCase).collect(TreeMap::new, (m, c) -> m.merge((char) c, 1, Integer::sum), Map::putAll);
| Warning:
| Warning: Modifier 'final' not permitted in top-level declarations, ignored
| final Map count = s.chars().map(Character::toLowerCase).collect(TreeMap::new, (m, c) -> m.merge((char) c, 1, Integer::sum), Map::putAll);
| ^---^
| Added variable count of type Map with initial value {a=1, i=1, m=1, n=2}

-> int x = 26;
| Added variable x of type int with initial value 26

-> count.entrySet().stream().sorted((l, r) -> r.getValue().compareTo(l.getValue())).forEach(e -> count.merge(e.getKey(), x--, Math::multiplyExact));

-> System.out.println(count.entrySet().stream());
java.util.stream.ReferencePipeline$Head@548a9f61

-> System.out.println(count.entrySet().stream().mapToDouble(e -> e.getValue()).sum());
124.0

Everything can be changed in the REPL. Nothing is final; the modifier is simply ignored. That is what the kulla-dev@openjdk.java.net list told me.