rJava, rCharts and R code to display GC data

These are the steps I follow to display GC activity data using an nvd3 discrete bar chart.

Call Java class using rJava

  gctypes <- .jcall(realtimegcdataobserver ,"Ljava/util/List;","getGCTypes")

Create an empty data frame to hold the data

I get the type of the GC algorithm, GC count and time from JMX. I have yet to explore the last two values.

gcdata <- function(){
  df <- data.frame(
                 GCType=character(), 
                 Count=character(),
                 Time=character(), 
                 stringsAsFactors=FALSE)
  print(df)
  return(df)
}

Iterate over the list of beans

Call appropriate methods and fill up the empty data frame.
The last two lines massage the data into shape; I don't know a more elegant way to do this.

  emptygcdata <- gcdata()
  gctypedetails <- sapply( gctypes, function(item) rbind(emptygcdata, as.data.frame(c(GCType=item$getName(),Count=item$getL(),Time=item$getM()))))

  gctypedetails <- data.frame(gctypedetails)
  gctypedetails <- data.frame(matrix(unlist(gctypedetails)))

matrix.unlist.gctypedetails..
1 PS Scavenge
2 16
3 22
4 PS MarkSweep
5 0
6 0


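  emptygcdata <- gcdata()
  # Each collector occupies three consecutive rows: name, count, time.
  before <- 0
  after <- 2
  repeat
  {
    # Stop when a full group of three rows is no longer available.
    if (after >= nrow(gctypedetails))
      break
    emptygcdata <- rbind(emptygcdata, data.frame(GCType = gctypedetails[before + 1, 1], Count = gctypedetails[before + 2, 1], Time = gctypedetails[before + 3, 1]))
    # Advance both indices by one whole group.
    before <- before + 3
    after <- after + 3
  }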

 GCType          Count  Time
1  PS Scavenge      16  22
2  PS MarkSweep     0  0
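A possibly tidier way to do this reshaping, sketched under the assumption that the flattened values always arrive in groups of three (name, count, time): fill a three-column matrix row by row and convert it to a data frame. The vector flat is hard-coded stand-in data, not output from JMX, and gcframe is a name chosen only for this sketch.

```r
# Stand-in for the unlisted bean details: name, count, time per collector.
flat <- c("PS Scavenge", "16", "22", "PS MarkSweep", "0", "0")

# byrow = TRUE puts each group of three values on its own row.
m <- matrix(flat, ncol = 3, byrow = TRUE)

gcframe <- data.frame(GCType = m[, 1],
                      Count  = as.integer(m[, 2]),
                      Time   = as.integer(m[, 3]),
                      stringsAsFactors = FALSE)
print(gcframe)
```

Converting Count and Time to integers here is a design choice of the sketch; the original frame keeps them as character.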

nvd3 using rCharts

  p2 = nPlot(x = "Time", y = "Count", data = emptygcdata, type = "discreteBarChart")
  p2$chart(
    color = "#! function(d){
      var ourColorScale = d3.scale.ordinal().domain(['PS MarkSweep','PS Scavenge']).range(['green','purple']);
      return ourColorScale(d.GCType);
    }!#")

rCharts and nvd3

I have been playing with some JavaScript chart libraries, including nvd3. rCharts wraps these JS libraries in R code.

I have a data frame like this.

Var1 Freq
levelI 13
levelII 13
levelIII 12
levelIV 12

I want to plot a discrete bar chart using nvd3, and each bar should have a different color depending on Var1. Var1 will have duplicate values.


#The type of Var1 is integer in my data frame. So instead of debugging that
#I just convert it to "character" using this line.
bpf3$Var1 <- paste(bpf3$Var1,"",sep="")

#Match the first column with a hard-coded value and append a color
for(i in 1:nrow(bpf3)) {
    print(identical(bpf3[i,1],"levelI"))
    if(identical(bpf3[i,1],"levelI")){
      bpf3[i,1] <- paste(bpf3[i,1],"green",sep=":") 
    }else{
      bpf3[i,1] <- paste(bpf3[i,1],"orange",sep=":") 
    }
}
p2 = nPlot(x = "Var1", y = "Freq", data = bpf3, type = "discreteBarChart")

p2$chart(color = "#! function(d, x){ var color = d.Var1; return color.split(':')[1];} !#")

So it is possible to use a function that assigns a color to a bar in nvd3. rCharts lets you define a function like this.

The result is shown in the screenshot. This is not an elegant way, though.

[Screenshot of the resulting chart]

Update:

The author of http://timelyportfolio.blogspot.in/ pointed out that there is a function that can be used like this.


  p2$chart(
    color = "#! function(d){
      var ourColorScale = d3.scale.ordinal().domain(['levelI','levelII']).range(['red','blue']);
      return ourColorScale(d.Var1);
    }!#")

It works splendidly.
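For the first approach, where the color is appended to Var1, the row-by-row loop is not strictly needed. A minimal sketch, assuming the same two-color rule (green for levelI, orange for everything else); the bpf3 frame built here is a hypothetical stand-in with the values from the table above, and bar_color is a name introduced only for illustration.

```r
# Hypothetical stand-in for the bpf3 data frame used in this post.
bpf3 <- data.frame(Var1 = c("levelI", "levelII", "levelIII", "levelIV"),
                   Freq = c(13, 13, 12, 12),
                   stringsAsFactors = FALSE)

# ifelse() tests every row at once: "green" for levelI, "orange" otherwise.
bar_color <- ifelse(bpf3$Var1 == "levelI", "green", "orange")

# Append the color to the label, separated by a colon, as the loop did.
bpf3$Var1 <- paste(bpf3$Var1, bar_color, sep = ":")
print(bpf3)
```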

Real-time graph showing heap generations

I am able to update a graph dynamically in real time with test data, using the code shown in the previous post. The graph is refreshed without reloading the entire page. The tests are successful for now, even though this particular JavaScript graph library will be replaced with a better one.
Now I will be able to use JMX to get the YoungGen data from the heap and show actual data as the GC collects the garbage.

All the code will be pushed to GitHub.


Update some columns of an R data frame

I wanted to code an R function that updates some columns in each row and not the entire row.

I declare a global variable to keep track of which row’s columns are being updated. I want to stop when the last row is reached.


youngdata <- function() {
    ydata <<- 1
}

This function creates a data frame which has 3 columns but valid data in only the first column. The 2nd and 3rd columns will be updated one row at a time.

youngdata <- function() {
  # Build the frame locally, then publish it with <<- so that the
  # eden and survivor columns also reach the global copy.
  df <- data.frame(date = seq(as.Date('2014-08-06'), as.Date('2014-10-08'), by = '3 weeks'))
  df['eden'] <- 0
  df['survivor'] <- 0
  younggen <<- df
}

> younggen
date eden survivor
1 2014-08-06 0 0
2 2014-08-27 0 0
3 2014-09-17 0 0
4 2014-10-08 0 0


loadeddata <- function(df){
  # All rows are filled: hand the data frame back unchanged.
  if (ydata > nrow(df)) {
    return(df)
  }
  print("Loading new data")
  # One random value each for eden and survivor.
  newdata <- data.frame(sample(1:40, 1), sample(1:40, 1), stringsAsFactors = FALSE)
  colnames(newdata) <- c("eden", "survivor")
  # Overwrite only the matching columns of row ydata.
  df[ydata, which(names(df) %in% names(newdata))] <- newdata
  return(df)
}

So every time the function loadeddata is called one row’s columns are updated and it stops when all the rows are finished. I increment ydata in another part of the code not shown here.

> younggen
date eden survivor
1 2014-08-06 2 16
2 2014-08-27 0 0
3 2014-09-17 0 0
4 2014-10-08 0 0

This code is used to test Shiny's reactiveTimer, which lets the code update a graph dynamically with new data.
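The part that increments the row counter is not shown in this post. The following is only a sketch of how the pieces could fit together, assuming the caller advances ydata after every update; the for loop stands in for the timer, and loadrow is a hypothetical helper that takes the row index as an argument instead of reading the global.

```r
# Stand-in frame with the same columns as younggen.
younggen <- data.frame(date = seq(as.Date("2014-08-06"), as.Date("2014-10-08"), by = "3 weeks"),
                       eden = 0, survivor = 0)
ydata <- 1   # index of the next row to fill

# Fill the eden and survivor columns of one row.
# Once every row has been filled, return the frame unchanged.
loadrow <- function(df, row) {
  if (row > nrow(df)) {
    return(df)
  }
  df[row, c("eden", "survivor")] <- sample(1:40, 2, replace = TRUE)
  df
}

# Five ticks for four rows: the last call finds nothing left to update.
for (tick in 1:5) {
  younggen <- loadrow(younggen, ydata)
  ydata <- ydata + 1
}
print(younggen)
```

Passing the row index in explicitly keeps the helper free of global state, which may be easier to test than the `<<-` version.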

R Shiny

I have recently started to code a web dashboard to show information like heap usage in HotSpot, so I first set up a Shiny server. The part that connects to HotSpot is not ready, but this is my first Shiny UI. It is a Twitter Bootstrap UI.

This is part of the Shiny server code. For now the data is generated by R code; later it will be extracted from the JVM using JMX and other serviceability APIs.

output$metaspace <- renderChart({
  metacapacity <- data.frame(lapply(as.Date('2014-08-06') + 0:0, seq, as.Date('2014/10/08'), '1 weeks'));
  metacapacity['init'] <- 300
  metacapacity['committed'] <- 700
  colnames(metacapacity) <- c("date","init","committed")
  metacapacity  <- transform(metacapacity,date=as.character(date))
  ms <- mPlot(x = "date", y = c("init", "committed"), type = "Area", data = metacapacity)
  ms$addParams(height = 300, dom = 'metaspace')
  ms$set(title="MetaSpace")
  return(ms)
  })

Porting Python BoxPlot code to “R”

My previous two entries explained how I am attempting to port the Python code used to create the graph in the DZone article to “R”.

The author of that article published the data that he used to generate the graph. The data consists of several files. I have taken one file and tried to create boxplots from it. I will improve this by combining all the files, parsing them and generating a combined boxplot. But first I coded this “R” script to parse one file and generate a boxplot.

The data from one of the files looks like this.


timeStamp,elapsed,label,responseCode,responseMessage,threadName,dataType,success,Latency
1346999466187,32,Home page - anon,200,OK,Anonymous Browsing 1-2,text,true,31
1346999466182,37,Login form,200,OK,Node save 3-1,text,true,36
1346999466184,35,Home page - anon,200,OK,Anonymous Browsing 1-11,text,true,32
1346999466182,37,Home page - anon,200,OK,Anonymous Browsing 1-1,text,true,34
1346999466189,30,Home page - anon,200,OK,Anonymous Browsing 1-4,text,true,27
1346999466185,46,Home page - anon,200,OK,Anonymous Browsing 1-5,text,true,34
1346999466185,44,Search,200,OK,Search 4-1,text,true,35
1346999466188,28,Home page - anon,200,OK,Anonymous Browsing 1-3,text,true,26
1346999466182,33,Home page - anon,200,OK,Anonymous Browsing 1-7,text,true,32
1346999466182,36,Login Form,200,OK,Perform Login/View Account 5-1,text,true,35
1346999466182,35,Home page - anon,200,OK,Anonymous Browsing 1-10,text,true,33
1346999466182,34,Login Form,200,OK,Authenticated Browsing 2-1,text,true,32
1346999466184,33,Home page - anon,200,OK,Anonymous Browsing 1-6,text,true,31
1346999466182,37,Home page - anon,200,OK,Anonymous Browsing 1-9,text,true,35

It is very easy to parse this and create an “R” data frame.


# TODO: Box Plots
# 
# Author: radhakrishnan
###############################################################################


library(plyr)

options("scipen"=100, "digits"=4)

data <- read.table("~/Documents/Learn R Statistics/R/jmeter_results/4-overall-summary.csv",sep=",",header=T)

head(data)

#I don't think I need 'ddply' here but it serves the purpose.
#It groups the data based on the 'label' and returns the two relevant columns
data <- ddply( data , .variables = "label" , .fun = function(x) x[,c("label","elapsed")])

uniquelables <- as.character(unique(data$label))
lists <- replicate( length(uniquelables),list())

j = 1


for (i in uniquelables ){
  lists[j] = as.list(as.data.frame(data[data$label %in% i,'elapsed']))
  j = j + 1
}

boxplot(lists)

BoxPlot from one file

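The grouping loop in the script above can probably be shortened. A minimal sketch, assuming only the label and elapsed columns matter: split() builds the per-label list in one call and boxplot() accepts that list directly. The CSV text holds a few rows copied from the sample data; csv, results and groups are names chosen for this sketch.

```r
# A few rows copied from the sample data shown earlier.
csv <- "timeStamp,elapsed,label,responseCode,responseMessage,threadName,dataType,success,Latency
1346999466187,32,Home page - anon,200,OK,Anonymous Browsing 1-2,text,true,31
1346999466182,37,Login form,200,OK,Node save 3-1,text,true,36
1346999466184,35,Home page - anon,200,OK,Anonymous Browsing 1-11,text,true,32
1346999466185,44,Search,200,OK,Search 4-1,text,true,35
1346999466182,36,Login Form,200,OK,Perform Login/View Account 5-1,text,true,35
1346999466182,34,Login Form,200,OK,Authenticated Browsing 2-1,text,true,32"

results <- read.csv(text = csv, stringsAsFactors = FALSE)

# split() returns one vector of elapsed times per distinct label.
groups <- split(results$elapsed, results$label)

# boxplot() draws one box per list element and labels it with the name.
boxplot(groups, ylab = "elapsed")
```

Note that "Login form" and "Login Form" differ in case, so they end up as separate groups unless the labels are normalized first.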

Eclipse StatET-R environment

My Eclipse R environment is ready.


Area between curves

This is what I like about ‘R’. This one call is enough to shade the area between two curves. Apart from the functional programming aspects (http://adv-r.had.co.nz/), I am interested in its powerful APIs for parsing and visualizing data.

polygon(c(data$Time, rev(data$Time)),
        c(as.numeric(data$Used), rev(as.numeric(data$Committed))),
        col = "antiquewhite1",
        border = NA)

[Figure: code-cache plot]
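To see why the polygon() call works, here is a self-contained sketch with made-up numbers standing in for data$Time, data$Used and data$Committed. The vertices run along one curve from left to right and then back along the other from right to left, which closes the outline that gets filled. The names outline_x and outline_y are introduced only for illustration.

```r
# Made-up values standing in for data$Time, data$Used and data$Committed.
time      <- 1:10
used      <- c(12, 15, 14, 18, 21, 20, 24, 27, 26, 30)
committed <- used + 8

plot(time, committed, type = "l", ylim = range(c(used, committed)),
     xlab = "Time", ylab = "Size")
lines(time, used)

# Walk along the lower curve left to right, then back along the upper
# curve right to left, so the vertices trace a closed outline.
outline_x <- c(time, rev(time))
outline_y <- c(used, rev(committed))
polygon(outline_x, outline_y, col = "antiquewhite1", border = NA)
```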

Error bars using ‘R’

I believe our measurements are uncertain and we need to show the errors in our capacity measurement plots. I suspect that we are making fundamental mistakes when we gather performance statistics and draw graphs, which is all the more reason to show these uncertainties. Our management and clients should not be misled by the lack of skills of our capacity planners.

This code and the graph are used to learn one aspect of showing such errors. I am yet to investigate the type of errors and their statistical significance.

If there is a mistake I will make corrections to this blog entry.

Updated: code and graph.

 this.dir <- dirname(parent.frame(2)$ofile) 
setwd(this.dir)
 #Reference values plotted on x-axis. These are constant.
 #These values could be time of day. So every day at the same
 #time we could collect other measurements
 referenceset <- data.frame(c(5,10,15,20,25,30,35,40,50,60))
 colnames( referenceset) <- c("reference")

 #These are the sets of measurements. So every day at the same
 #time we could collect several samples. This is simulated now.
 sampleset <- data.frame( matrix(sample(1:2, 20000, replace = TRUE), ncol = 2000) )
 
 sampleset <- cbind( sampleset, referenceset )
 
 #Calculate the mean of the first 10 samples in each row,
 #so that every reference value gets its own mean
 sampleset$mean <- apply(sampleset[,1:10],1,mean)
 
 #Calculate Standard Deviation of the same 10 samples
 sampleset$sd <- apply(sampleset[,1:10],1,sd)
 
 #Calculate Standard Error (n = 10 samples per row)
 sampleset$se <- sampleset$sd / sqrt(10)
 
 #print(sampleset)

png("errorbars.png", width = 500, height = 510)

plot(sampleset$reference,
     sampleset$mean,
     las = 1,
     ylab = "Mean of 'y' values",
     xlab = "x",
     ylim = c(0, 3),
     type = "l",
     lwd = 1,
     col = "blue")

arrows(sampleset$reference,
       sampleset$mean - sampleset$se,
       sampleset$reference,
       sampleset$mean + sampleset$se,
       code = 3,
       angle = 90,
       length = 0.2)

dev.off()


[Figure: errorbars.png]
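The error-bar lengths come from the standard error of the mean, which is the standard deviation of the samples divided by the square root of their number. A compact sketch of that calculation, assuming 10 simulated samples per reference value; set.seed() is there only to make the run repeatable, and all variable names are chosen for this sketch.

```r
set.seed(42)  # only so the sketch is repeatable

n_samples <- 10
reference <- c(5, 10, 15, 20, 25, 30, 35, 40, 50, 60)

# One row per reference value, one column per simulated sample.
samples <- matrix(sample(1:2, length(reference) * n_samples, replace = TRUE),
                  nrow = length(reference))

means <- rowMeans(samples)
sds   <- apply(samples, 1, sd)
ses   <- sds / sqrt(n_samples)   # standard error of each row mean

print(data.frame(reference, mean = means, sd = sds, se = ses))
```

Collecting more samples per reference value shrinks the standard error, because the divisor grows with the square root of the sample count.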

statlearning class

This will be very useful for people like me who want to apply this to capacity planning.

Rob Tibshirani and I are offering a MOOC in January on Statistical Learning.
This “massive open online course” is free, and is based entirely on our new book
“An Introduction to Statistical Learning with Applications in R”
(James, Witten, Hastie, Tibshirani 2013, Springer). http://www-bcf.usc.edu/~gareth/ISL/
The pdf of the book will also be free.

The course, hosted on Open edX, consists of video lecture segments, quizzes, video R sessions, interviews with famous statisticians,
lecture notes, and more. The course starts on January 22 and runs for 10 weeks.

Please consult the course webpage http://statlearning.class.stanford.edu/ to enroll and for further details.
----------------------------------------------------------------------------------------
Trevor Hastie hastie@stanford.edu
Professor, Department of Statistics, Stanford University
Phone: (650) 725-2231 Fax: (650) 725-8977
URL: http://www.stanford.edu/~hastie
address: room 104, Department of Statistics, Sequoia Hall
390 Serra Mall, Stanford University, CA 94305-4065
--------------------------------------------------------------------------------------