‘R’ code to parse the G1 GC log
April 13, 2015
I found this blog that uses ‘R’ to analyze G1 GC logs. I am specifically using the ‘Format 2’ log lines mentioned in that blog. Parsing logs like this is one of my favorite ways to learn ‘R’. The author uses Unix shell scripts to parse the data, but I think ‘R’ is an excellent language for parsing data, and I have done this in the past.
Even though my code here is verbose, an ‘R’ expert could make short work of this data. My code also uses regular expressions more often than it needs to.
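library(stringr)

# Read the raw trace; each log line lands in column V1
g1 <- read.table("D:\\Python Data Analytics\\single_g1gc_trace.txt",
                 sep = "\t", stringsAsFactors = FALSE)

# Lines that begin a GC event carry a date stamp such as 2014-05-13T08:54:...
timestamps <- g1[ with(g1, grepl("[0-9]+-[0-9]+-", V1)) , ]
# Keep everything up to the second colon (date, hour and minute)
timestamp <- str_extract(timestamps, '[^:]*:[^:]*')

# Lines that carry heap sizes, e.g. the before, after and total sizes in MB
heapsizes <- g1[ with(g1, grepl("[0-9]+M+", V1)) , ]

g1data <- data.frame( timestamp, stringsAsFactors = FALSE )
g1data$heapsizes <- heapsizes
# Every number on each heap line; the first three are before, after and total
heapsize <- str_extract_all(g1data$heapsizes, '[0-9]+')

g1data$BeforeSize <- 0
g1data$AfterSize <- 0
g1data$TotalSize <- 0

row = NROW(g1data)
i = 1
for( size in heapsize ){
  if( i <= row ){
    # Convert to numbers so the size columns stay numeric, not character
    g1data[i,3] <- as.numeric(size[1])
    g1data[i,4] <- as.numeric(size[2])
    g1data[i,5] <- as.numeric(size[3])
    i = i + 1
  }
}

# Drop the raw heap line and keep only the parsed columns
g1data <- subset(g1data, select=c(timestamp,BeforeSize,AfterSize,TotalSize))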
timestamp BeforeSize AfterSize TotalSize
1 2014-05-13T08:54 1938 904 9216
2 2014-05-13T08:54 1939 905 9217
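As a rough idea of what a shorter version could look like, the sketch below does the same parse in base ‘R’ with no explicit loops. It is only an illustration under assumptions, not the method from the blog: `parse_g1` and `log_lines` are hypothetical names, the sample lines are made up to reproduce the two rows above (the real ‘Format 2’ lines may look different), and, like the loop version, it assumes each heap-size line comes after the date-stamped line of the same GC event. Instead of relying on the two line counts being equal, it pairs each heap line with the nearest preceding stamp line using `cumsum`.

```r
# Hypothetical helper: a vectorized take on the loop-based parse.
# ASSUMPTION: each heap-size line follows the date-stamped line of its GC event.
parse_g1 <- function(lines) {
  # Same line patterns as the loop version; a stamp line is never a heap line
  is_stamp <- grepl("[0-9]+-[0-9]+-", lines)
  is_heap  <- grepl("[0-9]+M", lines) & !is_stamp

  # Running count of stamp lines: for a heap line, the index of its stamp
  owner <- cumsum(is_stamp)[is_heap]
  heap  <- lines[is_heap][owner > 0]   # drop heap lines seen before any stamp
  owner <- owner[owner > 0]

  # Keep each stamp up to the second colon (date, hour and minute)
  stamps <- sub("^([^:]*:[^:]*).*$", "\\1", lines[is_stamp])

  # First three numbers on each heap line: before, after and total size
  nums <- regmatches(heap, gregexpr("[0-9]+", heap))
  nth  <- function(k) as.numeric(vapply(nums, `[`, character(1), k))

  data.frame(timestamp  = stamps[owner],
             BeforeSize = nth(1),
             AfterSize  = nth(2),
             TotalSize  = nth(3),
             stringsAsFactors = FALSE)
}

# Made-up sample lines (hypothetical; real 'Format 2' lines may differ)
log_lines <- c(
  "2014-05-13T08:54:36.123+0530: 1.234: [GC pause (young)",
  " 1938M->904M(9216M), 0.0123 secs]",
  "2014-05-13T08:54:40.456+0530: 5.567: [GC pause (young)",
  " 1939M->905M(9217M), 0.0150 secs]"
)
g1data <- parse_g1(log_lines)
print(g1data)

# For the real trace, something like:
# g1data <- parse_g1(readLines("D:\\Python Data Analytics\\single_g1gc_trace.txt"))
```

On the sample lines it returns the same two rows as the table above. Pairing by `cumsum` means a stray heap line before the first timestamp, or a stamp line with no heap line, is skipped instead of shifting every later row out of step.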