‘R’ code to parse the G1 gc log

I found this blog, which uses ‘R’ to parse G1 gc logs. I am specifically using the ‘Format 2’ log lines mentioned in the blog. This is one of my favorite ways to learn ‘R’. The author uses Unix shell scripts to parse the data, but I think ‘R’ is an excellent language for parsing data, and I have done this in the past.

Even though my code here is verbose, an ‘R’ expert could make short work of the data. My code also uses regular expressions more often than it needs to.

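library(stringr)

# Read the log; each line is expected to land in a single column, V1.
g1 <- read.table("D:\\Python Data Analytics\\single_g1gc_trace.txt", sep = "\t")

# Lines carrying a date stamp; keep the text up to the second colon (date, hour and minute).
timestamps <- g1[grepl("[0-9]+-[0-9]+-", g1$V1), ]
timestamp <- str_extract(timestamps, "[^:]*:[^:]*")

# Lines carrying heap sizes in megabytes, such as "1938M".
heapsizes <- g1[grepl("[0-9]+M+", g1$V1), ]

g1data <- data.frame(timestamp)
g1data$heapsizes <- heapsizes

# Every run of digits on each heap line, in order: before, after, total.
heapsize <- str_extract_all(g1data$heapsizes, "[0-9]+")

# Placeholder columns, filled in by the loops below.
g1data$BeforeSize <- 0
g1data$AfterSize <- 0
g1data$TotalSize <- 0
g1data$timestamp <- 0

row <- NROW(g1data)

# Copy the three sizes of each heap line into columns 3 to 5.
i <- 1
for (size in heapsize) {
  if (i <= row) {
    g1data[i, 3] <- as.numeric(size[1])  # BeforeSize
    g1data[i, 4] <- as.numeric(size[2])  # AfterSize
    g1data[i, 5] <- as.numeric(size[3])  # TotalSize
    i <- i + 1
  }
}

# Copy the extracted timestamps back into column 1.
i <- 1
for (stamp in timestamp) {
  if (i <= row) {
    g1data[i, 1] <- stamp
    i <- i + 1
  }
}

# Drop the raw heap-size text and keep only the parsed columns.
g1data <- subset(g1data, select = c(timestamp, BeforeSize, AfterSize, TotalSize))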

         timestamp BeforeSize AfterSize TotalSize
1 2014-05-13T08:54       1938       904      9216
2 2014-05-13T08:54       1939       905      9217
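
For comparison, here is a rough sketch of how the same table might be built without the loops. It assumes, as the code above does, that timestamps and heap sizes sit on separate lines and appear in the same order. The `log_lines` vector is hypothetical sample text that I shaped only to match the regular expressions in this post; it is not copied from a real G1 log. With a real file, `readLines()` could supply the vector instead.

```r
library(stringr)

# Hypothetical sample lines, made up to match the regular expressions used
# in this post. They are not taken from a real G1 gc log.
log_lines <- c(
  "2014-05-13T08:54:21.123-0700: [GC pause (young)",
  " 1938M->904M(9216M)",
  "2014-05-13T08:54:47.456-0700: [GC pause (young)",
  " 1939M->905M(9217M)"
)

# Split the log into timestamp lines and heap-size lines.
stamp_lines <- log_lines[grepl("[0-9]+-[0-9]+-", log_lines)]
heap_lines  <- log_lines[grepl("[0-9]+M", log_lines)]

# simplify = TRUE returns a character matrix: one row per heap line,
# one column per run of digits found on that line.
sizes <- str_extract_all(heap_lines, "[0-9]+", simplify = TRUE)

g1data <- data.frame(
  timestamp  = str_extract(stamp_lines, "[^:]*:[^:]*"),
  BeforeSize = as.integer(sizes[, 1]),
  AfterSize  = as.integer(sizes[, 2]),
  TotalSize  = as.integer(sizes[, 3]),
  stringsAsFactors = FALSE
)

print(g1data)
```

Building the data frame in one step means there are no placeholder columns to fill in afterwards, and the sizes end up as integers rather than text.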
