## Graph of ‘Time and Hits vs MBytes’ using ‘R’

I am yet to crack the capacity planning method using statistical analysis but I am on my way to that exalted state of expertise.

### Time and Hits vs MBytes


this.dir <- dirname(parent.frame(2)$ofile) setwd(this.dir) png( "daily traffic-hits-bytes.png", width =1200, height = 868) par(mar=c(7, 9, 6, 8)) data = input <- readLines(textConnection(" 07 22,864 27,574 49.79 MB 08 225,337 252,366 303.75 MB 09 306,007 318,463 256.01 MB 10 322,103 331,298 252.89 MB 11 314,393 322,646 241.72 MB 12 302,438 309,236 228.91 MB 13 283,773 290,291 217.17 MB 14 299,020 306,326 241.34 MB 15 299,792 307,692 241.76 MB 16 227,502 232,658 188.31 MB 17 106,205 108,890 107.65 MB 18 26,589 27,667 28.13 MB 19 5,952 6,143 11.72 MB 20 2,212 2,213 2.49 MB 21 937 944 1.33 MB")) data <- read.table(text=data,sep=" ",header=FALSE,stringsAsFactors=FALSE,fill=TRUE) data <- data.frame(data) colnames(data) <- c("Hours", "Pages", "Hits", "Bandwidth") plot(data$Hours,as.numeric(gsub(",","", data$Bandwidth)),col="green",pch=16,type="b", ylab="Hits", xlab="",las=2,lwd=2.5,xaxt="n", cex.lab=2,cex.axis=1.6) axis(1,at = seq(min(data$Hours), max(data$Hours), by=1),las = 2,cex.axis=2) title("Time and Hits vs MBytes",cex.main=3,xlab="Time(Hours)", cex.lab=2,1) par(new=TRUE) plot(data$Hours,as.numeric(gsub(",","",data$Pages)),yaxt="n",xlim=c(min(data$Hours),max(data$Hours)),col="blue",cex.axis=2.0,col.axis="blue",pch=6,type="b",xaxt="n",ylab="",xlab="",las=2,lwd=2.5, cex.lab=2) mtext("Mbytes",side=4,line=5) axis(4,at = seq(0, max(as.numeric(gsub(",","",data$Pages))), by=30000),las = 2,cex.axis=1.5)
legend("topleft", lty=c(1,1),lwd=c(3.5,3.5),c("Hits","Mbytes"), fill=c("green","blue"))
#box()

dev.off()


## ‘R’ code to understand number of concurrent connections

Proto Recv-Q Send-Q Local-Address               Foreign-Address             State
tcp        0      0 112.30.110.2:80             101.13.4.18:3037              SYN_RECV
tcp        0      0 112.30.110.2:80             101.13.4.18:3036              SYN_RECV
tcp        0      0 112.30.110.2:80             101.15.1.6:2431               SYN_RECV
tcp        0      0 112.30.110.2:80             101.16.32.19:1689             SYN_RECV
tcp        0      0 112.30.110.2:80             101.13.4.18:3035              SYN_RECV
tcp        0      0 ::ffff:172.210.100.12:80      ::ffff:101.11.5.7:3185        TIME_WAIT
tcp        0      0 ::ffff:172.210.100.12:80      ::ffff:101.15.1.3:3189        TIME_WAIT
tcp        0      0 ::ffff:172.210.100.12:80      ::ffff:101.15.5.7:3445        TIME_WAIT
tcp        0      0 ::ffff:172.210.100.12:80      ::ffff:101.13.29.13:2671       TIME_WAIT


I wanted to draw a graph showing the number of connections made by an IP(Foreign-Address) to our HTTP servers. This is a rough estimate of the concurrent connections. I know I have been smitten by ‘R’ but the power of this language is breathtaking.

I haven’t yet mastered the functional aspect of it but even so code like this one line to find the
frequency of values is really powerful. This line finds the count of the same IP’s(group) in the ‘Foreign-Address’ column in the data frame.

data <-as.data.frame(table(data$Foreign.Address))  The source code works perfectly but some of the parameters used to plot the graphs may not be needed. They don’t seem to interfere in any way but it is hard to understand all the parameters of the plotting API.  library(stringr) this.dir <- dirname(parent.frame(2)$ofile)
setwd(this.dir)

par(mar=c(4,4,4,4))

data[,var] <- sapply(data[,var],function(x) ifelse(length(unlist(str_split(x,":")))==5,unlist(str_split(x,":"))[4],unlist(str_split(x,":"))[1]))
data[,var] <- sapply(data[,var],function(x) ifelse(length(unlist(str_split(x,":")))==5,paste(unlist(str_split(x,":"))[4],":",unlist(str_split(x,":"))[5]),
paste(unlist(str_split(x,":"))[1],":",unlist(str_split(x,":"))[2])))
completeFun <- function(data, desiredCols) {
completeVec <- complete.cases(data[, desiredCols])
return(data[completeVec, ])
}

data <- data[data$Proto == "tcp" ,] data <-as.data.frame(table(data$Foreign.Address))
#data <- data[with(data, order(-Freq)), ]
data <-data.frame(data, stringsAsFactors = FALSE)

png(file="concurrencyhousers.png",width=700,height=250,res=72)

#User type 1
subsetdata <- subset(data, grepl("^172", Var1) & !grepl("^172.20.100.8", Var1) & !grepl("^172.20.100.9", Var1) )

plot(as.numeric(subsetdata$Var1),subsetdata$Freq,ylim=c(0,700),col="green",type="o",ylab="Number of connections",las=2,lwd=2.5,xaxt="n",xlab="IP")

axis(1,at=subsetdata$Var1,labels=subsetdata$Var1,las = 2,cex.axis=0.9,las=2)
title("Estimation of concurrent connections(HO Users-Sep,2013)",cex.main=1.3,line=1)
legend("topleft", lty=c(1,1),lwd=c(3.5,3.5),c(paste("Users",sep="--",length(subsetdata$Var1)))) #sink("parsedconcurrency.log") #print(data) #sink() dev.off() png(file="concurrencybranchusers.png",width=1300,height=450,res=72) #User type 1 subsetdata <- subset(data, !grepl("^172", Var1) & !grepl("^127", Var1) ) plot(as.numeric(subsetdata$Var1),subsetdata$Freq,ylim=c(0,700),col="green",type="o",ylab="Number of connections",las=2,lwd=2.5,xaxt="n",xlab="IP") axis(1,at=subsetdata$Var1,labels=subsetdata$Var1,las = 2,cex.axis=0.9,las=2) title("Estimation of concurrent connections(Branch Users-Sep,2013)",cex.main=1.8,line=1) legend("topleft", lty=c(1,1),lwd=c(3.5,3.5),c(paste("Users",sep="--",length(subsetdata$Var1))))

#sink("parsedconcurrency.log")
#print(data)
#sink()
dev.off()



## GC analysis using ‘R’


library(stringr)

this.dir <- dirname(parent.frame(2)$ofile) setwd(this.dir) input <- readLines(textConnection(" 2013-08-24T16:03:28.768+0530: 3.928: [Full GC (System) [PSYoungGen: 3600K->0K(382272K)] [PSOldGen: 0K->3285K(873856K)] 3600K->3285K(1256128K) [PSPermGen: 12862K->12862K(25984K)], 0.0541900 secs] [Times: user=0.06 sys=0.01, real=0.05 secs] 2013-08-24T16:03:41.095+0530: 16.254: [Full GC (System) [PSYoungGen: 27968K->0K(382272K)] [PSOldGen: 3285K->30483K(873856K)] 31253K->30483K(1256128K) [PSPermGen: 39954K->39954K(70912K)], 0.2661260 secs] [Times: user=0.53 sys=0.01, real=0.27 secs] 2013-08-24T17:03:41.377+0530: 3616.537: [Full GC (System) [PSYoungGen: 2600K->0K(337152K)] [PSOldGen: 299782K->301931K(873856K)] 302382K->301931K(1211008K) [PSPermGen: 160659K->158703K(285888K)], 1.2217540 secs] [Times: user=1.21 sys=0.01, real=1.22 secs] 2013-08-24T18:03:43.156+0530: 7218.315: [Full GC (System) [PSYoungGen: 192K->0K(337600K)] [PSOldGen: 301931K->271735K(873856K)] 302123K->271735K(1211456K) [PSPermGen: 158716K->126555K(251904K)], 1.5164700 secs] [Times: user=1.52 sys=0.00, real=1.51 secs] 2013-08-24T19:03:45.225+0530: 10820.385: [Full GC (System) [PSYoungGen: 288K->0K(338944K)] [PSOldGen: 271735K->271737K(873856K)] 272023K->271737K(1212800K) [PSPermGen: 126568K->126568K(246272K)], 1.2740980 secs] [Times: user=1.28 sys=0.00, real=1.27 secs] 2013-08-24T20:03:47.052+0530: 14422.212: [Full GC (System) [PSYoungGen: 384K->0K(337792K)] [PSOldGen: 271737K->271740K(873856K)] 272121K->271740K(1211648K) [PSPermGen: 126588K->126588K(237952K)], 1.1500680 secs] [Times: user=1.15 sys=0.00, real=1.15 secs] 2013-08-24T21:03:48.755+0530: 18023.915: [Full GC (System) [PSYoungGen: 288K->0K(343424K)] [PSOldGen: 271740K->271742K(873856K)] 272028K->271742K(1217280K) [PSPermGen: 126597K->126597K(228096K)], 1.2179020 secs] [Times: user=1.22 sys=0.00, real=1.22 secs] 2013-08-24T22:03:50.526+0530: 21625.686: [Full GC (System) [PSYoungGen: 352K->0K(340928K)] [PSOldGen: 271742K->271735K(873856K)] 272094K->271735K(1214784K) [PSPermGen: 126606K->126573K(217152K)], 1.2736310 secs] [Times: user=1.27 sys=0.00, real=1.27 secs] 2013-08-24T23:03:52.353+0530: 25227.513: [Full GC (System) [PSYoungGen: 352K->0K(349696K)] [PSOldGen: 271735K->271737K(873856K)] 272087K->271737K(1223552K) [PSPermGen: 126584K->126584K(206912K)], 1.2725880 secs] [Times: user=1.27 sys=0.00, real=1.27 secs] 2013-08-25T00:03:54.181+0530: 28829.341: [Full GC (System) [PSYoungGen: 1244K->0K(366528K)] [PSOldGen: 286583K->287548K(873856K)] 287827K->287548K(1240384K) [PSPermGen: 152380K->152380K(227904K)], 1.2271970 secs] [Times: user=1.22 sys=0.00, real=1.22 secs] 2013-08-25T01:03:55.963+0530: 32431.123: [Full GC (System) [PSYoungGen: 256K->0K(363264K)] [PSOldGen: 287548K->287550K(873856K)] 287804K->287550K(1237120K) [PSPermGen: 152384K->152384K(220544K)], 1.2622580 secs] [Times: user=1.26 sys=0.00, real=1.26 secs] 2013-08-25T02:03:57.779+0530: 36032.939: [Full GC (System) [PSYoungGen: 320K->0K(372864K)] [PSOldGen: 287550K->284392K(873856K)] 287870K->284392K(1246720K) [PSPermGen: 152392K->151729K(212352K)], 1.5651720 secs] [Times: user=1.57 sys=0.00, real=1.57 secs] 2013-08-25T03:03:59.898+0530: 39635.058: [Full GC (System) [PSYoungGen: 256K->0K(369664K)] [PSOldGen: 284392K->284394K(873856K)] 284648K->284394K(1243520K) [PSPermGen: 151738K->151738K(205056K)], 1.3093640 secs] [Times: user=1.30 sys=0.00, real=1.31 secs] 2013-08-25T04:04:01.762+0530: 43236.922: [Full GC (System) [PSYoungGen: 416K->0K(379328K)] [PSOldGen: 284394K->284397K(873856K)] 284810K->284397K(1253184K) [PSPermGen: 151758K->151758K(198336K)], 1.2442820 secs] [Times: user=1.24 sys=0.00, real=1.24 secs] 2013-08-25T05:04:03.562+0530: 46838.721: [Full GC (System) [PSYoungGen: 224K->0K(376064K)] [PSOldGen: 284397K->284400K(873856K)] 284621K->284400K(1249920K) [PSPermGen: 151766K->151766K(192128K)], 1.2820830 secs] [Times: user=1.29 sys=0.00, real=1.28 secs] 2013-08-25T06:04:05.398+0530: 50440.558: [Full GC (System) [PSYoungGen: 288K->0K(385664K)] [PSOldGen: 284400K->284357K(873856K)] 284688K->284357K(1259520K) [PSPermGen: 151774K->151735K(186816K)], 1.5174430 secs] [Times: user=1.51 sys=0.00, real=1.52 secs] 2013-08-25T07:04:07.470+0530: 54042.629: [Full GC (System) [PSYoungGen: 352K->0K(382528K)] [PSOldGen: 284357K->284359K(873856K)] 284709K->284359K(1256384K) [PSPermGen: 151743K->151743K(181632K)], 1.2664900 secs] [Times: user=1.27 sys=0.00, real=1.27 secs] 2013-08-25T08:04:09.289+0530: 57644.449: [Full GC (System) [PSYoungGen: 384K->0K(391616K)] [PSOldGen: 284359K->284361K(873856K)] 284743K->284361K(1265472K) [PSPermGen: 151751K->151751K(177536K)], 1.2797050 secs] [Times: user=1.28 sys=0.01, real=1.28 secs] 2013-08-25T09:04:11.124+0530: 61246.283: [Full GC (System) [PSYoungGen: 1302K->0K(392960K)] [PSOldGen: 291028K->291762K(873856K)] 292331K->291762K(1266816K) [PSPermGen: 156943K->156943K(179712K)], 1.3176180 secs] [Times: user=1.31 sys=0.01, real=1.32 secs] 2013-08-25T10:04:12.997+0530: 64848.157: [Full GC (System) [PSYoungGen: 608K->0K(402880K)] [PSOldGen: 296941K->290294K(873856K)] 297549K->290294K(1276736K) [PSPermGen: 159091K->158257K(177152K)], 1.6291050 secs] [Times: user=1.62 sys=0.00, real=1.63 secs] 2013-08-25T11:04:15.182+0530: 68450.342: [Full GC (System) [PSYoungGen: 975K->0K(406976K)] [PSOldGen: 294109K->294611K(873856K)] 295085K->294611K(1280832K) [PSPermGen: 159502K->159502K(175936K)], 1.2407930 secs] [Times: user=1.24 sys=0.00, real=1.24 secs] 2013-08-25T12:04:16.984+0530: 72052.144: [Full GC (System) [PSYoungGen: 2065K->0K(410624K)] [PSOldGen: 299586K->300435K(873856K)] 301652K->300435K(1284480K) [PSPermGen: 162453K->162453K(176448K)], 1.2860060 secs] [Times: user=1.28 sys=0.00, real=1.29 secs] 2013-08-25T13:04:18.832+0530: 75653.992: [Full GC (System) [PSYoungGen: 576K->0K(413504K)] [PSOldGen: 303252K->303507K(873856K)] 303828K->303507K(1287360K) [PSPermGen: 162992K->162992K(175360K)], 1.3077730 secs] [Times: user=1.31 sys=0.00, real=1.31 secs] 2013-08-25T14:04:20.700+0530: 79255.859: [Full GC (System) [PSYoungGen: 2432K->0K(412352K)] [PSOldGen: 303507K->288540K(873856K)] 305939K->288540K(1286208K) [PSPermGen: 162994K->160158K(171200K)], 1.4892360 secs] [Times: user=1.49 sys=0.00, real=1.49 secs] 2013-08-25T15:04:22.750+0530: 82857.910: [Full GC (System) [PSYoungGen: 4096K->0K(416640K)] [PSOldGen: 288540K->289267K(873856K)] 292636K->289267K(1290496K) [PSPermGen: 160643K->160643K(169984K)], 1.2662670 secs] [Times: user=1.27 sys=0.01, real=1.27 secs] 2013-08-25T16:04:24.579+0530: 86459.739: [Full GC (System) [PSYoungGen: 4224K->0K(414976K)] [PSOldGen: 289267K->289839K(873856K)] 293491K->289839K(1288832K) [PSPermGen: 161142K->161142K(169472K)], 1.2932650 secs] [Times: user=1.29 sys=0.00, real=1.29 secs] 2013-08-25T17:04:26.436+0530: 90061.596: [Full GC (System) [PSYoungGen: 2688K->0K(419328K)] [PSOldGen: 289839K->289862K(873856K)] 292527K->289862K(1293184K) [PSPermGen: 161150K->161150K(168768K)], 1.3088530 secs] [Times: user=1.31 sys=0.00, real=1.30 secs] 2013-08-25T18:04:28.308+0530: 93663.468: [Full GC (System) [PSYoungGen: 2720K->0K(418176K)] [PSOldGen: 289862K->288528K(873856K)] 292582K->288528K(1292032K) [PSPermGen: 161158K->160169K(166656K)], 1.3218390 secs] [Times: user=1.32 sys=0.00, real=1.32 secs] 2013-08-25T19:04:30.190+0530: 97265.349: [Full GC (System) [PSYoungGen: 2016K->0K(421248K)] [PSOldGen: 288528K->288552K(873856K)] 290544K->288552K(1295104K) [PSPermGen: 160177K->160177K(165568K)], 1.3263620 secs] [Times: user=1.33 sys=0.00, real=1.33 secs] 2013-08-25T20:04:32.076+0530: 100867.235: [Full GC (System) [PSYoungGen: 1984K->0K(420416K)] [PSOldGen: 288552K->288575K(873856K)] 290536K->288575K(1294272K) [PSPermGen: 160185K->160185K(165056K)], 1.3303080 secs] [Times: user=1.33 sys=0.00, real=1.33 secs] 2013-08-25T21:04:33.966+0530: 104469.126: [Full GC (System) [PSYoungGen: 2304K->0K(422976K)] [PSOldGen: 288575K->288598K(873856K)] 290879K->288598K(1296832K) [PSPermGen: 160193K->160193K(164416K)], 1.3543460 secs] [Times: user=1.35 sys=0.00, real=1.35 secs] 2013-08-25T22:04:35.883+0530: 108071.043: [Full GC (System) [PSYoungGen: 2496K->0K(422144K)] [PSOldGen: 288598K->288528K(873856K)] 291094K->288528K(1296000K) [PSPermGen: 160201K->160169K(163840K)], 1.3320060 secs] [Times: user=1.33 sys=0.00, real=1.33 secs] 2013-08-25T23:04:37.778+0530: 111672.937: [Full GC (System) [PSYoungGen: 2688K->0K(424640K)] [PSOldGen: 288528K->288551K(873856K)] 291216K->288551K(1298496K) [PSPermGen: 160177K->160177K(163328K)], 1.3344320 secs] [Times: user=1.33 sys=0.00, real=1.34 secs] 2013-08-26T00:04:39.676+0530: 115274.836: [Full GC (System) [PSYoungGen: 2720K->0K(423872K)] [PSOldGen: 288551K->288575K(873856K)] 291271K->288575K(1297728K) [PSPermGen: 160185K->160185K(163136K)], 1.3371510 secs] [Times: user=1.34 sys=0.00, real=1.34 secs] 2013-08-26T01:04:41.568+0530: 118876.728: [Full GC (System) [PSYoungGen: 384K->0K(425728K)] [PSOldGen: 288575K->288607K(873856K)] 288959K->288607K(1299584K) [PSPermGen: 160193K->160193K(162496K)], 1.3421990 secs] [Times: user=1.34 sys=0.00, real=1.34 secs] 2013-08-26T02:04:43.472+0530: 122478.632: [Full GC (System) [PSYoungGen: 2592K->0K(425408K)] [PSOldGen: 288607K->288529K(873856K)] 291199K->288529K(1299264K) [PSPermGen: 160201K->160169K(161920K)], 1.3468410 secs] [Times: user=1.34 sys=0.00, real=1.35 secs] 2013-08-26T03:04:45.382+0530: 126080.541: [Full GC (System) [PSYoungGen: 2656K->0K(427136K)] [PSOldGen: 288529K->288552K(873856K)] 291185K->288552K(1300992K) [PSPermGen: 160186K->160186K(161920K)], 1.3422060 secs] [Times: user=1.34 sys=0.00, real=1.34 secs] 2013-08-26T04:04:47.286+0530: 129682.446: [Full GC (System) [PSYoungGen: 2688K->0K(426496K)] [PSOldGen: 288552K->288575K(873856K)] 291240K->288575K(1300352K) [PSPermGen: 160194K->160194K(161408K)], 1.3432010 secs] [Times: user=1.35 sys=0.00, real=1.35 secs] 2013-08-26T05:04:49.189+0530: 133284.349: [Full GC (System) [PSYoungGen: 1728K->0K(428224K)] [PSOldGen: 288575K->288599K(873856K)] 290303K->288599K(1302080K) [PSPermGen: 160205K->160205K(161280K)], 1.3453110 secs] [Times: user=1.34 sys=0.00, real=1.34 secs] 2013-08-26T06:04:51.095+0530: 136886.255: [Full GC (System) [PSYoungGen: 2048K->0K(427776K)] [PSOldGen: 288599K->288529K(873856K)] 290647K->288529K(1301632K) [PSPermGen: 160213K->160181K(161344K)], 1.3442980 secs] [Times: user=1.35 sys=0.00, real=1.34 secs] 2013-08-26T07:04:53.001+0530: 140488.161: [Full GC (System) [PSYoungGen: 2368K->0K(429120K)] [PSOldGen: 288529K->288552K(873856K)] 290897K->288552K(1302976K) [PSPermGen: 160189K->160189K(161216K)], 1.3452220 secs] [Times: user=1.34 sys=0.00, real=1.34 secs] 2013-08-26T08:04:54.908+0530: 144090.068: [Full GC (System) [PSYoungGen: 6091K->0K(419584K)] [PSOldGen: 311930K->313962K(873856K)] 318022K->313962K(1293440K) [PSPermGen: 169186K->169186K(176128K)], 1.4025960 secs] [Times: user=1.40 sys=0.00, real=1.40 secs] 2013-08-26T09:04:56.871+0530: 147692.031: [Full GC (System) [PSYoungGen: 4608K->0K(417216K)] [PSOldGen: 721219K->405521K(873856K)] 725828K->405521K(1291072K) [PSPermGen: 200747K->200747K(227200K)], 1.6779290 secs] [Times: user=1.67 sys=0.01, real=1.68 secs] 2013-08-26T10:04:59.107+0530: 151294.266: [Full GC (System) [PSYoungGen: 448K->0K(418880K)] [PSOldGen: 532766K->370092K(873856K)] 533214K->370092K(1292736K) [PSPermGen: 212473K->211292K(236544K)], 2.0952720 secs] [Times: user=2.10 sys=0.00, real=2.09 secs] 2013-08-26T11:05:01.760+0530: 154896.919: [Full GC (System) [PSYoungGen: 2540K->0K(422208K)] [PSOldGen: 539017K->432818K(873856K)] 541558K->432818K(1296064K) [PSPermGen: 216295K->216295K(239296K)], 1.7726100 secs] [Times: user=1.77 sys=0.00, real=1.78 secs] 2013-08-26T12:05:04.092+0530: 158499.252: [Full GC (System) [PSYoungGen: 4662K->0K(421568K)] [PSOldGen: 597561K->444897K(873856K)] 602224K->444897K(1295424K) [PSPermGen: 222015K->222015K(242368K)], 1.8551220 secs] [Times: user=1.85 sys=0.00, real=1.85 secs] 2013-08-26T13:05:06.506+0530: 162101.666: [Full GC (System) [PSYoungGen: 3168K->0K(424256K)] [PSOldGen: 597850K->442066K(873856K)] 601018K->442066K(1298112K) [PSPermGen: 225426K->225426K(244608K)], 1.8385190 secs] [Times: user=1.84 sys=0.00, real=1.83 secs] 2013-08-26T14:05:08.910+0530: 165704.069: [Full GC (System) [PSYoungGen: 3676K->0K(411328K)] [PSOldGen: 656036K->415314K(873856K)] 659712K->415314K(1285184K) [PSPermGen: 227539K->223902K(241856K)], 2.1566410 secs] [Times: user=2.16 sys=0.00, real=2.16 secs] 2013-08-26T15:05:11.624+0530: 169306.784: [Full GC (System) [PSYoungGen: 1409K->0K(419712K)] [PSOldGen: 591866K->456335K(873856K)] 593276K->456335K(1293568K) [PSPermGen: 226173K->226173K(242944K)], 1.9383440 secs] [Times: user=1.94 sys=0.00, real=1.94 secs] 2013-08-26T16:05:14.123+0530: 172909.283: [Full GC (System) [PSYoungGen: 3222K->0K(409216K)] [PSOldGen: 701507K->482170K(873856K)] 704729K->482170K(1283072K) [PSPermGen: 230151K->230151K(244992K)], 1.9809930 secs] [Times: user=1.98 sys=0.01, real=1.98 secs] 2013-08-26T17:05:16.664+0530: 176511.823: [Full GC (System) [PSYoungGen: 4809K->0K(392192K)] [PSOldGen: 689255K->432828K(873856K)] 694065K->432828K(1266048K) [PSPermGen: 234505K->234505K(247616K)], 1.8189030 secs] [Times: user=1.81 sys=0.00, real=1.82 secs] 2013-08-26T18:05:19.040+0530: 180114.200: [Full GC (System) [PSYoungGen: 1152K->0K(406592K)] [PSOldGen: 474135K->339348K(873856K)] 475287K->339348K(1280448K) [PSPermGen: 235014K->227479K(239936K)], 1.9517330 secs] [Times: user=1.95 sys=0.00, real=1.95 secs] 2013-08-26T19:05:21.549+0530: 183716.709: [Full GC (System) [PSYoungGen: 2794K->0K(411648K)] [PSOldGen: 347094K->349430K(873856K)] 349888K->349430K(1285504K) [PSPermGen: 227894K->227894K(239936K)], 1.5431780 secs] [Times: user=1.54 sys=0.00, real=1.55 secs] 2013-08-26T20:05:23.649+0530: 187318.809: [Full GC (System) [PSYoungGen: 480K->0K(415872K)] [PSOldGen: 359933K->360054K(873856K)] 360413K->360054K(1289728K) [PSPermGen: 229702K->229702K(240448K)], 1.6184180 secs] [Times: user=1.61 sys=0.01, real=1.61 secs] 2013-08-26T21:05:25.851+0530: 190921.011: [Full GC (System) [PSYoungGen: 6976K->0K(402240K)] [PSOldGen: 383142K->363210K(873856K)] 390118K->363210K(1276096K) [PSPermGen: 230308K->230308K(239744K)], 1.5142840 secs]")) # keep only the data time <- str_extract(input,"T[^.]*") timeframex<-data.frame(time) timeframey<-subset(timeframex, timeframex[1] != "NA") timeframey<-substring(timeframey$time,2,9)
timeframey <- timeframey[-1]
timeframey <- data.frame(timeframey)

duration <- str_extract(input,"real=(.*)(secs)")
duration <- gsub("[^0-9.]","",duration)
durationx<-data.frame(duration)
durationy<-subset(durationx, durationx[1] != "NA")

colnames(timeframey)<-c("duration")
timevsduration <- data.frame(timeframey,durationy)

par(mar=c(10,4,4,2))
set1 <- timevsduration[1:8,]
set2 <- timevsduration[9:24,]
set3 <- timevsduration[25:22,]

png(file="fullgc1.png",width=1400,height=450,res=72)

mgp=c(3,1,0)
set1$duration<- as.POSIXct(paste('2013-08-24', set1$duration))
plot(set1$duration,as.numeric(levels(set1$duration.1)[set1$duration.1]),type="b",col = "blue", ylim=c(0,max(as.numeric(levels(set1$duration.1)[set1$duration.1]))),ylab="Duration", xaxt = 'n', xlab="Time",las=2,lwd=2.5, lty=1,cex.axis=1.2) axis(1, at = set1$duration, labels = set1$duration, las = 2,cex.axis=1.2) text(set1$duration,as.numeric(levels(set1$duration.1)[set1$duration.1]), as.numeric(levels(set1$duration.1)[set1$duration.1]), 2, cex=1.45)
dev.off()

png(file="fullgc2.png",width=1400,height=350,res=72)

set2$duration<- as.POSIXct(paste('2013-08-25', set2$duration))
plot(set2$duration,as.numeric(levels(set2$duration.1)[set2$duration.1]),type="b",col = "blue", ylab="Duration", xaxt = 'n', xlab="Time",las=2,lwd=2.5, lty=1,cex.axis=1.2) # now plot you times axis(1,at = set2$duration, labels = set2$duration, las = 2,cex.axis=1.2) text(set2$duration,as.numeric(levels(set2$duration.1)[set2$duration.1]), as.numeric(levels(set2$duration.1)[set2$duration.1]), 2, cex=1.45)

dev.off()

png(file="fullgc3.png",width=1400,height=350,res=72)

set3$duration<- as.POSIXct(paste('2013-08-26', set3$duration))
plot(set3$duration,as.numeric(levels(set3$duration.1)[set3$duration.1]),type="b",col = "blue", ylab="Duration", xaxt = 'n', xlab="Time",las=2,lwd=2.5, lty=1,cex.axis=1.2) # now plot you times axis(1, at = set3$duration, labels = set3$duration, las = 2,cex.axis=1.2) text(set3$duration,as.numeric(levels(set3$duration.1)[set3$duration.1]), as.numeric(levels(set3$duration.1)[set3$duration.1]), 2, cex=1.45)

dev.off()


## Plotting ‘time’ values

Data in ‘time’ format seems to need a different type of process to plot properly. This code uses strptime to get a proper graph. I have plotted two sets of data on the same plot. This is not visually appealing but serves the purpose. So here the linux ‘pmap’ ouput, ‘Kbytes’ and ‘RSS’ are both plotted against the same time values.

The y-axis label colours differentiate the graphs from one another.


13:07:05 2691296 1600864 1583804
13:08:45 2691296 1601280 1584220
13:10:25 2691296 1601280 1584220
13:12:05 2691296 1601480 1584420
13:13:45 2691296 1601820 1584760
13:15:25 2691296 1601852 1584792
13:17:05 2691296 1601960 1584900
13:18:45 2691296 1601996 1584936
13:20:25 2691296 1603548 1586488



this.dir <- dirname(parent.frame(2)$ofile) setwd(this.dir) data = read.table("D:\\Log analysis\\pmapdata-node1.1",header=F) colnames(data) <- c("Time","Kbytes","RSS","Dirty Mode") png( "pmapanalysis4705.png", width = 2324, height = 868) par(mar=c(7, 9, 6, 8)) plot(strptime(data$Time,"%H:%M:%S"),data$Kbytes,pch=0,type="b",col = "red", col.axis="red", ylab="", xlab="",las=2,lwd=2.5,cex.axis=1.5) title("",cex.main=3,xlab="Time", line=5.2,ylab="Kbytes", cex.lab=2,1) par(new=T) plot(strptime(data$Time,"%H:%M:%S"),data$RSS,col="blue",cex.axis=1.3,col.axis="blue",pch=6,type="b",xaxt="n",ylab="",xlab="",las=2,lwd=2.5) legend("topleft", lty=c(1,1),lwd=c(3.5,3.5),c("Kbytes","RSS"), fill=c("red","blue")) box() dev.off()  One problem with this graph is that very few time values are plotted on the x-axis. I don’t know the reason for this but when I asked the ‘R’ forum Jim Holtman suggested this code.  this.dir <- dirname(parent.frame(2)$ofile)
setwd(this.dir)

png(
"pmapanalysis4705.png",
width = 2324, height = 868)
par(mar=c(7, 9, 6, 8))

data$tod <- as.POSIXct(paste('2013-08-28', data$Time))
plot(data$tod,data$Kbytes,type="b",col = "blue", ylab="", xaxt = 'n', xlab="",las=2,lwd=2.5, lty=1,cex.axis=1.5)
# now plot you times
axis(1, at = data$tod, labels = data$Time, las = 2)

box()

dev.off()



Update : Suggestion from Jim Lemon to use ‘plotrix’

this.dir <- dirname(parent.frame(2)$ofile) setwd(this.dir) data = read.table("D:\\Log analysis\\pmapdata-node1.1",header=F) colnames(data) <- c("Time","Kbytes","RSS","Dirty Mode") png( "pmapanalysis4705.png", width = 2324, height = 868) par(mar=c(7, 9, 6, 8)) plot(strptime(data$Time,"%H:%M:%S"),data$Kbytes,pch=0, type="b",col="red",col.axis="red", ylab="", xlab="",las=2,lwd=2.5,xaxt="n") library(plotrix) staxlab(at=as.numeric(strptime(data$Time,"%H:%M:%S")),
labels=as.character(data$Time),nlines=3,srt=90) box() dev.off()  ## Further ‘pmap’ analysis One of the linux administrators thought that the JVM is using all the memory in his linux server. So as a means of convincing him otherwise I plotted the ‘RSS’ column from ‘pmap’ output as a time series graph. this.dir <- dirname(parent.frame(2)$ofile)
setwd(this.dir)

rNo<-as.integer(rownames(data))
data<-cbind(data,rNo)

png(
width = 2224, height = 768)
par(mar=c(5, 7, 4, 8) + 0.1)

plot(data$rNo,data$RSS,pch=0,type="b",col="blue",axes=FALSE,  ylab="", xlab="",las=2,lwd=2.5)
title(" PID - 15979 / pmap Data Analysis for 1 hour",cex.main=3,xlab="Seconds", ylab="Resident Size", cex.lab=3,1)

axis(side=1, at=seq(0, max(data$rNo), by=5), cex.axis=1.7) axis(side=2, at=seq(0, max(data$RSS), by=10000), cex.axis=1.7)

box()

dev.off()



## Processing unix ”pmap” output of a JVM

Analysis of the output of

pmap

using R seemed to take forever. The functional programming style did not help much. But the code with comments is below.

I plan to use this code as part of a ‘nmon’ data analyzer. Code will be in github soon.


@ Set the current directory so that the graphs are generated there
this.dir <- dirname(parent.frame(2)$ofile) setwd(this.dir) #Split unix 'pmap' output. This line is like black magic to me at this time. d <- do.call(rbind, lapply(strsplit(readLines("D:\\Log analysis\\Process Maps\\pmap_5900"), "\\s+"), function(fields) c(fields[1:10], paste(fields[-(1:10)], collapse = " ")))) #Now I am assigning some columns names colnames(d) <- c("Address","Kbytes","RSS","Dirty Mode","Mapping","Test6","Test7","Test8","Test9","Test10","Test11") #This section isolates and graphs mainly 'anon' memory sizes #Aggregate and sum by type of memory allocation to get cumulative size for each type Type2 <- setNames(aggregate(as.numeric(d[,"Kbytes"]), by=list(d[,"Test6"]),FUN=sum,na.rm=TRUE),c("AllocationType","Size")) #I don't want this line. The extra braces in the original file have been added to the columns separately Type2 <-subset(Type2, Type2$AllocationType != "[")

#Create data frame and cleanse
x<-data.frame(d[,7],d[,6],d[,2])
y<-subset(x, x[1] != "NA")
z<-data.frame(y[1],y[3])

png(
"anonymous1.jpg",
width     = 6.25,
height    = 3.50,
units     = "in",
res       = 600,
)
par(mar=c(7,4,1,0))

colnames(z) <- c("AllocationType","Size")

# Split into rows of 100 each. This is not generic.
Type1Split<-split(z, sample(rep(1:3, 100)))
size<-data.frame(Type1Split[[1]]$Size) colnames(size)<-c("Size") #Plot the first set barplot(as.numeric(levels(size$Size)[size$Size]),ylim=c(0,2000), names.arg=(paste(Type1Split[[1]]$AllocationType,Type1Split[[1]]$Size,"kb",sep=" ")),space=6, width=2, xlim = c(0, 1500), ylab="Kbytes",cex.names=0.3,las=2) dev.off() png( "anonymous2.png", width = 6.25, height = 3.50, units = "in", res = 600, ) par(mar=c(7,4,1,0)) size<-data.frame(Type1Split[[2]]$Size)
colnames(size)<-c("Size")

#Plot the second set

barplot(as.numeric(levels(size$Size)[size$Size]),ylim=c(0,2000), names.arg=(paste(Type1Split[[2]]$AllocationType,Type1Split[[2]]$Size,"kb",sep=" ")),space=6, width=2, xlim = c(0, 1500), ylab="Kbytes",cex.names=0.3,las=2)

dev.off()

png(
"anonymous3.png",
width     = 7.25,
height    = 3.50,
units     = "in",
res       = 600,
)
par(mar=c(7,4,1,0))

size<-data.frame(Type1Split[[3]]$Size) colnames(size)<-c("Size") #Plot the third set barplot(as.numeric(levels(size$Size)[size$Size]),ylim=c(0,2000), names.arg=(paste(Type1Split[[3]]$AllocationType,Type1Split[[3]]$Size,"kb",sep=" ")),space=5, width=2, xlim = c(0, 1600), ylab="Kbytes",cex.names=0.3,las=2) dev.off() #This section isolates and graphs memory size of files loaded by the JVM set.seed(10) Type2Split<-split(Type2, sample(rep(1:2, nrow(Type2)/2))) png( "pmap-node2.png", width = 6.20, height = 3.50, units = "in", res = 600, ) par(mar=c(7,4,1,0)) barplot(Type2Split[[1]]$Size, width=1,xlim=c(0,820),ylim = c(0, 15000), space=8, ylab="Kbytes",col=c("violet"),names.arg=(paste(Type2Split[[1]]$AllocationType,Type2Split[[1]]$Size,"kb",sep=" ")), cex.names=0.3,las=2)

dev.off()

png(
"pmap-node3.png",
width     = 6.20,
height    = 3.10,
units     = "in",
res       = 800,
)
par(mar=c(8,4,1,0)+0.3)

barplot(Type2Split[[2]]$Size, width=1,xlim=c(0,880),ylim = c(0, 4000), space=8, ylab="Kbytes",col=c("violet"),names.arg=(paste(Type2Split[[2]]$AllocationType,Type2Split[[2]]$Size,"kb",sep=" ")), cex.names=0.3,las=2) dev.off()  ### Graph of part of the JVM footprint showing files and memory sizes ### Blow up of the graph ### Graph of the ‘anon’ and memory sizes which we thought was a problem ### Blow up of the graph ## “R” graph showing permgen utilization ### Sample of Data gathered using jstat P – Permanent space utilization as a percentage of the space’s current capacity  S0 S1 E O P YGC YGCT FGC FGCT GCT 77.25 0.00 84.44 57.48 98.13 1136 39.241 129 244.316 283.558  ### R code this.dir <- dirname(parent.frame(2)$ofile)
setwd(this.dir)
png(file="permgen.png",width=400,height=350,res=72)
barplot(data$P,data$FGCT, space = 1.5, ylim = c(0, 100), ylab="Percentage Utilization", border="blue")
title(main="Server(Permgen Utilization)")
dev.off()


### Graph

Permgen Utilization

## Visualizing algorithm growth rates

I have been singing paeans about ‘R’, the statistics language, in this blog because it helps one understand and code statistical programs which are the key to various calculations in the fields of project management and capacity planning.
I have also been mulling about how I can generate graphs of algorithm growth rates when input sizes vary. This is child’s play when we use ‘R’.

I could reproduce the graph from Jones and Bartlett’s ‘Analysis of Algorithms’ quite easily. Visualizing these growth rates easily like this is helpful.

The code is general. Nothing specific to algorithm analysis. Immensely useful.

png("funcgrowth.png")
curve(x*x/8,xlab="",ylab="",col="blue",ylim=c(0,200),xlim=c(2,38))
par(new=TRUE)
curve(3*x-2,lwd=3,ylim=c(0,200),xlab="",ylab="",xlim=c(2,38),lty="dashed")
par(new=TRUE)
curve(x+10,lty="dashed",lwd=1,ylim=c(0,200),xlab="",ylab="",xlim=c(2,38))
par(new=TRUE)
curve(2*log(x),col="blue",lty="dashed",lwd=1,ylim=c(0,200),xlab="",ylab="",xlim=c(2,38))
legend(20,180, c("x*x/8","3*x-2","x+10","2*log(x)"), lty=c(1,2,2,2), lwd=c(2.5,2.5,1,1),col=c("blue","black","black","blue"))
dev.off()


## Throughput and Response time curves using R

I started using ‘R’, the statistical language, recently but its power is sorely missed in the IT industry especially in the project management and Capacity planning fields. In fact rigorous data quantification and analysis is given the short shrift by us in our software management activities like calculation of schedule variance and trends.

‘R’ in combination with PDQ helps us visualize the throughput and response time data. We should also be able to predict future performance by changing the service demands assuming that a faster disk or CPU is added. But that is a slightly more involved exercise.

This simple script shows response time graphs of multiple devices in the same graph.

The second graph shows throughput curves. We just have to use GetThruput instead of GetResponse.

library(pdq)

# PDQ globals
think<-20

cpu<-"cpunode"
disk1<-"disknode1"
disk2<-"disknode2"
disk3<-"disknode3"

cpudemand<-0.092
disk1demand<-0.079
disk2demand<-0.108
disk3demand<-0.142

# R plot vectors
xc<-0
yc<-0

Init("")

CreateNode(cpu, CEN, FCFS)

Solve(EXACT)

xc[n]<-as.double(n)
}
plot(xc, yc, type="l", ylim=c(0,60), xlim=c(0,450), lwd=1, xlab="Vusers", ylab="seconds",col="violet")

text(370,13,paste("cpu-",as.numeric(cpudemand)))

# R plot vectors
xc1<-0
yc1<-0

Init("")

CreateNode(disk1, CEN, FCFS)

Solve(EXACT)

xc1[n]<-as.double(n)
}
lines(xc1, yc1,lwd=1,col="blue")
text(400,10,paste("Disk 1-",as.numeric(disk1demand)))

# R plot vectors
xc2<-0
yc2<-0

Init("")

CreateNode(disk2, CEN, FCFS)

Solve(EXACT)

xc2[n]<-as.double(n)
}
lines(xc2, yc2,lwd=1,col="green")
text(330,17,paste("Disk 2-",as.numeric(disk2demand)))

# R plot vectors
xc3<-0
yc3<-0

Init("")

CreateNode(disk3, CEN, FCFS)

Solve(EXACT)

xc3[n]<-as.double(n)
}
lines(xc3, yc3,lwd=1,col="red")
text(240,20,paste("Disk 3-",as.numeric(disk3demand)))


## Performance engineering – Simple Bottleneck analysis

I believe the software developers and leads have forgotten the fundamental concepts of Statistics and other indispensable foundational derivations and formulae. One of the key reasons for this is succinctly explained by Cary V. Millsap from Oracle Corporation in his paper “Performance Management : Myths and Facts”.

“The biggest obstacle to accurate capacity planning is actually the difficulty in obtaining usable workload forecasts. Ironically, the difficulty here doesn’t stem from a lack of mathematical sophistication at all, but rather from the inability or unwillingness of a business to commit to a program of performance management that includes testing, workload management, and the collection and analysis of data describing how a system is being used. “

I am learning the ropes here but one field that is very helpful for performance engineering is queuing theory and analysis and the following is a collection of details from various papers. There is also a rather feeble attempt by me to use PDQ for solving a simple queuing problem.

Fundamentals :

T–the length of the observation period;
A–the number of arrivals occurring during
the observation period;
B–the total amount of time during which the system is busy during the observation period (B ≤ T); and
C–the number of completions occurring during the observation period.

Four important derived quantities are
λ = A/T, the arrival rate (jobs/second);
X = C/T, the output rate (jobs/second);
U = B/T, the utilization (fraction of time system is busy); and
S = B/C, the mean service time per completed job.

My feeble attempt to solve a particular problem. The resulting graph seems erroneous at this time but I will try to fix it.

Problem statement from pages 241-242 in “The Operational Analysis of Queueing Network Models* by PETER J. DENNING and JEFFREY P. BUZEN (Computing Surveys, Vol. 10, No. 3, September 1978)

using pdq and get the graph

based on the system diagram

The diagram is smudged. The think time, Z is 20 seconds and S1 = 05 seconds, S2 = 08 seconds and S3 = 04 seconds.

The visit ratios( mean number of requests per job for a device ) shown in the diagram and given in the paper are

V0 = 1 = .05V1
Vl = V0 + V2 + V3
v2 = .55V1
V3 = .40V1

Solving these equations we get these values. The paper shows the result but it is trivial to substitute and derive the results. So I have done that.

v1 = .05(v2/.55) + .55(v2/.55) + .40(v2/.55)

v1 = v2/.55

v1 = ( v2 * 100 ) /55

55v1 = 100v2

11v1 = 20 v2

v1/ v2 = 20/11

v3 = v1 – v0 – v2

v3 = .40 * 20 v2 / 11

11v3 = 8v2

v2/v3 = 8/11

V1 * S1 = (20)(.05) = 1.00 seconds (Total CPU time is the bottleneck )

V2 * S2 = (11)(.08) = .88 seconds (Total Disk time)

V3 * S3 = (8)(.04)= .32 seconds (Total Drum time)

These products sum to the minimal response time of 2.2 seconds

V1 * S1 = (20)(.05) = 1.00 seconds (Total CPU time is the bottleneck )

V2 * S2 = (11)(.08) = .88 seconds (Total Disk time)

V3 * S3 = (8)(.04)= .32 seconds (Total Drum time)

These products sum to the minimal response time of 2.2 seconds

The number of terminals required to begin saturating the entire system is
Ml* = 22.2

My feeble attempt to use PDQ based on many resources.

# Bottleneck analysis .

library(pdq)

# PDQ globals
think<-20

cpu<-"cpunode"
disk<-"disknode"
drum<-"drumnode"

cpustime<-1.00
diskstime<-.88
drumstime<-.32
dstime<-15

# R plot vectors
xc<-0
yc<-0

Init("")

CreateNode(cpu, CEN, FCFS)

CreateNode(disk, CEN, FCFS)

CreateNode(drum, CEN, FCFS)

Solve(EXACT)

xc[n]<-as.double(n)
}

plot(xc, yc, type="l", ylim=c(0,50), xlim=c(0,50), lwd=2, xlab="M", ylab="seconds")
abline(a=-think, b=cpustime, lty="dashed",  col="red")
abline( 2.2, 0, lty="dashed",  col = "red")
text(18, par("usr")[3]+2, paste("M1=", as.numeric(M1)))

# This calculation is from the x and y values used to draw lines above
# y = -think + x
# 2.2 = -20 + x
x = 22.2
segments( x, par("usr")[3], x, 2.2, col="violet", lwd=2)
text(26, par("usr")[3]+2, paste("M1*=", x))



Points

1. I was trying to set the visits( mean number of requests per job for a device ) to get an exact graph. Initially I did not know how to do that. Point 2 below seems to be what I missed initially.

2. The graph I got seemed to be wrong because the value of M1 and M1* described in the paper and shown above are wrongly calculated. M1* is the straight line drawn from the intersection of the two dashed red lines to intersect the Y-axis.

2. The new graph is a result of the new service demands

cpustime<-1.00
diskstime<-.88
drumstime<-.32

each of which is set to the product of the visit time and service time. M1 and M1* seem to be close to the problem graph in the paper.

3. The additional point is that I might have made an error in the PDQ or R code that I have failed to detect. I am not still an expert. The asymptote intersects the y-axis before ’20’ but the value of M1 is 22. That looks like an error which I will try to figure out.