Build OpenJDK 9

The build was blocked by this error:

configure: error: Could not find freetype!
configure exiting with result code 1

But FreeType was already installed:

localhost:jdk9 radhakrishnan$ brew install freetype

Error: freetype-2.5.3_1 already installed

To install this version, first `brew unlink freetype`

Pointing configure at the X11 FreeType headers and libraries helped.

Mohans-MacBook-Pro:jdk9 radhakrishnan$ bash configure --with-freetype-include=/usr/X11/include/freetype2 --with-freetype-lib=/usr/X11/lib

A new configuration has been successfully created in

/Users/radhakrishnan/OpenJDK/jdk9/build/macosx-x86_64-normal-server-release

using configure arguments '--with-freetype-include=/usr/X11/include/freetype2 --with-freetype-lib=/usr/X11/lib'.

Configuration summary:

* Debug level: release

* HS debug level: product

* JDK variant: normal

* JVM variants: server

* OpenJDK target: OS: macosx, CPU architecture: x86, address length: 64

Tools summary:

* Boot JDK: java version "1.8.0_45" Java(TM) SE Runtime Environment (build 1.8.0_45-b14) Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode) (at /Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home)

* Toolchain: clang (clang/LLVM)

* C Compiler: Version Apple LLVM version 6.0 (clang-600.0.57) (based on LLVM 3.5svn) Target: x86_64-apple-darwin13.4.0 Thread model: posix (at /usr/bin/clang)

* C++ Compiler: Version Apple LLVM version 6.0 (clang-600.0.57) (based on LLVM 3.5svn) Target: x86_64-apple-darwin13.4.0 Thread model: posix (at /usr/bin/clang++)

----- Build times -------
Start 2015-04-26 08:23:02
End 2015-04-26 08:45:01
00:05:54 verify-modules
00:21:59 TOTAL
-------------------------
/bin/bash /Users/radhakrishnan/OpenJDK/jdk9/common/bin/logger.sh /Users/radhakrishnan/OpenJDK/jdk9/build/macosx-x86_64-normal-server-release/build.log /usr/bin/printf "Finished building targets 'clean images' in configuration 'macosx-x86_64-normal-server-release'\n"
Finished building targets 'clean images' in configuration 'macosx-x86_64-normal-server-release'

Scala Build Tool


I started off on the wrong foot in my newfound zeal to learn Scala. This simple installation went horribly wrong. Shouldn't an sbt setup be effortless?

localhost:project radhakrishnan$ sbt

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=384m; support was removed in 8.0

error: error while loading CharSequence, class file '/Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home/jre/lib/rt.jar(java/lang/CharSequence.class)' is broken

(bad constant pool tag 18 at byte 10)

[error] Type error in expression

Project loading failed: (r)etry, (q)uit, (l)ast, or (i)gnore? i

[warn] Ignoring load failure: no project loaded.

I upgraded my JDK to 1.8.0_45. (The "bad constant pool tag 18" message means the Scala compiler hit CONSTANT_InvokeDynamic, a class-file entry it did not yet understand, while reading the Java 8 rt.jar.)

After that, running `sbt -v` from the command line throws this:

java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
    at jline.TerminalFactory.create(TerminalFactory.java:101)
    at jline.TerminalFactory.get(TerminalFactory.java:159)
    at sbt.ConsoleLogger$.ansiSupported(ConsoleLogger.scala:85)
    at sbt.ConsoleLogger$.<init>(ConsoleLogger.scala:79)
    at sbt.ConsoleLogger$.<clinit>(ConsoleLogger.scala)
    at sbt.GlobalLogging$.initial(GlobalLogging.scala:43)
    at sbt.StandardMain$.initialGlobalLogging(Main.scala:60)
    at sbt.StandardMain$.initialState(Main.scala:69)
    at sbt.xMain.run(Main.scala:28)
    at xsbt.boot.Launch$.run(Launch.scala:55)
    at xsbt.boot.Launch$$anonfun$explicit$1.apply(Launch.scala:45)
    at xsbt.boot.Launch$.launch(Launch.scala:69)
    at xsbt.boot.Launch$.apply(Launch.scala:16)
    at xsbt.boot.Boot$.runImpl(Boot.scala:31)
    at xsbt.boot.Boot$.main(Boot.scala:20)
    at xsbt.boot.Boot.main(Boot.scala)

So I downloaded the sbt launcher JAR and replaced the old one with it, and it is working now. The error apparently comes from mixing jline versions on the classpath: in old jline, Terminal was a class, while jline 2 made it an interface, which is exactly what IncompatibleClassChangeError complains about.

I also found that years of programming experience are no match for the newfangled Scala.

R code to parse the G1 GC log

I found this blog, which uses R to parse G1 GC logs; I am specifically using the 'Format 2' log lines mentioned there. Reimplementing such parsers is one of my favorite ways to learn R. The author uses Unix shell scripts to parse the data, but I think R is an excellent language for parsing data, and I have done this in the past.

Even though my code here is verbose, an R expert could make short work of the data. My code also leans on regular expressions more than necessary.

library(stringr)

# Read the raw G1 trace; each line becomes one row in column V1
g1 <- read.table("D:\\Python Data Analytics\\single_g1gc_trace.txt", sep = "\t")

# Lines that carry a date stamp, e.g. "2014-05-13T08:54:..."
timestamps <- g1[with(g1, grepl("[0-9]+-[0-9]+-", V1)), ]
# Keep only the "YYYY-MM-DDTHH:MM" prefix (everything up to the second ':')
timestamp <- str_extract(timestamps, '[^:]*:[^:]*')
# Lines that carry heap sizes, e.g. "1938M->904M(9216M)"
heapsizes <- g1[with(g1, grepl("[0-9]+M+", V1)), ]

g1data <- data.frame(timestamp)
g1data$heapsizes <- heapsizes

# Pull every number out of each heap-size line: before, after, total
heapsize <- str_extract_all(g1data$heapsizes, '[0-9]+')

g1data$BeforeSize <- 0
g1data$AfterSize <- 0
g1data$TotalSize <- 0
g1data$timestamp <- 0

row <- NROW(g1data)
i <- 1
for (size in heapsize) {
  if (i <= row) {
    g1data[i, 3] <- size[1]  # BeforeSize
    g1data[i, 4] <- size[2]  # AfterSize
    g1data[i, 5] <- size[3]  # TotalSize
    i <- i + 1
  }
}
i <- 1
for (stamp in timestamp) {
  if (i <= row) {
    g1data[i, 1] <- stamp
    i <- i + 1
  }
}

g1data <- subset(g1data, select = c(timestamp, BeforeSize, AfterSize, TotalSize))

         timestamp BeforeSize AfterSize TotalSize
1 2014-05-13T08:54       1938       904      9216
2 2014-05-13T08:54       1939       905      9217
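The heap-size extraction can also be done with a single regular expression rather than several passes. A minimal Python sketch, assuming heap lines of the shape suggested by the sizes above, e.g. "1938M->904M(9216M)"; the sample line itself is hypothetical:

```python
import re

# One capture group per number: before, after, and total heap size in MB.
heap_re = re.compile(r'(\d+)M->(\d+)M\((\d+)M\)')

line = "1938M->904M(9216M)"  # hypothetical sample line
m = heap_re.search(line)
before, after, total = (int(g) for g in m.groups())
print(before, after, total)  # → 1938 904 9216
```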

Parsing HTML using BeautifulSoup

This Python code that parses HTML seems to truncate the tags when I print them. I am checking for the presence of the id attribute in the tags. The code simply iterates over all tags rather than looking for a specific HTML control, and it matches opening and closing tags arbitrarily. I am still working on it and will update it.

D:\Python Data Analytics\view.html
   No                                                Tag
0   1  <div class="panel-collapse collapse" id="activ...
1   1  <select class="selectpicker " id="condition1...
2   1  <select class="selectpicker " id="condition2...
3   1  <select class="selectpicker " id="condition3...
4   1  <select class="selectpicker " id="condition4...
5   1  <select class="selectpicker " id="condition5...
6   1  <input class="btn xbtn-primary save" id="ApS...
7   1  <input class="btn btn-link" id="Cancel" name...
from bs4 import BeautifulSoup as bs
import sys
import os
import pandas as pd
import fnmatch

class Parse:
 
    def __init__(self):
        self.parse()


    def parse(self):
        
        pd.options.display.max_colwidth = 0
        try:
            path = "D:\\Python Data Analytics\\"
            
            f = open('D:\\python\\report.html', 'w')

            # Pattern of files to match
            includes = ['*.html']

            for root, subFolders, files in os.walk(path):
                for extensions in includes:
                    for infile in fnmatch.filter(files, extensions):
                            soup = bs(open( path + infile, "r").read())
                                        
                            data = soup.findAll(True,{'id':True})
                            
                            df = pd.DataFrame(columns=[
                                                       'ID',
                                                       'Tag'])

                            idattributes = []
                            duplicates = [] 
                            
                            for attribute in data:
                                idTag = attribute.find('id')
                                att = attribute.attrs
                                idattributes.append(att['id'])
                                df = df.append(pd.DataFrame( [dict(
                                                                   ID=att['id'],
                                                                   Tag=attribute)] ),
                                                                   ignore_index=True)
                            s = set()
                            duplicates = set(x for x in idattributes if x in s or s.add(x))  
                                                              
                            data1 = soup.findAll(attrs={'id': None})
                            df1 = pd.DataFrame(columns=[
                                                       
                                                       'Tag'])
            
                            missingid = {} 
                            count = 0
                            for attribute in data1:
                                    missingid.update({count: attribute})
                                    df1 = df1.append(pd.DataFrame( [dict(
                                                                   Tag=attribute)] ),
                                                                   ignore_index=True)
                                    count = count + 1
                                    
                            df2 = pd.DataFrame(missingid.items())
                            html5report = df
                            print df2
                            html5report1 = df2
                            
                            table = ""
                            table += '<table>'
                            for element in duplicates:
                                table += '  <tr>'
                                table += '    <td>' + element + '</td>'
                                table += '  </tr>'
                            table += '</table>'
                            
                            html5report1 = html5report1.to_html().replace('<table border="1" class="dataframe">','<table class="table table-striped">')
                            html5report = html5report.to_html().replace('<table border="1" class="dataframe">','<table class="table table-striped">')
                            htmlreporter = '''
            							<html>
                							<head>
                    						<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css">
                    						<style>body{ margin:0 100; background:whitesmoke; }</style>
                							</head>
                                            <body>
                                            <h1>HTML 5 Report</h1>
                                            <h2>''' + infile + '''</h2>
            
                                            <h3>Tags with ID present</h3>
                   								''' + html5report + '''
                                            <h3>Tags with ID not present</h3>
                                                ''' + html5report1 + '''
                                            <h3>Possible Duplicates</h3>
                                                ''' + table + '''
                							</body>
            				</html>'''
                            f.write(htmlreporter)
            f.close()    
                            
        except IOError as e:
            print "I/O error({0}): {1}".format(e.errno, e.strerror)
        except Exception:
            print "Unexpected error:", sys.exc_info()[0]
 

if __name__ == '__main__':
    instance = Parse()
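The duplicate-id one-liner in the code above exploits the fact that `set.add` returns `None`: `x in s or s.add(x)` is truthy only when `x` has been seen before, and as a side effect records every first sighting. A standalone sketch with hypothetical ids:

```python
# set.add returns None (falsy), so the condition keeps x only on a repeat sighting.
idattributes = ['save', 'cancel', 'condition1', 'save']  # hypothetical ids
s = set()
duplicates = set(x for x in idattributes if x in s or s.add(x))
print(duplicates)  # → {'save'}
```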

Pandas and matplotlib

I have used R data frames, which are very versatile; compared to them, pandas DataFrames seem slightly harder to get right. But I am after the excellent support for machine learning and data analytics that scikit-learn provides.

This graph is simple; I usually parse Java GC logs to practise. Here I parse a Java G1 GC log to get my hands dirty with pandas DataFrames.

  AfterSize BeforeSize RealTime       SecondsSinceLaunch TotalSize
0        20      3.109     9216  2014-05-13T13:24:35.091      5029
1      9125      3.459     9216  2014-05-13T13:24:35.440      6077
2        25      5.599     9216  2014-05-13T13:24:37.581      8470
3        44     10.704     9216  2014-05-13T13:24:42.686        15
4        51     16.958     9216  2014-05-13T13:24:48.941        20
5        92     24.066     9216  2014-05-13T13:24:56.049        26
6       602     62.383     9216  2014-05-13T13:25:34.368        68
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

def main():
    gclog = pd.DataFrame(columns=['SecondsSinceLaunch',
                                   'BeforeSize',
                                   'AfterSize',
                                   'TotalSize',
                                   'RealTime'])
    with open("D:\\performance\\data.txt", "r") as f:
        for line in f:
            strippeddata = line.split()
            gclog = gclog.append(pd.DataFrame( [dict(SecondsSinceLaunch=strippeddata[0],
                                                     BeforeSize=strippeddata[1],
                                                     AfterSize=strippeddata[2],
                                                     TotalSize=strippeddata[3],
                                                     RealTime=strippeddata[4])] ),
                                               ignore_index=True)
    print gclog
    #gclog.time = pd.to_datetime(gclog['SecondsSinceLaunch'], format='%Y-%m-%dT%H:%M:%S.%f')
    gclog = gclog.convert_objects(convert_numeric=True)  # coerce string columns to numeric
    plt.plot(gclog.TotalSize, gclog.AfterSize)
    plt.show()
if __name__=="__main__":
    main()
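Appending to the DataFrame one row at a time copies it on every iteration. A cheaper pattern is to collect plain rows first and build the frame once. Here is a stdlib-only sketch of the same whitespace split; the field order is the one the loop above assumes, and the sample line is hypothetical:

```python
# Split each whitespace-separated log line into named fields,
# in the order the append loop above assumes.
FIELDS = ['SecondsSinceLaunch', 'BeforeSize', 'AfterSize', 'TotalSize', 'RealTime']

def parse_gc_lines(lines):
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) >= len(FIELDS):
            rows.append(dict(zip(FIELDS, parts)))
    return rows

sample = ["2014-05-13T13:24:35.091 3.109 20 9216 5029"]  # hypothetical line
rows = parse_gc_lines(sample)
print(rows[0]['TotalSize'])  # → 9216
```

With pandas available, `pd.DataFrame(rows)` then builds the whole frame in a single call.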


Update:

The graph shown above is not clear and looks wrong. I have improved it to some extent using this code. Matplotlib has many features more powerful than what I used earlier. I have commented out the code that annotates the actual points on the graph. I could not draw the tick marks so that the red line shows clearly, because the data range is awkward to work with; there must be a feature I have not explored yet.


import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np

def main():
    gclog = pd.DataFrame(columns=['SecondsSinceLaunch',
                                   'BeforeSize',
                                   'AfterSize',
                                   'TotalSize',
                                   'RealTime'])
    with open("D:\\performance\\data.txt", "r") as f:
        for line in f:
            strippeddata = line.split()
            gclog = gclog.append(pd.DataFrame( [dict(SecondsSinceLaunch=strippeddata[0],
                                                     BeforeSize=strippeddata[1],
                                                     AfterSize=strippeddata[2],
                                                     TotalSize=strippeddata[3],
                                                     RealTime=strippeddata[4])] ),
                                               ignore_index=True)
    print gclog
    #gclog.time = pd.to_datetime(gclog['SecondsSinceLaunch'], format='%Y-%m-%dT%H:%M:%S.%f')
    gclog = gclog.convert_objects(convert_numeric=True)  # coerce string columns to numeric
    fig, ax = plt.subplots(figsize=(17, 14), facecolor='white', edgecolor='white')
    ax.axes.tick_params(labelcolor='darkblue', labelsize='10')
    # Hand-picked tick positions for this data range
    for axis, ticks in [(ax.get_xaxis(), np.arange(10, 8470, 100)),
                        (ax.get_yaxis(), np.arange(10, 9125, 300))]:
        axis.set_ticks_position('none')
        axis.set_ticks(ticks)
        axis.label.set_color('#999999')
    plt.grid(color='#999999', linewidth=1.0, linestyle='-')
    plt.xticks(rotation=70)
    plt.gcf().subplots_adjust(bottom=0.15)
    # Hide the plot frame
    for position in ['bottom', 'top', 'left', 'right']:
        ax.spines[position].set_visible(False)
    ax.set_xlabel(r'AfterSize')
    ax.set_ylabel(r'TotalSize')
    ax.set_xlim(10, 8470)
    ax.set_ylim(10, 9125)
    plt.plot(sorted(gclog.AfterSize),gclog.TotalSize,c="red")
#     for i,j in zip(sorted(gclog.AfterSize),gclog.TotalSize):
#         ax.annotate('(' + str(i) + ',' + str(j) + ')',xy=(i, j))
    
    plt.show()
if __name__=="__main__":
    main()
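On the tick-mark problem: instead of hand-building `np.arange` steps for an awkward range, matplotlib's `MaxNLocator` can choose round tick positions itself. A small sketch, using the 10..8470 x-range from the code above:

```python
from matplotlib.ticker import MaxNLocator

# Ask the locator for at most ~10 "nice" tick positions covering 10..8470,
# instead of hard-coding np.arange(10, 8470, 100).
ticks = MaxNLocator(nbins=10).tick_values(10, 8470)
print(ticks)
```

On an existing Axes this would be `ax.xaxis.set_major_locator(MaxNLocator(nbins=10))`.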
