MindSpace

Gradient Descent

October 6, 2015 Leave a comment

I ported the Gradient Descent code from Octave to Python. The base Octave code is the one from Andrew Ng’s Machine Learning MOOC.

I mistakenly believed that the Octave code for matrix multiplication will directly translate in Python.

The matrices are these.

But the Octave code is this

Octave code

  theta = theta - ( (  alpha * ( (( theta' * X' )' - y)' * X ))/length(y) )'

and the Python code is this.

Python

def gradientDescent( X,
                     y,
                     theta,
                     alpha = 0.01,
                     num_iters = 1500):

    r,c = X.shape
    
    for iter in range( 1, num_iters ):
        theta = theta - ( ( alpha * np.dot( X.T, ( np.dot( X , theta ).T - np.asarray(y) ).T ) ) / r )
    return theta

This line is not a direct transalation.

        theta = theta - ( ( alpha * np.dot( X.T, ( np.dot( X , theta ).T - np.asarray(y) ).T ) ) / r )

But only the above Python code gives me the correct theta that matches the value given by the Octave code.

Linear Regression

But the gradient descent also does not give me the correct value after a certain number of iterations. But the cost value is similar.

Gradient Descent from Octave Code that converges

Minimization of cost

Initial cost is 640.125590
J = 656.25
Initial cost is 656.250475
J = 672.58
Initial cost is 672.583001
J = 689.12
Initial cost is 689.123170
J = 705.87
Initial cost is 705.870980
J = 722.83
Initial cost is 722.826433
J = 739.99
Initial cost is 739.989527

Gradient Descent from my Python Code that does not converge to the optimal value

Minimization of cost

635.81837438
651.963633303
668.316534159
684.877076945
701.645261664
718.621088313
735.804556895

Filed under Machine Learning, Python

The AzureML Studio user interface is slick, very responsive and adopts a workflow supporting both R and Python scripts. There is a free account available with this caveat but that did not hamper my efforts to test some simple flows.

Note: Your free-tier Azure ML account allows you unlimited access, with some reduced capabilities compared to a full Microsoft Azure subscription. Your experiments will only run at low priority on a single processor core. As a result, you will experience some longer wait times. However, you have full access to all features of Azure ML.

The graph visualizations are very spiffy too. I am yet to finish the data cleansing aspects and use the really interesting ML algorithms.

Filed under Azure ML, Machine Learning

Principal Component Analysis

September 28, 2015 Leave a comment

This is about what I think I understood about Principal Component Analysis. I will update this blog post later.

The code is in github and it works but I think the eigen values could be wrong. I have to test it further.

These are the two main functions.


    """Compute the covariance matrix for a given dataset.
    """
def estimateCovariance( data ):
    print data
    mean = getmean( data )
    print mean
    dataZeroMean = map(lambda x : x - mean, data )
    print dataZeroMean
    covar = map( lambda x : np.outer(x,x) , dataZeroMean )
    print getmean( covar ) 
    return getmean( covar )

    """Computes the top `k` principal components, corresponding scores, and all eigenvalues.
    """
def pca(data, k=2):
    
    d = estimateCovariance(  data )
    
    eigVals, eigVecs = eigh(d)

    validate( eigVals, eigVecs )
    inds = np.argsort(eigVals)[::-1]
    topComponent = eigVecs[:,inds[:k]]
    print '\nTop Component: \n{0}'.format(topComponent)
    
    correlatedDataScores = map(lambda x : np.dot( x ,topComponent), data )
    print ('\nScores : \n{0}'
       .format('\n'.join(map(str, correlatedDataScores))))
    print '\n eigenvalues: \n{0}'.format(eigVals[inds])
    return topComponent,correlatedDataScores,eigVals[inds]

Filed under Machine Learning, PCA, Python

JPA and Spring @Transactional and JBoss Arquillian

September 20, 2015 Leave a comment

JBoss Arquillian is a test framework that one can use to execute tests in the IDE as part of the development process. The key parts are the deployment API and container adapters that enable us to deploy, tests that execute inside a container, automatically and repeatedly.

I have written about Arquillian here.
In this post I will show how a simple Arquillian test for a JPA transaction avoids countless wasted hours. Actually I spent a few hours trying to find out why enabling the wrong Transaction Manager produces log lines almost similar to the section below and misleads one into thinking that transactions are indeed in effect. It is the wrong transaction manager and no rows are actually committed to the Database. But the logs do show some messages that indicate data is committed.

This is the correct set of log messages that show that JpaTransactionManager takes effect.

DEBUG: org.springframework.transaction.annotation.AnnotationTransactionAttribute
Source – Adding transactional method ‘TestImpl.test’ with attribute: PROPAGATION
_REQUIRED,ISOLATION_DEFAULT; ”
DEBUG: org.springframework.orm.jpa.JpaTransactionManager – Creating new transact
ion with name [com.jpa.test.TestImpl.test]: PROPAGATION_REQUIRED,ISOLATION_DEFAU
LT; ”
DEBUG: org.hibernate.internal.SessionImpl – Opened session at timestamp: 1442144
9169
TRACE: org.hibernate.internal.SessionImpl – Setting flush mode to: AUTO
TRACE: org.hibernate.internal.SessionImpl – Setting cache mode to: NORMAL
DEBUG: org.springframework.orm.jpa.JpaTransactionManager – Opened new EntityMana
ger [org.hibernate.ejb.EntityManagerImpl@8f64d] for JPA transaction
DEBUG: org.hibernate.engine.transaction.spi.AbstractTransactionImpl – begin
DEBUG: org.hibernate.engine.jdbc.internal.LogicalConnectionImpl – Obtaining JDBC
connection
DEBUG: org.springframework.jdbc.datasource.SimpleDriverDataSource – Creating new
JDBC Driver Connection to [jdbc:hsqldb:mem:dataSource]
DEBUG: org.hibernate.engine.jdbc.internal.LogicalConnectionImpl – Obtained JDBC
connection
DEBUG: org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction – initial
autocommit status: true
DEBUG: org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction – disablin
g autocommit
DEBUG: org.springframework.orm.jpa.JpaTransactionManager – Exposing JPA transact
ion as JDBC transaction [org.springframework.orm.jpa.vendor.HibernateJpaDialect$
HibernateConnectionHandle@423d24]
TRACE: org.springframework.transaction.support.TransactionSynchronizationManager
– Bound value [org.springframework.jdbc.datasource.ConnectionHolder@805780] for
key [org.springframework.jdbc.datasource.SimpleDriverDataSource@faa27c] to thre
ad [http-nio-8080-exec-5]
TRACE: org.springframework.transaction.support.TransactionSynchronizationManager
– Bound value [org.springframework.orm.jpa.EntityManagerHolder@7a72fc] for key
[org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean@1e0cc0a] to
thread [http-nio-8080-exec-5]
TRACE: org.springframework.transaction.support.TransactionSynchronizationManager
– Initializing transaction synchronization
TRACE: org.springframework.transaction.interceptor.TransactionInterceptor – Gett
ing transaction for [com.jpa.test.TestImpl.test]
INFO : jpa – TransactionSynchronizationManager.isActualTransactionActive()true
INFO : jpa – INMEMORY_DB [id=id, street=Street, area=Area, state=State, country
=LO, pin=1]
TRACE: org.springframework.transaction.support.TransactionSynchronizationManager
– Retrieved value [org.springframework.orm.jpa.EntityManagerHolder@7a72fc] for
key [org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean@1e0cc0a]
bound to thread [http-nio-8080-exec-5]
TRACE: org.hibernate.engine.spi.IdentifierValue – ID unsaved-value strategy UNDE
FINED
TRACE: org.hibernate.event.internal.AbstractSaveEventListener – Transient instan
ce of: com.jpa.test.INMEMORY_DB
TRACE: org.hibernate.event.internal.DefaultPersistEventListener – Saving transie
nt instance
DEBUG: org.hibernate.event.internal.AbstractSaveEventListener – Generated identi
fier: id, using strategy: org.hibernate.id.Assigned
TRACE: org.hibernate.event.internal.AbstractSaveEventListener – Saving [com.jpa.
test.INMEMORY_DB#id]
TRACE: org.hibernate.engine.spi.ActionQueue – Adding an EntityInsertAction for [
com.jpa.test.INMEMORY_DB] object
TRACE: org.hibernate.engine.spi.ActionQueue – Adding insert with no non-nullable
, transient entities: [EntityInsertAction[com.jpa.test.INMEMORY_DB#id]]
TRACE: org.hibernate.engine.spi.ActionQueue – Adding resolved non-early insert a
ction.
TRACE: org.hibernate.action.internal.UnresolvedEntityInsertActions – No unresolv
ed entity inserts that depended on [[com.jpa.test.INMEMORY_DB#id]]
TRACE: org.hibernate.action.internal.UnresolvedEntityInsertActions – No entity i
nsert actions have non-nullable, transient entity dependencies.
TRACE: org.springframework.transaction.interceptor.TransactionInterceptor – Comp
leting transaction for [com.jpa.test.TestImpl.test]
DEBUG: org.springframework.orm.jpa.JpaTransactionManager – Initiating transactio
n commit
DEBUG: org.springframework.orm.jpa.JpaTransactionManager – Committing JPA transa
ction on EntityManager [org.hibernate.ejb.EntityManagerImpl@8f64d]
DEBUG: org.hibernate.engine.transaction.spi.AbstractTransactionImpl – committing

The source code has a copy of log4j.xml that enables the appropriate log. But this method is not repeatable in the sense that it is hard to manually check the log messages everytime we change the configuration or add new code. That is what unit tests are for and Arquillian container tests deploy our code into a container and execute tests in the IDE. The developer does not have to deploy manually and test the code. All that is required is a good regression test suite.

Arquillian uses the dependency arquillian-transaction-spring to make the test method transactional.

There are some dependencies in the pom.xml as well as in this Arquillian test that are not needed or redundant but the required ones are there.

package com.jpa.test;

import static org.junit.Assert.assertEquals;

import java.io.File;
import java.util.List;
import java.util.logging.Logger;

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import javax.persistence.TypedQuery;
import javax.transaction.SystemException;

import org.jboss.arquillian.container.test.api.Deployment;
import org.jboss.arquillian.junit.Arquillian;
import org.jboss.arquillian.spring.integration.test.annotation.SpringConfiguration;
import org.jboss.arquillian.transaction.api.annotation.Transactional;
import org.jboss.shrinkwrap.api.ShrinkWrap;
import org.jboss.shrinkwrap.api.formatter.Formatters;
import org.jboss.shrinkwrap.api.spec.JavaArchive;
import org.jboss.shrinkwrap.api.spec.WebArchive;
import org.jboss.shrinkwrap.resolver.api.maven.Maven;
import org.junit.Test;
import org.junit.runner.RunWith;


@RunWith(Arquillian.class)
@SpringConfiguration("applicationContext.xml")
public class ShrinkWrappedJPATest {

	private static Logger l = Logger.getLogger("jpa");
		
	@PersistenceContext(unitName="testingSetup")
	private EntityManager entityManager;

    @Deployment
    public static WebArchive createWebArchive() {
  
    	final WebArchive war=ShrinkWrap.create(WebArchive.class,"ShrinkWrapJPA.war");
    	  
        JavaArchive jar = ShrinkWrap.create(JavaArchive.class)
                				.addPackage("com.jpa.test");
 

        war.addAsLibrary(jar);
    	war.addAsResource("applicationContext.xml");
    	war.addAsResource("arquillian.xml");
    	war.addAsResource("log4j.xml");
    	war.addAsResource("schema.sql");
    	war.addAsResource("test-data.sql");
    	war.addAsResource("log4j.xml");
    	war.addAsResource("persistence.xml", "META-INF/persistence.xml");
    	loadDependencies( war );
 
    	l.info(war.toString(Formatters.VERBOSE));
    	return war;
    }

        
    
    
    private static void loadDependencies( final WebArchive war ){
    	
        File springorm = Maven.
				resolver().
					resolve("org.springframework:spring-orm:4.1.6.RELEASE")
						.withoutTransitivity().asSingle(File.class);

        war.addAsLibraries(springorm);

        File hibernate = Maven.
 				resolver().
 					resolve("org.hibernate:hibernate-core:4.1.7.FINAL")
 						.withoutTransitivity().asSingle(File.class);

         war.addAsLibraries(hibernate);

 
	    File hibernate1 = Maven.
					resolver().
						resolve("org.hibernate:hibernate-entitymanager:4.1.7.FINAL")
							.withoutTransitivity().asSingle(File.class);
	
	     war.addAsLibraries(hibernate1);

        File springexpression = Maven.
				resolver().
					resolve("org.springframework:spring-expression:4.2.0.RELEASE")
						.withoutTransitivity().asSingle(File.class);

        war.addAsLibraries(springexpression);

        File springweb = Maven.
				resolver().
					resolve("org.springframework:spring-web:4.2.0.RELEASE")
						.withoutTransitivity().asSingle(File.class);

        war.addAsLibraries(springweb);

        File springcore = Maven.
				resolver().
					resolve("org.springframework:spring-core:4.1.6.RELEASE")
						.withoutTransitivity().asSingle(File.class);

        war.addAsLibraries(springcore);
        
        File springcontext = Maven.
				resolver().
					resolve("org.springframework:spring-context:4.1.6.RELEASE")
						.withoutTransitivity().asSingle(File.class);

        war.addAsLibraries(springcontext);

        File springjdbc = Maven.
				resolver().
					resolve("org.springframework:spring-jdbc:4.1.6.RELEASE")
						.withoutTransitivity().asSingle(File.class);

        war.addAsLibraries(springjdbc);
        File springtx = Maven.
				resolver().
					resolve("org.springframework:spring-tx:4.1.6.RELEASE")
						.withoutTransitivity().asSingle(File.class);

        war.addAsLibraries(springtx);
        File hsqldb = Maven.
 				resolver().
 					resolve("org.hsqldb:hsqldb:2.3.1")
 						.withoutTransitivity().asSingle(File.class);

         war.addAsLibraries(hsqldb);
         File dbcp = Maven.
 				resolver().
 					resolve("commons-dbcp:commons-dbcp:1.4")
 						.withoutTransitivity().asSingle(File.class);

         war.addAsLibraries(dbcp);
       File aopalliance = Maven.
				resolver().
					resolve("aopalliance:aopalliance:1.0")
						.withoutTransitivity().asSingle(File.class);

        war.addAsLibraries(aopalliance);
        File extensionspring = Maven.
				resolver().
					resolve("org.jboss.arquillian.extension:arquillian-service-deployer-spring-3:1.0.0.Beta3")
						.withoutTransitivity().asSingle(File.class);

        war.addAsLibraries(extensionspring);


         File springbeans = Maven.
				resolver().
					resolve("org.springframework:spring-beans:4.1.6.RELEASE")
						.withoutTransitivity().asSingle(File.class);

        war.addAsLibraries(springbeans);

        File springaop = Maven.
				resolver().
					resolve("org.springframework:spring-aop:4.2.0.RELEASE")
					.withoutTransitivity().asSingle(File.class);

        war.addAsLibraries(springaop);


         File transactionapi = Maven.
 				resolver().
 					resolve("org.jboss.arquillian.extension:arquillian-transaction-api:1.0.1.Final")
 					.withoutTransitivity().asSingle(File.class);

         war.addAsLibraries(transactionapi);
         File transactionimplbase = Maven.
 				resolver().
 					resolve("org.jboss.arquillian.extension:arquillian-transaction-impl-base:1.0.1.Final")
 					.withoutTransitivity().asSingle(File.class);

         war.addAsLibraries(transactionimplbase);

    }
    
    @Test
    @Transactional(manager="transactionManager")
    
	public void save() throws Exception, SystemException {
 

        INMEMORY_DB a = new INMEMORY_DB();
		a.setId("id");
		a.setStreet("Street");
		a.setArea("Area");
		a.setState("State");
		a.setCountry("LO");
		a.setPin(1);
		entityManager.persist( a );
  		assertEquals(getAddressCount(), 2);
	}

 
	public int getAddressCount(){
		TypedQuery<INMEMORY_DB> query =
				entityManager.createQuery("SELECT c FROM INMEMORY_DB c", INMEMORY_DB.class);
		List<INMEMORY_DB> results = query.getResultList();	
		return results.size();
	}

}

Filed under Arquillian, Java, JPA

Aesthetics of Matplotlib graphs

September 15, 2015 1 Comment

The graph shown in my earlier postis not clear and it looks wrong. I have improved it to some extent using this code. Matplotlib has many features more powerful than what I used earlier. I have commented the code used to annotate and display the actual points in the graph. I couldn’t properly draw the tick marks so that the red graph is clearly shown because the data range wasn’t easy to work with. There should be some feature that I still have not explored.


import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np

def main():
    gclog = pd.DataFrame(columns=['SecondsSinceLaunch',
                                   'BeforeSize',
                                   'AfterSize',
                                   'TotalSize',
                                   'RealTime'])
    with open("D:\\performance\\data.txt", "r") as f:
        for line in f:
            strippeddata = line.split()
            gclog = gclog.append(pd.DataFrame( [dict(SecondsSinceLaunch=strippeddata[0],
                                                     BeforeSize=strippeddata[1],
                                                     AfterSize=strippeddata[2],
                                                     TotalSize=strippeddata[3],
                                                     RealTime=strippeddata[4])] ),
                                               ignore_index=True)
    print gclog
    #gclog.time = pd.to_datetime(gclog['SecondsSinceLaunch'], format='%Y-%m-%dT%H:%M:%S.%f')
    gclog = gclog.convert_objects(convert_numeric=True)
    fig, ax = plt.subplots(figsize=(17, 14), facecolor='white', edgecolor='white')
    ax.axes.tick_params(labelcolor='darkblue', labelsize='10')
    for axis, ticks in [(ax.get_xaxis(), np.arange(10, 8470, 100) ), (ax.get_yaxis(), np.arange(10, 9125, 300))]:
        axis.set_ticks_position('none')
        axis.set_ticks(ticks)
        axis.label.set_color('#999999')
        if False: axis.set_ticklabels([])
    plt.grid(color='#999999', linewidth=1.0, linestyle='-')
    plt.xticks(rotation=70)
    plt.gcf().subplots_adjust(bottom=0.15)
    map(lambda position: ax.spines[position].set_visible(False), ['bottom', 'top', 'left', 'right'])
    ax.set_xlabel(r'AfterSize'), ax.set_ylabel(r'TotalSize')
    ax.set_xlim(10, 8470, 100), ax.set_ylim(10, 9125, 300)    
    plt.plot(sorted(gclog.AfterSize),gclog.TotalSize,c="red")
#     for i,j in zip(sorted(gclog.AfterSize),gclog.TotalSize):
#         ax.annotate('(' + str(i) + ',' + str(j) + ')',xy=(i, j))
    
    plt.show()
if __name__=="__main__":
    main()

In fact after finishing the Principal Component Analysis section of a Machine Learning course I took recently I realized beautiful 3D graphs can be drawn.

Filed under Matplotlib, Python

Arquillian Unit Tests

September 6, 2015 Leave a comment

arquillian_logo I have contributed an article to DZone about Arquillian using this source code.

Filed under Arquillian, Java

Aesthetics of ‘R’ Plots

August 31, 2015 1 Comment

I submitted a fresh pull request to adoptopenjdk to rectify the graphs in parsing-java-micro-benchmarking-harness-data-using-dplyr-part-2 after the lead developer of gs-collections identified that the titles in the two graphs are switched.

It is very important that we do not mislead with data.

g <- ggplot(jc, aes(x = IDX, y = V1)) +
		theme_bw() +
		geom_ribbon(aes(ymin = V1 - error, ymax = V1 + error), fill = "lightseagreen",
				linetype = 2, alpha= 0.1) +
		geom_line(color = "lightblue", size = 0.6) +
		geom_errorbar(aes(ymin = V1 - error, ymax = V1 + error), width = 0.1,
				color = "darkorange3") +
		labs(x = "Iterations", y = "Goldmansachs collections") + geom_text(aes(label = V1), size=4, family="Times", lineheight=.8)

Filed under R

Task duration estimation using betaPERT distribution and Monte Carlo Analysis

July 8, 2015 Leave a comment

I have been researching this seemingly simple topic for many months. After reading many articles and browsing books and asking other experts I have a basic idea about this. Now I also find it ludicrous that our famed project managers in my companies do not seem to know this simple distribution after appearing for the PMP exams. Many managers here care not a coot for such technical items.

According to betaPERT

“The Beta-PERT methodology allows to parametrize a generalized Beta distribution based on expert opinion regarding a pessimistic estimate (minimum value), a most likely estimate (mode), and an optimistic estimate (maximum value).”

I will add more explanations later based on what I understand.


library(ggplot2)

opt <- 50 # Optimistic estimate
lik <- 100 # Likely estimate
pes <- 500 # Pessimistic estimate

lambda <- 4 # PERT weighting for likely

# PERT estimate is then
print( (opt + lambda*lik + pes)/(2 + lambda) )

# Mapping to the beta distribution
s1 <- 1+lambda*(lik-opt)/(pes-opt)
s2 <- 1+lambda*(pes-lik)/(pes-opt)

# Generate 1000 samples from the beta distribution that is scaled to the PERT parameters

persondays <- opt + (pes-opt) * rbeta(1000, s1, s2)

# Look at the output
png("D:/R/persondayssimulation.png")
hist(persondays)
dev.off()

print( summary(persondays))
# Compare to the PERT estimate
print( mean(persondays) )

gg <- ggplot(data.frame(persondays),
		     aes(x = persondays))

gg <- gg + geom_histogram(aes(y = ..density..),
		                  color = "black",
						  fill = "white", 
		                  binwidth = 15)
				  
gg <- gg + geom_density(fill = "mediumvioletred",
		                alpha = 0.5)
gg <- gg + theme_bw() 

ggsave("D:\\R\\density.png",
		width=6,
		height=6,
		dpi=100)

The book I was asked to read to understand Monte Carlo analysis is Introducing Monte Carlo Methods with R

Credit should go to

for sending me a private mail about this.

Filed under betaPERT, Monte Carlo Analysis, R

Parsing Java Micro-benchmarking Harness data using dplyr – Part 2

June 18, 2015 Leave a comment

I have to add explanations later because I have to determine if the statistical measures calculated are correct or wrong. But this is based on the previous blog post.

Update : I think the measures are correctly plotted.

Types of Error bars used to plot the diagram

Error bars	Type	Description
Standard error (SEM)	Inferential	A measure of how variable the mean will be, if you repeat the whole study many times.
Confidence interval (CI), usually 95% CI	Inferential	A range of values you can be 95% confident contains the true mean.

The parsing will not work if JMH changes the default format of the output file.

library(stringr)
library(dplyr)
library(ggplot2)

data <- read.table("D:\\jmh\\jmh.txt",sep="\t")

final <-data %>%
	    select(V1) %>%	
		filter(grepl("^Iteration", V1)) %>%  
        mutate(V1 = str_extract(V1, "\\d+\\.\\d*"))

final <- mutate(final,IDX = 1:n())

jc <- final %>%
		filter(IDX < 21)


gc <- final %>%
		filter(IDX > 20)

gc <- mutate(gc,IDX = 1:n())

jc <- data.frame(sapply(jc, function(x) as.numeric(as.character(x))))
gc <- data.frame(sapply(gc, function(x) as.numeric(as.character(x))))


print(summary(jc$V1))
error <- qt(0.995,df=length(jc$V1)-1)*sd(jc$V1)/sqrt(length(jc$V1))
error1 <- mean(jc$V1)-error
error2 <- mean(jc$V1)+error

q <- qplot(geom = "line",jc$IDX,jc$V1, colour='red')+geom_errorbar(aes(x=jc$IDX, ymin=jc$V1-sd(jc$V1), ymax=jc$V1+sd(jc$V1)), width=0.25)+ 
		geom_ribbon(aes(x=jc$IDX, y=jc$V1, ymin=error1, ymax=error2),fill="ivory2",alpha = 0.4)+ 
		xlab('Iterations') + ylab("Java Collections")+theme_bw() 

ggsave("D:\\jmh\\jc.png", width=6, height=6, dpi=100)

#Using error <- qt(0.995,df=length(jc$V1)-1)*sd(jc$V1)/sqrt(length(jc$V1)) 
g <- ggplot(jc, aes(x = IDX, y = V1)) +
		theme_bw() +
		geom_ribbon(aes(ymin = V1 - error, ymax = V1 + error), fill = "gray60",
				alpha = 0.3) +
		geom_line(color = "blue", size = 1) +
		geom_errorbar(aes(ymin = V1 - error, ymax = V1 + error), width = 0.25,
				color = "red") +
		labs(x = "Iterations", y = "Java collections")

ggsave("D:\\jmh\\ggplotjc.png", width=6, height=6, dpi=100)


print(summary(gc$V1))
error <- qt(0.995,df=length(gc$V1)-1)*sd(gc$V1)/sqrt(length(gc$V1))
error1 <- mean(gc$V1)-error
error2 <- mean(gc$V1)+error

q1 <- qplot(geom = "line",gc$IDX,gc$V1, colour='red')+geom_errorbar(aes(x=gc$IDX, ymin=gc$V1-sd(gc$V1), ymax=gc$V1+sd(gc$V1)), width=0.25)+ 
		geom_ribbon(aes(x=gc$IDX, y=gc$V1, ymin=error1, ymax=error2),fill="ivory2",alpha = 0.4)+ 
		xlab('Iterations') + ylab("Goldmansachs Collections")+theme_bw() 


ggsave("D:\\jmh\\gc.png", width=6, height=6, dpi=100)

#Using error <- qt(0.995,df=length(gc$V1)-1)*sd(gc$V1)/sqrt(length(gc$V1)) 
g1 <- ggplot(gc, aes(x = IDX, y = V1)) +
		theme_bw() +
		geom_ribbon(aes(ymin = V1 - error, ymax = V1 + error), fill = "gray60",
				alpha = 0.3) +
		geom_line(color = "blue", size = 1) +
		geom_errorbar(aes(ymin = V1 - error, ymax = V1 + error), width = 0.25,
				color = "red") +
		labs(x = "Iterations", y = "Goldmansachs collections")

ggsave("D:\\jmh\\ggplotgc.png", width=6, height=6, dpi=100)

Suggested by the R user forum to improve the aesthetics of the plot. The Confidence Interval of 99% shown in the plots above is not correct. But the curves and error bars are correct.

g1 <- ggplot(gc, aes(x = IDX, y = V1)) +
		theme_bw() +
		geom_ribbon(aes(ymin = V1 - error, ymax = V1 + error), fill = "gray60",
				alpha = 0.3) +
		geom_line(color = "blue", size = 1) +
		geom_errorbar(aes(ymin = V1 - error, ymax = V1 + error), width = 0.25,
				color = "red") +
		labs(x = "Iterations", y = "Goldmansachs collections")

ggplot creates these two graphs. So instead of qplot code we should use ggplot.

Update : See this

Filed under Java, R

Parsing Java Micro-benchmarking Harness data using dplyr – Part 1

June 13, 2015 Leave a comment

This is about the venerable JMH and Hadley Wickham’s dplyr and pipes package. dplyr enables you to have too much fun with data. Its pipes are so powerful and makes short shrift of even messy data.

# VM invoker: D:\Java\bin\java.exe
# VM options: -XX:-TieredCompilation -Dbenchmark.n=10000
# Warmup: 5 iterations, 50 ms each
# Measurement: 20 iterations, 50 ms each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: oracle.stream.javaone.CollectionComparison.goldmansachscollections

# Run progress: 0.00% complete, ETA 00:00:02
# Fork: 1 of 1
# Warmup Iteration 1: 0.443 us/op
# Warmup Iteration 2: 0.290 us/op
# Warmup Iteration 3: 0.343 us/op
# Warmup Iteration 4: 0.350 us/op
# Warmup Iteration 5: 0.388 us/op
Iteration 1: 0.796 us/op
Iteration 2: 0.542 us/op
Iteration 3: 0.510 us/op
Iteration 4: 0.617 us/op
Iteration 5: 0.482 us/op
Iteration 6: 0.387 us/op
Iteration 7: 0.272 us/op
Iteration 8: 0.536 us/op
Iteration 9: 0.498 us/op
Iteration 10: 0.402 us/op
Iteration 11: 0.328 us/op
Iteration 12: 0.542 us/op
Iteration 13: 0.299 us/op
Iteration 14: 0.647 us/op
Iteration 15: 0.291 us/op
Iteration 16: 0.815 us/op
Iteration 17: 0.680 us/op
Iteration 18: 0.363 us/op
Iteration 19: 0.560 us/op
Iteration 20: 0.334 us/op

Result: 0.495 ¦(99.9%) 0.140 us/op [Average]
Statistics: (min, avg, max) = (0.272, 0.495, 0.815), stdev = 0.162
Confidence interval (99.9%): [0.355, 0.636]

# VM invoker: D:\Java\bin\java.exe
# VM options: -XX:-TieredCompilation -Dbenchmark.n=10000
# Warmup: 5 iterations, 50 ms each
# Measurement: 20 iterations, 50 ms each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: oracle.stream.javaone.CollectionComparison.javacollections

# Run progress: 50.00% complete, ETA 00:00:05
# Fork: 1 of 1
# Warmup Iteration 1: 0.475 us/op
# Warmup Iteration 2: 0.696 us/op
# Warmup Iteration 3: 0.816 us/op
# Warmup Iteration 4: 0.622 us/op
# Warmup Iteration 5: 0.574 us/op
Iteration 1: 0.987 us/op
Iteration 2: 0.585 us/op
Iteration 3: 0.770 us/op
Iteration 4: 0.711 us/op
Iteration 5: 0.546 us/op
Iteration 6: 0.553 us/op
Iteration 7: 1.164 us/op
Iteration 8: 1.096 us/op
Iteration 9: 1.477 us/op
Iteration 10: 0.824 us/op
Iteration 11: 1.002 us/op
Iteration 12: 0.504 us/op
Iteration 13: 1.019 us/op
Iteration 14: 0.834 us/op
Iteration 15: 0.589 us/op
Iteration 16: 0.557 us/op
Iteration 17: 1.338 us/op
Iteration 18: 0.906 us/op
Iteration 19: 0.486 us/op
Iteration 20: 0.587 us/op

Result: 0.827 ¦(99.9%) 0.252 us/op [Average]
Statistics: (min, avg, max) = (0.486, 0.827, 1.477), stdev = 0.291
Confidence interval (99.9%): [0.574, 1.079]

# Run complete. Total time: 00:00:10

Benchmark Mode Samples Score Scor
e error Units
o.s.j.CollectionComparison.goldmansachscollections avgt 20 0.495
0.140 us/op
o.s.j.CollectionComparison.javacollections avgt 20 0.827
0.252 us/op

library(stringr)
library(dplyr)

data <- read.table("D:\\jmh\\jmh.txt",sep="\t")

final <-data %>%
	    select(V1) %>%	
		filter(grepl("^Iteration", V1)) %>%  
        mutate(V1 = str_extract(V1, "\\d+\\.\\d*"))

print(final)

V1
1 0.796
2 0.542
3 0.510
4 0.617
5 0.482
6 0.387
7 0.272
8 0.536
9 0.498
10 0.402
11 0.328
12 0.542
13 0.299
14 0.647
15 0.291
16 0.815
17 0.680
18 0.363
19 0.560
20 0.334
21 0.987
22 0.585
23 0.770
24 0.711
25 0.546
26 0.553
27 1.164
28 1.096
29 1.477
30 0.824
31 1.002
32 0.504
33 1.019
34 0.834
35 0.589
36 0.557
37 1.338
38 0.906
39 0.486
40 0.587

Filed under Java, JMH

← Older posts

Newer posts →

MindSpace

Gradient Descent

Octave code

Python

Linear Regression

Gradient Descent from Octave Code that converges

Minimization of cost

Gradient Descent from my Python Code that does not converge to the optimal value

Minimization of cost

Azure Machine Learning

Principal Component Analysis

JPA and Spring @Transactional and JBoss Arquillian

Aesthetics of Matplotlib graphs

Arquillian Unit Tests

Aesthetics of ‘R’ Plots

Task duration estimation using betaPERT distribution and Monte Carlo Analysis

Parsing Java Micro-benchmarking Harness data using dplyr – Part 2

Types of Error bars used to plot the diagram

Suggested by the R user forum to improve the aesthetics of the plot. The Confidence Interval of 99% shown in the plots above is not correct. But the curves and error bars are correct.

Parsing Java Micro-benchmarking Harness data using dplyr – Part 1

Blogroll