Principal Component Analysis

These are my notes on what I have understood so far about Principal Component Analysis. I will update this blog post later.

The code is on GitHub and it runs, but I suspect the eigenvalues could be wrong. I have to test it further.

These are the two main functions.

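```
import numpy as np
from numpy.linalg import eigh


def getmean(rows):
    # Assumed stand-in for the repository helper: element-wise mean over the rows.
    return np.mean(rows, axis=0)


def estimateCovariance(data):
    """Compute the covariance matrix for a given dataset."""
    mean = getmean(data)
    # Centre the data so that every column has zero mean.
    dataZeroMean = [x - mean for x in data]
    # The covariance matrix is the mean of the outer products of the centred rows.
    covar = [np.outer(x, x) for x in dataZeroMean]
    return getmean(covar)


def pca(data, k=2):
    """Computes the top `k` principal components, corresponding scores, and all eigenvalues."""
    d = estimateCovariance(data)

    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigVals, eigVecs = eigh(d)

    # validate(eigVals, eigVecs)  # helper from the repository, not shown in this post
    # Sort the eigenvalues in descending order and keep the matching top k eigenvectors.
    inds = np.argsort(eigVals)[::-1]
    topComponent = eigVecs[:, inds[:k]]
    print('\nTop Component: \n{0}'.format(topComponent))

    correlatedDataScores = [np.dot(x, topComponent) for x in data]
    print('\nScores : \n{0}'.format('\n'.join(map(str, correlatedDataScores))))
    print('\n eigenvalues: \n{0}'.format(eigVals[inds]))
    return topComponent, correlatedDataScores, eigVals[inds]
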
```
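
Since I am not sure about the eigenvalues, here is a minimal sketch of how they could be checked. It is not the code from the repository: `covariance_by_outer_products` and the random `sample` data are made up for illustration. The idea is to compare the hand-computed covariance with `np.cov`, and to check the defining property of an eigen decomposition.

```python
import numpy as np


def covariance_by_outer_products(data):
    """Covariance as the mean of outer products of the centred rows (divides by n)."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    return np.mean([np.outer(x, x) for x in centered], axis=0)


# Synthetic data with a different spread in each column (an assumption, not real data).
rng = np.random.RandomState(0)
sample = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.2])

cov = covariance_by_outer_products(sample)
eig_vals, eig_vecs = np.linalg.eigh(cov)

# 1. The covariance should match NumPy's own estimate (bias=True divides by n).
assert np.allclose(cov, np.cov(sample, rowvar=False, bias=True))

# 2. Every eigenpair should satisfy C v = lambda v.
assert np.allclose(cov.dot(eig_vecs), eig_vecs * eig_vals)

# 3. The eigenvalues should add up to the total variance (the trace).
assert np.isclose(eig_vals.sum(), np.trace(cov))

# 4. The variance of the scores along the top component should equal
#    the largest eigenvalue, provided the data is centred first.
top = eig_vecs[:, np.argmax(eig_vals)]
scores = (sample - sample.mean(axis=0)).dot(top)
assert np.isclose(scores.var(), eig_vals.max())
```

If all four assertions pass for a dataset, the eigenvalues are at least consistent with the covariance matrix they were computed from.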

JPA and Spring @Transactional and JBoss Arquillian

JBoss Arquillian is a test framework that lets us execute tests from the IDE as part of the development process. Its key parts are the deployment API and the container adapters, which let us deploy tests and execute them inside a container, automatically and repeatedly.

I have written about Arquillian here.
In this post I will show how a simple Arquillian test for a JPA transaction avoids countless wasted hours. I actually spent a few hours trying to find out why no rows were being committed. With the wrong transaction manager enabled, the log lines look almost the same as the ones in the section below and mislead one into thinking that transactions are in effect. The logs even show messages that indicate data is committed, yet no rows actually reach the database.

This is the correct set of log messages that show that JpaTransactionManager takes effect.

DEBUG: org.springframework.transaction.annotation.AnnotationTransactionAttributeSource - Adding transactional method 'TestImpl.test' with attribute: PROPAGATION_REQUIRED,ISOLATION_DEFAULT; ''
DEBUG: org.springframework.orm.jpa.JpaTransactionManager - Creating new transaction with name [com.jpa.test.TestImpl.test]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT; ''
DEBUG: org.hibernate.internal.SessionImpl - Opened session at timestamp: 14421449169
TRACE: org.hibernate.internal.SessionImpl - Setting flush mode to: AUTO
TRACE: org.hibernate.internal.SessionImpl - Setting cache mode to: NORMAL
DEBUG: org.springframework.orm.jpa.JpaTransactionManager - Opened new EntityManager [org.hibernate.ejb.EntityManagerImpl@8f64d] for JPA transaction
DEBUG: org.hibernate.engine.transaction.spi.AbstractTransactionImpl - begin
DEBUG: org.hibernate.engine.jdbc.internal.LogicalConnectionImpl - Obtaining JDBC connection
DEBUG: org.springframework.jdbc.datasource.SimpleDriverDataSource - Creating new JDBC Driver Connection to [jdbc:hsqldb:mem:dataSource]
DEBUG: org.hibernate.engine.jdbc.internal.LogicalConnectionImpl - Obtained JDBC connection
DEBUG: org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction - initial autocommit status: true
DEBUG: org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction - disabling autocommit
DEBUG: org.springframework.orm.jpa.JpaTransactionManager - Exposing JPA transaction as JDBC transaction [org.springframework.orm.jpa.vendor.HibernateJpaDialect$HibernateConnectionHandle@423d24]
TRACE: org.springframework.transaction.support.TransactionSynchronizationManager - Bound value [org.springframework.jdbc.datasource.ConnectionHolder@805780] for key [org.springframework.jdbc.datasource.SimpleDriverDataSource@faa27c] to thre
TRACE: org.springframework.transaction.support.TransactionSynchronizationManager - Bound value [org.springframework.orm.jpa.EntityManagerHolder@7a72fc] for key [org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean@1e0cc0a] to
TRACE: org.springframework.transaction.support.TransactionSynchronizationManager - Initializing transaction synchronization
TRACE: org.springframework.transaction.interceptor.TransactionInterceptor - Getting transaction for [com.jpa.test.TestImpl.test]
INFO : jpa - TransactionSynchronizationManager.isActualTransactionActive()true
INFO : jpa - INMEMORY_DB [id=id, street=Street, area=Area, state=State, country=LO, pin=1]
TRACE: org.springframework.transaction.support.TransactionSynchronizationManager - Retrieved value [org.springframework.orm.jpa.EntityManagerHolder@7a72fc] for key [org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean@1e0cc0a] bound to thread [http-nio-8080-exec-5]
TRACE: org.hibernate.engine.spi.IdentifierValue - ID unsaved-value strategy UNDEFINED
TRACE: org.hibernate.event.internal.AbstractSaveEventListener - Transient instance of: com.jpa.test.INMEMORY_DB
TRACE: org.hibernate.event.internal.DefaultPersistEventListener - Saving transient instance
DEBUG: org.hibernate.event.internal.AbstractSaveEventListener - Generated identifier: id, using strategy: org.hibernate.id.Assigned
TRACE: org.hibernate.event.internal.AbstractSaveEventListener - Saving [com.jpa.test.INMEMORY_DB#id]
TRACE: org.hibernate.engine.spi.ActionQueue - Adding an EntityInsertAction for [com.jpa.test.INMEMORY_DB] object
TRACE: org.hibernate.engine.spi.ActionQueue - Adding insert with no non-nullable, transient entities: [EntityInsertAction[com.jpa.test.INMEMORY_DB#id]]
TRACE: org.hibernate.engine.spi.ActionQueue - Adding resolved non-early insert action.
TRACE: org.hibernate.action.internal.UnresolvedEntityInsertActions - No unresolved entity inserts that depended on [[com.jpa.test.INMEMORY_DB#id]]
TRACE: org.hibernate.action.internal.UnresolvedEntityInsertActions - No entity insert actions have non-nullable, transient entity dependencies.
TRACE: org.springframework.transaction.interceptor.TransactionInterceptor - Completing transaction for [com.jpa.test.TestImpl.test]
DEBUG: org.springframework.orm.jpa.JpaTransactionManager - Initiating transaction commit
DEBUG: org.springframework.orm.jpa.JpaTransactionManager - Committing JPA transaction on EntityManager [org.hibernate.ejb.EntityManagerImpl@8f64d]
DEBUG: org.hibernate.engine.transaction.spi.AbstractTransactionImpl - committing

The source code has a copy of log4j.xml that enables the appropriate logging. But this method is not repeatable, in the sense that it is hard to check the log messages manually every time we change the configuration or add new code. That is what unit tests are for. Arquillian container tests deploy our code into a container and execute the tests from the IDE, so the developer does not have to deploy and test the code manually. All that is required is a good regression test suite.
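
Until such a test is in place, the manual log check can at least be scripted. The following is only a rough sketch and no replacement for the container test. It assumes that the logger name `JpaTransactionManager` on the create and commit lines is what distinguishes a good run; the function name `missing_jpa_commit_steps` and both sample logs are made up for illustration.

```python
import re

# Messages that should appear, in this order, when JpaTransactionManager is in effect.
# "\S" stands for the separator between logger name and message.
REQUIRED_PATTERNS = [
    r"JpaTransactionManager\s+\S\s+Creating new transaction with name",
    r"JpaTransactionManager\s+\S\s+Initiating transaction commit",
    r"JpaTransactionManager\s+\S\s+Committing JPA transaction on EntityManager",
]


def missing_jpa_commit_steps(log_text):
    """Return the patterns that were not found, in order, in log_text."""
    missing = []
    position = 0
    for pattern in REQUIRED_PATTERNS:
        match = re.compile(pattern).search(log_text, position)
        if match is None:
            missing.append(pattern)
        else:
            # Continue after this match so that the order is enforced.
            position = match.end()
    return missing


sample_log = "\n".join([
    "DEBUG: org.springframework.orm.jpa.JpaTransactionManager - Creating new transaction with name [com.jpa.test.TestImpl.test]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT; ''",
    "DEBUG: org.springframework.orm.jpa.JpaTransactionManager - Initiating transaction commit",
    "DEBUG: org.springframework.orm.jpa.JpaTransactionManager - Committing JPA transaction on EntityManager [org.hibernate.ejb.EntityManagerImpl@8f64d]",
])
print(missing_jpa_commit_steps(sample_log))  # prints []

# Hypothetical log written by some other manager: every step is reported as missing.
other_log = "DEBUG: com.example.SomeOtherTransactionManager - Initiating transaction commit"
print(missing_jpa_commit_steps(other_log))
```

A check like this only looks at text, so it can still be fooled. The Arquillian test, which actually goes through the database, remains the more reliable option.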

Arquillian uses the dependency `arquillian-transaction-spring` to make the test method `transactional`.

Some of the dependencies in the `pom.xml`, as well as in this Arquillian test, are redundant, but the required ones are all there.

```
package com.jpa.test;

import static org.junit.Assert.assertEquals;

import java.io.File;
import java.util.List;
import java.util.logging.Logger;

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import javax.persistence.TypedQuery;
import javax.transaction.SystemException;

import org.jboss.arquillian.container.test.api.Deployment;
import org.jboss.arquillian.junit.Arquillian;
import org.jboss.arquillian.spring.integration.test.annotation.SpringConfiguration;
import org.jboss.arquillian.transaction.api.annotation.Transactional;
import org.jboss.shrinkwrap.api.ShrinkWrap;
import org.jboss.shrinkwrap.api.formatter.Formatters;
import org.jboss.shrinkwrap.api.spec.JavaArchive;
import org.jboss.shrinkwrap.api.spec.WebArchive;
import org.jboss.shrinkwrap.resolver.api.maven.Maven;
import org.junit.Test;
import org.junit.runner.RunWith;

@RunWith(Arquillian.class)
@SpringConfiguration("applicationContext.xml")
public class ShrinkWrappedJPATest {

    private static Logger l = Logger.getLogger("jpa");

    // Injected from the persistence unit named "testingSetup".
    @PersistenceContext(unitName="testingSetup")
    private EntityManager entityManager;

    @Deployment
    public static WebArchive createWebArchive() {

        final WebArchive war = ShrinkWrap.create(WebArchive.class, "ShrinkWrapJPA.war");

        // The post omits the rest of the archive setup here
        // (adding classes and resources to the archives).
        JavaArchive jar = ShrinkWrap.create(JavaArchive.class);

        // Log the contents of the archive that will be deployed.
        l.info(war.toString(Formatters.VERBOSE));
        return war;
    }

    private static void loadDependencies( final WebArchive war ){

        File springorm = Maven.
            resolver().
            resolve("org.springframework:spring-orm:4.1.6.RELEASE")
            .withoutTransitivity().asSingle(File.class);

        File hibernate = Maven.
            resolver().
            resolve("org.hibernate:hibernate-core:4.1.7.FINAL")
            .withoutTransitivity().asSingle(File.class);

        File hibernate1 = Maven.
            resolver().
            resolve("org.hibernate:hibernate-entitymanager:4.1.7.FINAL")
            .withoutTransitivity().asSingle(File.class);

        File springexpression = Maven.
            resolver().
            resolve("org.springframework:spring-expression:4.2.0.RELEASE")
            .withoutTransitivity().asSingle(File.class);

        File springweb = Maven.
            resolver().
            resolve("org.springframework:spring-web:4.2.0.RELEASE")
            .withoutTransitivity().asSingle(File.class);

        File springcore = Maven.
            resolver().
            resolve("org.springframework:spring-core:4.1.6.RELEASE")
            .withoutTransitivity().asSingle(File.class);

        File springcontext = Maven.
            resolver().
            resolve("org.springframework:spring-context:4.1.6.RELEASE")
            .withoutTransitivity().asSingle(File.class);

        File springjdbc = Maven.
            resolver().
            resolve("org.springframework:spring-jdbc:4.1.6.RELEASE")
            .withoutTransitivity().asSingle(File.class);

        File springtx = Maven.
            resolver().
            resolve("org.springframework:spring-tx:4.1.6.RELEASE")
            .withoutTransitivity().asSingle(File.class);

        File hsqldb = Maven.
            resolver().
            resolve("org.hsqldb:hsqldb:2.3.1")
            .withoutTransitivity().asSingle(File.class);

        File dbcp = Maven.
            resolver().
            resolve("commons-dbcp:commons-dbcp:1.4")
            .withoutTransitivity().asSingle(File.class);

        File aopalliance = Maven.
            resolver().
            resolve("aopalliance:aopalliance:1.0")
            .withoutTransitivity().asSingle(File.class);

        File extensionspring = Maven.
            resolver().
            resolve("org.jboss.arquillian.extension:arquillian-service-deployer-spring-3:1.0.0.Beta3")
            .withoutTransitivity().asSingle(File.class);

        File springbeans = Maven.
            resolver().
            resolve("org.springframework:spring-beans:4.1.6.RELEASE")
            .withoutTransitivity().asSingle(File.class);

        File springaop = Maven.
            resolver().
            resolve("org.springframework:spring-aop:4.2.0.RELEASE")
            .withoutTransitivity().asSingle(File.class);

        File transactionapi = Maven.
            resolver().
            resolve("org.jboss.arquillian.extension:arquillian-transaction-api:1.0.1.Final")
            .withoutTransitivity().asSingle(File.class);

        File transactionimplbase = Maven.
            resolver().
            resolve("org.jboss.arquillian.extension:arquillian-transaction-impl-base:1.0.1.Final")
            .withoutTransitivity().asSingle(File.class);

    }

    @Test
    @Transactional(manager="transactionManager")
    public void save() throws Exception, SystemException {

        INMEMORY_DB a = new INMEMORY_DB();
        a.setId("id");
        a.setStreet("Street");
        a.setArea("Area");
        a.setState("State");
        a.setCountry("LO");
        a.setPin(1);
        // Persisted inside the transaction that Arquillian starts for this test method.
        entityManager.persist( a );
    }

    // The signature of this method is missing from the post;
    // the name and return type used here are assumptions.
    private int countRows() {
        TypedQuery<INMEMORY_DB> query =
            entityManager.createQuery("SELECT c FROM INMEMORY_DB c", INMEMORY_DB.class);
        List<INMEMORY_DB> results = query.getResultList();
        return results.size();
    }

}
```

Aesthetics of Matplotlib graphs

The graph shown in my earlier post is not clear and it looks wrong. I have improved it to some extent using this code. Matplotlib has many features more powerful than what I used earlier. I have commented out the code that annotates and displays the actual points in the graph. I couldn't draw the tick marks properly so that the red graph is shown clearly, because the data range wasn't easy to work with. There is probably some feature that I have not explored yet.

```
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np

COLUMNS = ['SecondsSinceLaunch', 'BeforeSize', 'AfterSize', 'TotalSize', 'RealTime']


def main():
    # Each line of the GC log holds whitespace-separated fields; keep the first five.
    with open("D:\\performance\\data.txt", "r") as f:
        rows = [dict(zip(COLUMNS, line.split())) for line in f if line.strip()]
    gclog = pd.DataFrame(rows, columns=COLUMNS)
    print(gclog)
    #gclog.time = pd.to_datetime(gclog['SecondsSinceLaunch'], format='%Y-%m-%dT%H:%M:%S.%f')
    # Convert every column to numbers; values that cannot be parsed become NaN.
    gclog = gclog.apply(pd.to_numeric, errors='coerce')
    # Sort whole rows by AfterSize so that each x value keeps its own y value.
    gclog = gclog.sort_values('AfterSize')

    fig, ax = plt.subplots(figsize=(17, 14), facecolor='white', edgecolor='white')
    ax.tick_params(labelcolor='darkblue', labelsize=10)
    for axis, ticks in [(ax.xaxis, np.arange(10, 8470, 100)),
                        (ax.yaxis, np.arange(10, 9125, 300))]:
        axis.set_ticks_position('none')
        axis.set_ticks(ticks)
        axis.label.set_color('#999999')
    ax.grid(color='#999999', linewidth=1.0, linestyle='-')
    plt.xticks(rotation=70)
    # Hide the box around the plot.
    for position in ['bottom', 'top', 'left', 'right']:
        ax.spines[position].set_visible(False)
    ax.set_xlabel('AfterSize')
    ax.set_ylabel('TotalSize')
    ax.set_xlim(10, 8470)
    ax.set_ylim(10, 9125)
    ax.plot(gclog.AfterSize, gclog.TotalSize, c="red")
    # for i, j in zip(gclog.AfterSize, gclog.TotalSize):
    #     ax.annotate('(' + str(i) + ',' + str(j) + ')', xy=(i, j))

    plt.show()


if __name__ == "__main__":
    main()
```
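
One feature I could try for the tick problem is letting Matplotlib choose the tick positions through a locator instead of spelling them out with `np.arange`. The sketch below uses `MaxNLocator` from `matplotlib.ticker`. The data is synthetic and only mimics the range of my GC log, the output file name `gc_ticks.png` is made up, and the `Agg` backend is selected so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file, no window needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MaxNLocator

# Synthetic stand-in for the AfterSize and TotalSize columns (not real GC data).
rng = np.random.RandomState(1)
after_size = np.sort(rng.uniform(10, 8470, size=50))
total_size = after_size + rng.uniform(0, 600, size=50)

fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(after_size, total_size, c="red")

# Ask for roughly ten intervals per axis at "nice" round values,
# instead of passing explicit tick positions.
ax.xaxis.set_major_locator(MaxNLocator(nbins=10))
ax.yaxis.set_major_locator(MaxNLocator(nbins=10))
ax.tick_params(axis="x", labelrotation=70)

ax.grid(color="#999999", linewidth=0.5, linestyle="-")
ax.set_xlabel("AfterSize")
ax.set_ylabel("TotalSize")
fig.savefig("gc_ticks.png")
```

With a locator the ticks adapt to whatever range the data has, so the axis limits and the tick spacing no longer have to be kept in sync by hand.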

In fact, after finishing the Principal Component Analysis section of a Machine Learning course I took recently, I realized that beautiful 3D graphs can be drawn.

Arquillian Unit Tests

I have contributed an article to DZone about Arquillian using this source code.