Practicing Predictive Analytics using “R”

I spent a Sunday on this code to answer some questions for a Coursera course. At this time code like this is the norm in more than one such course, so I am just building muscle memory: I type the code, look at the result, and relearn what I learnt earlier.

If I don’t remember how to solve something I search for it, but the point is that I have to stay constantly in touch with “R” as well as the fundamentals. My day job doesn’t let me do this. The other option is a book on Machine Learning, like the one by Tom Mitchell, but that takes forever.

setwd("~/Documents/PredictiveAnalytics")

library(dplyr)  
library(ggplot2)
library(rpart)
library(tree)
library(randomForest)
library(e1071)
library(caret)


seaflow <- read.csv(file="seaflow_21min.csv", header=TRUE)
final <- filter(seaflow, pop == "synecho")
print(nrow(final))
print(summary(seaflow))

print(nrow(seaflow))

print(head(seaflow))

set.seed(555)
trainIndex <- createDataPartition( seaflow$file_id, p = 0.5, list=FALSE, times=1)
train <- seaflow[ trainIndex,]
test <- seaflow[ -trainIndex,]



print(mean(train$time))

p <- ggplot( seaflow, aes( pe, chl_small, color = pop)) + geom_point()
dev.new(width=15, height=14)
print(p)
ggsave("~/predictiveanalytics.png", width=4, height=4, dpi=100)
fol <- formula(pop ~ fsc_small + fsc_perp + fsc_big + pe + chl_big + chl_small)
model <- rpart(fol, method="class", data=train)
print(model)
#plot(model)
#text(model, use.n = TRUE, all=TRUE, cex=0.9)

testprediction <- predict( model, newdata=test, type="class")
comparisonofpredictions <- testprediction == test$pop
accuracy <- sum(comparisonofpredictions) / length(comparisonofpredictions)

print( accuracy )

randomforestmodel <- randomForest( fol, data = train)
print(randomforestmodel)

# For a classification forest the default type="response" already returns the predicted class
testpredictionusingrandomforest <- predict( randomforestmodel, newdata=test)
comparisonofpredictions <- testpredictionusingrandomforest == test$pop
accuracy <- sum(comparisonofpredictions) / length(comparisonofpredictions)
print( accuracy )

print(importance(randomforestmodel))

svmmodel <- svm( fol, data = train)

testpredictionusingsvm <- predict( svmmodel, newdata=test)
comparisonofpredictions <- testpredictionusingsvm == test$pop
accuracy <- sum(comparisonofpredictions) / length(comparisonofpredictions)
print( accuracy )

[Plot: predictiveanalytics.png, pe against chl_small coloured by pop]

StatET for R

I have probably done this a hundred times, but recording these steps is still useful.
Apart from installing R, Eclipse and the StatET plugin, these are the other steps needed to use R in Eclipse.

PATH
C:\Program Files\Java\jdk1.7.0_75\jre\bin

JAVA_HOME
C:\Program Files\Java\jdk1.7.0_75

> install.packages(c("rj", "rj.gd"), repos="http://download.walware.de/rj-2.0")
trying URL 'http://download.walware.de/rj-2.0/bin/windows/contrib/3.2/rj_2.0.4-2.zip'
Content type 'application/zip' length 378433 bytes (369 KB)
downloaded 369 KB

trying URL 'http://download.walware.de/rj-2.0/bin/windows/contrib/3.2/rj.gd_2.0.0-1.zip'
Content type 'application/zip' length 93519 bytes (91 KB)
downloaded 91 KB

package 'rj' successfully unpacked and MD5 sums checked
package 'rj.gd' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
C:\Users\476458\AppData\Local\Temp\RtmpCsk978\downloaded_packages

Cubie Board 4 unboxing

As part of my effort to code OCaml and create a unikernel I was introduced to https://mirage.io/, and that in turn led me to http://cubieboard.org/. I still don’t know if the Cubie Board 4 can be used to do what they have done using earlier versions of the hardware. I have to try.

Microsoft: Data Science and Machine Learning Essentials

After completing this edX course successfully I identified these questions, which I answered wrongly. In some cases I selected more than the required number of options due to oversight.

I have marked the likely answers.

I need a longer article to explain what I learnt, which I plan to write soon.

You have amassed a large volume of customer data, and want to determine if it is possible to identify distinct categories of customer based on similar characteristics.

What kind of predictive model should you create?

    1. Regression
    2. Clustering
    3. Recommender
    4. Classification

You discover that there are missing values for an unordered numeric column in your data.
Which three approaches can you consider using to treat the missing values?

    1. Substitute the text “None”.
    2. Forward fill or back fill the value.
    3. Remove rows in which the value is missing.
    4. Interpolate a value to replace the missing value.
    5. Substitute the numeral 0.

When assessing the residuals of a regression model you observe the following:

Residuals exhibit a persistent structure and are not randomly distributed with respect to values of the label or the features.
The Q-Q normal plots of the residuals show significant curvature and the presence of outliers.
Given these results, which two of the following things should you try to improve the model?

    1. Cross validate the model to ensure that it will generalize properly.
    2. Try a different class of regression model that might better fit the problem.
    3. Create some engineered features with behaviors more closely tracking the values of the label.
    4. Add a Sweep Parameters module with the Metric for measuring performance for classification property set to Accuracy.

You create an experiment that uses a Train Matchbox Recommender module to train a recommendation model, and add a Score Matchbox Recommender module to generate a prediction. You want to use the model in a music streaming service to recommend songs for the currently logged-in user. Which recommender prediction kind should you configure the Score Matchbox Recommender module to use?

    1. Item Recommendation
    2. Related Items
    3. Rating Prediction
    4. Related Users

While exploring a dataset you discover a nonlinear relationship between certain features and the label. Which two of the following feature engineering steps should you try before training a supervised machine learning model?

    1. Ensure the features are linearly independent.
    2. Compute new features based on polynomial values of the original features.
    3. Compute mathematical combinations of the label and other features.
    4. Compute new features based on logarithms or exponentiation of these original features.

Which two of the following approaches can you use to determine which features to prune in an Azure ML experiment?

    1. Use the Permutation Feature Importance model to identify features of near-zero importance.
    2. Use the Cross Validation module to identify folds which indicate the model does not generalize well.
    3. Prune features one at a time to find features which reduce model performance or have no impact on model performance as measured with the Evaluate Model module.
    4. Use the Split module to create training, test and evaluation data sub-sets to evaluate model performance.

Gradient Descent

I ported the Gradient Descent code from Octave to Python. The base Octave code is the one from Andrew Ng’s Machine Learning MOOC.

I mistakenly believed that the Octave code for matrix multiplication would translate directly into Python.

The matrices are these.
[Screenshot: the matrices and their dimensions]

But the Octave code is this

Octave code

  theta = theta - ( (  alpha * ( (( theta' * X' )' - y)' * X ))/length(y) )'

and the Python code is this.

Python

import numpy as np

def gradientDescent( X,
                     y,
                     theta,
                     alpha = 0.01,
                     num_iters = 1500):

    # r is the number of training samples, m
    r,c = X.shape

    for iteration in range( num_iters ):
        # Vectorized update: theta = theta - (alpha / m) * X' * (X * theta - y)
        theta = theta - ( ( alpha * np.dot( X.T, ( np.dot( X , theta ).T - np.asarray(y) ).T ) ) / r )
    return theta

This line is not a direct translation.

        theta = theta - ( ( alpha * np.dot( X.T, ( np.dot( X , theta ).T - np.asarray(y) ).T ) ) / r )

But only the above Python code gives me the correct theta that matches the value given by the Octave code.
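To see why a literal transcription fails, here is a minimal sketch; the shapes are my assumption, modelled on the course's univariate data, and are not taken from the post.

import numpy as np

# Hypothetical shapes: m = 97 samples, a bias column plus one feature,
# so X is (97, 2), y is (97, 1) and theta is (2, 1).
X = np.ones((97, 2))
y = np.ones((97, 1))
theta = np.zeros((2, 1))

# In Octave, theta' * X' is a (1 x 97) matrix product. On numpy arrays,
# * is elementwise, so the literal transcription raises a shape error.
try:
    bad = theta.T * X.T            # (1, 2) * (2, 97) cannot broadcast
except ValueError as e:
    print("elementwise * fails: %s" % e)

good = np.dot(theta.T, X.T)        # the intended (1, 97) matrix product
print(good.shape)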

[Screenshot: the theta values from Python matching the Octave output]

Linear Regression

[Plot: gradientdescent, the linear regression fit]

The gradient descent also does not give me the correct value after a certain number of iterations, but the cost values of the two implementations are similar.
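To compare the two implementations I look at the cost at each iteration. This is a minimal sketch of the usual mean-squared-error cost, assuming X already carries the bias column and y is a column vector; it mirrors the course's computeCost rather than code from this post.

import numpy as np

def compute_cost(X, y, theta):
    # J(theta) = sum((X * theta - y)^2) / (2m)
    m = len(y)
    residuals = np.dot(X, theta) - y
    return np.sum(residuals ** 2) / (2.0 * m)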

Gradient Descent from Octave Code that converges

[Plot: Octave-Contour, from the Octave run]

Minimization of cost

Initial cost is 640.125590
J = 656.25
Initial cost is 656.250475
J = 672.58
Initial cost is 672.583001
J = 689.12
Initial cost is 689.123170
J = 705.87
Initial cost is 705.870980
J = 722.83
Initial cost is 722.826433
J = 739.99
Initial cost is 739.989527

Gradient Descent from my Python Code that does not converge to the optimal value

[Plot: gradientdescent1, from the Python run]

Minimization of cost

635.81837438
651.963633303
668.316534159
684.877076945
701.645261664
718.621088313
735.804556895
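One way to settle which implementation is correct would be to compare both against the closed-form normal equation. A minimal sketch, my addition rather than code from the original post:

import numpy as np

def normal_equation(X, y):
    # Closed-form least squares: theta = (X'X)^(-1) X'y,
    # computed stably with lstsq instead of an explicit inverse.
    theta, residuals, rank, sv = np.linalg.lstsq(X, y)
    return theta

# theta_exact = normal_equation(X, y)
# np.allclose(theta_exact, gradientDescent(X, y, theta0)) should hold after convergence.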

Azure Machine Learning

The Azure ML Studio user interface is slick and very responsive, and adopts a workflow supporting both R and Python scripts. There is a free account available with the caveat below, but that did not hamper my efforts to test some simple flows.

Note: Your free-tier Azure ML account allows you unlimited access, with some reduced capabilities compared to a full Microsoft Azure subscription. Your experiments will only run at low priority on a single processor core. As a result, you will experience some longer wait times. However, you have full access to all features of Azure ML.

The graph visualizations are very spiffy too. I am yet to finish the data cleansing aspects and use the really interesting ML algorithms.

[Screenshot: Azure ML Studio]

Principal Component Analysis

This post is about what I think I understood about Principal Component Analysis. I will update it later.

The code is on GitHub and it works, but I think the eigenvalues could be wrong. I have to test it further.

These are the two main functions.


    """Compute the covariance matrix for a given dataset.
    """
def estimateCovariance( data ):
    print data
    mean = getmean( data )
    print mean
    dataZeroMean = map(lambda x : x - mean, data )
    print dataZeroMean
    covar = map( lambda x : np.outer(x,x) , dataZeroMean )
    print getmean( covar ) 
    return getmean( covar )

    """Computes the top `k` principal components, corresponding scores, and all eigenvalues.
    """
def pca(data, k=2):
    
    d = estimateCovariance(  data )
    
    eigVals, eigVecs = eigh(d)

    validate( eigVals, eigVecs )
    inds = np.argsort(eigVals)[::-1]
    topComponent = eigVecs[:,inds[:k]]
    print '\nTop Component: \n{0}'.format(topComponent)
    
    correlatedDataScores = map(lambda x : np.dot( x ,topComponent), data )
    print ('\nScores : \n{0}'
       .format('\n'.join(map(str, correlatedDataScores))))
    print '\n eigenvalues: \n{0}'.format(eigVals[inds])
    return topComponent,correlatedDataScores,eigVals[inds]
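To convince myself the eigenvalues come out in the right order, I can run a quick smoke test on toy data. This is only a sketch: the getmean() and validate() stand-ins below are my guesses at the helpers in the repository, not the actual code, and it assumes the Python 2 map() semantics of the functions above.

import numpy as np

def getmean(arrays):
    # Hypothetical stand-in: element-wise mean of a list of equal-shape arrays
    return sum(arrays) / float(len(arrays))

def validate(eigVals, eigVecs):
    # Hypothetical stand-in: only checks that the shapes agree
    assert len(eigVals) == eigVecs.shape[0]

np.random.seed(0)
# Toy 2-D samples stretched along the direction (1, 2)
data = [x * np.array([1.0, 2.0]) + 0.05 * np.random.randn(2)
        for x in np.random.randn(50)]

topComponent, scores, eigVals = pca(data, k=1)
# The first eigenvalue should dominate because the data is nearly one-dimensional
print(eigVals)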

JPA and Spring @Transactional and JBoss Arquillian

JBoss Arquillian is a test framework that one can use to execute tests in the IDE as part of the development process. The key parts are the deployment API and the container adapters that enable us to deploy and execute tests inside a container, automatically and repeatedly.

I have written about Arquillian here.
In this post I will show how a simple Arquillian test for a JPA transaction avoids countless wasted hours. I spent a few hours trying to find out why enabling the wrong transaction manager produces log lines almost identical to the section below, misleading one into thinking that transactions are in effect. With the wrong transaction manager no rows are actually committed to the database, yet the logs still show messages that suggest data is committed.

This is the correct set of log messages that show that JpaTransactionManager takes effect.

DEBUG: org.springframework.transaction.annotation.AnnotationTransactionAttributeSource - Adding transactional method 'TestImpl.test' with attribute: PROPAGATION_REQUIRED,ISOLATION_DEFAULT; ''
DEBUG: org.springframework.orm.jpa.JpaTransactionManager - Creating new transaction with name [com.jpa.test.TestImpl.test]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT; ''
DEBUG: org.hibernate.internal.SessionImpl - Opened session at timestamp: 14421449169
TRACE: org.hibernate.internal.SessionImpl - Setting flush mode to: AUTO
TRACE: org.hibernate.internal.SessionImpl - Setting cache mode to: NORMAL
DEBUG: org.springframework.orm.jpa.JpaTransactionManager - Opened new EntityManager [org.hibernate.ejb.EntityManagerImpl@8f64d] for JPA transaction
DEBUG: org.hibernate.engine.transaction.spi.AbstractTransactionImpl - begin
DEBUG: org.hibernate.engine.jdbc.internal.LogicalConnectionImpl - Obtaining JDBC connection
DEBUG: org.springframework.jdbc.datasource.SimpleDriverDataSource - Creating new JDBC Driver Connection to [jdbc:hsqldb:mem:dataSource]
DEBUG: org.hibernate.engine.jdbc.internal.LogicalConnectionImpl - Obtained JDBC connection
DEBUG: org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction - initial autocommit status: true
DEBUG: org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction - disabling autocommit
DEBUG: org.springframework.orm.jpa.JpaTransactionManager - Exposing JPA transaction as JDBC transaction [org.springframework.orm.jpa.vendor.HibernateJpaDialect$HibernateConnectionHandle@423d24]
TRACE: org.springframework.transaction.support.TransactionSynchronizationManager - Bound value [org.springframework.jdbc.datasource.ConnectionHolder@805780] for key [org.springframework.jdbc.datasource.SimpleDriverDataSource@faa27c] to thread [http-nio-8080-exec-5]
TRACE: org.springframework.transaction.support.TransactionSynchronizationManager - Bound value [org.springframework.orm.jpa.EntityManagerHolder@7a72fc] for key [org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean@1e0cc0a] to thread [http-nio-8080-exec-5]
TRACE: org.springframework.transaction.support.TransactionSynchronizationManager - Initializing transaction synchronization
TRACE: org.springframework.transaction.interceptor.TransactionInterceptor - Getting transaction for [com.jpa.test.TestImpl.test]
INFO : jpa - TransactionSynchronizationManager.isActualTransactionActive()true
INFO : jpa - INMEMORY_DB [id=id, street=Street, area=Area, state=State, country=LO, pin=1]
TRACE: org.springframework.transaction.support.TransactionSynchronizationManager - Retrieved value [org.springframework.orm.jpa.EntityManagerHolder@7a72fc] for key [org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean@1e0cc0a] bound to thread [http-nio-8080-exec-5]
TRACE: org.hibernate.engine.spi.IdentifierValue - ID unsaved-value strategy UNDEFINED
TRACE: org.hibernate.event.internal.AbstractSaveEventListener - Transient instance of: com.jpa.test.INMEMORY_DB
TRACE: org.hibernate.event.internal.DefaultPersistEventListener - Saving transient instance
DEBUG: org.hibernate.event.internal.AbstractSaveEventListener - Generated identifier: id, using strategy: org.hibernate.id.Assigned
TRACE: org.hibernate.event.internal.AbstractSaveEventListener - Saving [com.jpa.test.INMEMORY_DB#id]
TRACE: org.hibernate.engine.spi.ActionQueue - Adding an EntityInsertAction for [com.jpa.test.INMEMORY_DB] object
TRACE: org.hibernate.engine.spi.ActionQueue - Adding insert with no non-nullable, transient entities: [EntityInsertAction[com.jpa.test.INMEMORY_DB#id]]
TRACE: org.hibernate.engine.spi.ActionQueue - Adding resolved non-early insert action.
TRACE: org.hibernate.action.internal.UnresolvedEntityInsertActions - No unresolved entity inserts that depended on [[com.jpa.test.INMEMORY_DB#id]]
TRACE: org.hibernate.action.internal.UnresolvedEntityInsertActions - No entity insert actions have non-nullable, transient entity dependencies.
TRACE: org.springframework.transaction.interceptor.TransactionInterceptor - Completing transaction for [com.jpa.test.TestImpl.test]
DEBUG: org.springframework.orm.jpa.JpaTransactionManager - Initiating transaction commit
DEBUG: org.springframework.orm.jpa.JpaTransactionManager - Committing JPA transaction on EntityManager [org.hibernate.ejb.EntityManagerImpl@8f64d]
DEBUG: org.hibernate.engine.transaction.spi.AbstractTransactionImpl - committing

The source code has a copy of log4j.xml that enables the appropriate logging. But this method is not repeatable, in the sense that it is hard to check the log messages manually every time we change the configuration or add new code. That is what unit tests are for: Arquillian container tests deploy our code into a container and execute there, driven from the IDE. The developer does not have to deploy and test the code manually. All that is required is a good regression test suite.

Arquillian uses the dependency arquillian-transaction-spring to make the test method transactional.

There are some dependencies in the pom.xml, as well as in this Arquillian test, that are redundant or not needed, but all the required ones are there.

package com.jpa.test;

import static org.junit.Assert.assertEquals;

import java.io.File;
import java.util.List;
import java.util.logging.Logger;

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import javax.persistence.TypedQuery;
import javax.transaction.SystemException;

import org.jboss.arquillian.container.test.api.Deployment;
import org.jboss.arquillian.junit.Arquillian;
import org.jboss.arquillian.spring.integration.test.annotation.SpringConfiguration;
import org.jboss.arquillian.transaction.api.annotation.Transactional;
import org.jboss.shrinkwrap.api.ShrinkWrap;
import org.jboss.shrinkwrap.api.formatter.Formatters;
import org.jboss.shrinkwrap.api.spec.JavaArchive;
import org.jboss.shrinkwrap.api.spec.WebArchive;
import org.jboss.shrinkwrap.resolver.api.maven.Maven;
import org.junit.Test;
import org.junit.runner.RunWith;


@RunWith(Arquillian.class)
@SpringConfiguration("applicationContext.xml")
public class ShrinkWrappedJPATest {

	private static Logger l = Logger.getLogger("jpa");
		
	@PersistenceContext(unitName="testingSetup")
	private EntityManager entityManager;

    @Deployment
    public static WebArchive createWebArchive() {
  
    	final WebArchive war=ShrinkWrap.create(WebArchive.class,"ShrinkWrapJPA.war");
    	  
        JavaArchive jar = ShrinkWrap.create(JavaArchive.class)
                				.addPackage("com.jpa.test");
 

        war.addAsLibrary(jar);
    	war.addAsResource("applicationContext.xml");
    	war.addAsResource("arquillian.xml");
    	war.addAsResource("log4j.xml");
    	war.addAsResource("schema.sql");
    	war.addAsResource("test-data.sql");
    	war.addAsResource("log4j.xml");
    	war.addAsResource("persistence.xml", "META-INF/persistence.xml");
    	loadDependencies( war );
 
    	l.info(war.toString(Formatters.VERBOSE));
    	return war;
    }

        
    
    
    private static void loadDependencies( final WebArchive war ){

        // Resolve each Maven artifact without its transitive dependencies
        // and add it to the web archive as a library.
        final String[] artifacts = {
                "org.springframework:spring-orm:4.1.6.RELEASE",
                "org.hibernate:hibernate-core:4.1.7.FINAL",
                "org.hibernate:hibernate-entitymanager:4.1.7.FINAL",
                "org.springframework:spring-expression:4.2.0.RELEASE",
                "org.springframework:spring-web:4.2.0.RELEASE",
                "org.springframework:spring-core:4.1.6.RELEASE",
                "org.springframework:spring-context:4.1.6.RELEASE",
                "org.springframework:spring-jdbc:4.1.6.RELEASE",
                "org.springframework:spring-tx:4.1.6.RELEASE",
                "org.hsqldb:hsqldb:2.3.1",
                "commons-dbcp:commons-dbcp:1.4",
                "aopalliance:aopalliance:1.0",
                "org.jboss.arquillian.extension:arquillian-service-deployer-spring-3:1.0.0.Beta3",
                "org.springframework:spring-beans:4.1.6.RELEASE",
                "org.springframework:spring-aop:4.2.0.RELEASE",
                "org.jboss.arquillian.extension:arquillian-transaction-api:1.0.1.Final",
                "org.jboss.arquillian.extension:arquillian-transaction-impl-base:1.0.1.Final"
        };

        for (final String artifact : artifacts) {
            final File library = Maven.resolver()
                                      .resolve(artifact)
                                      .withoutTransitivity()
                                      .asSingle(File.class);
            war.addAsLibraries(library);
        }
    }
    
    @Test
    @Transactional(manager="transactionManager")
	public void save() throws Exception, SystemException {
 

        INMEMORY_DB a = new INMEMORY_DB();
		a.setId("id");
		a.setStreet("Street");
		a.setArea("Area");
		a.setState("State");
		a.setCountry("LO");
		a.setPin(1);
		entityManager.persist( a );
  		assertEquals(2, getAddressCount());
	}

 
	public int getAddressCount(){
		TypedQuery<INMEMORY_DB> query =
				entityManager.createQuery("SELECT c FROM INMEMORY_DB c", INMEMORY_DB.class);
		List<INMEMORY_DB> results = query.getResultList();	
		return results.size();
	}

}

Aesthetics of Matplotlib graphs

The graph shown in my earlier post is not clear and it looks wrong. I have improved it to some extent using this code. Matplotlib has many features more powerful than the ones I used earlier. I have commented out the code that annotates and displays the actual points in the graph. I couldn’t draw the tick marks properly so that the red graph shows clearly, because the data range wasn’t easy to work with; there should be some feature that I have not explored yet (see the sketch below the figure).


import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np

def main():
    gclog = pd.DataFrame(columns=['SecondsSinceLaunch',
                                   'BeforeSize',
                                   'AfterSize',
                                   'TotalSize',
                                   'RealTime'])
    with open("D:\\performance\\data.txt", "r") as f:
        for line in f:
            strippeddata = line.split()
            gclog = gclog.append(pd.DataFrame( [dict(SecondsSinceLaunch=strippeddata[0],
                                                     BeforeSize=strippeddata[1],
                                                     AfterSize=strippeddata[2],
                                                     TotalSize=strippeddata[3],
                                                     RealTime=strippeddata[4])] ),
                                               ignore_index=True)
    print gclog
    #gclog.time = pd.to_datetime(gclog['SecondsSinceLaunch'], format='%Y-%m-%dT%H:%M:%S.%f')
    gclog = gclog.convert_objects(convert_numeric=True)
    fig, ax = plt.subplots(figsize=(17, 14), facecolor='white', edgecolor='white')
    ax.axes.tick_params(labelcolor='darkblue', labelsize='10')
    for axis, ticks in [(ax.get_xaxis(), np.arange(10, 8470, 100) ), (ax.get_yaxis(), np.arange(10, 9125, 300))]:
        axis.set_ticks_position('none')
        axis.set_ticks(ticks)
        axis.label.set_color('#999999')
        if False: axis.set_ticklabels([])
    plt.grid(color='#999999', linewidth=1.0, linestyle='-')
    plt.xticks(rotation=70)
    plt.gcf().subplots_adjust(bottom=0.15)
    map(lambda position: ax.spines[position].set_visible(False), ['bottom', 'top', 'left', 'right'])
    ax.set_xlabel(r'AfterSize'), ax.set_ylabel(r'TotalSize')
    ax.set_xlim(10, 8470), ax.set_ylim(10, 9125)
    plt.plot(sorted(gclog.AfterSize),gclog.TotalSize,c="red")
#     for i,j in zip(sorted(gclog.AfterSize),gclog.TotalSize):
#         ax.annotate('(' + str(i) + ',' + str(j) + ')',xy=(i, j))
    
    plt.show()
if __name__=="__main__":
    main()

[Plot: figure_1, the improved graph]
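For the tick-mark problem mentioned above, one feature I had not explored is matplotlib's tick locators. A minimal, self-contained sketch with made-up data: MaxNLocator picks at most n "nice" tick positions, while MultipleLocator forces a fixed spacing.

import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator, MultipleLocator

fig, ax = plt.subplots()
ax.plot(range(0, 8470, 10), [v % 9125 for v in range(0, 8470, 10)], c="red")

# Let matplotlib choose at most 15 readable tick positions on the x axis
ax.xaxis.set_major_locator(MaxNLocator(nbins=15))
# Force a fixed spacing of 300 data units between y-axis ticks
ax.yaxis.set_major_locator(MultipleLocator(300))
plt.show()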

In fact, after finishing the Principal Component Analysis section of a Machine Learning course I took recently, I realized that beautiful 3D graphs can be drawn.

[Plot: cube, a 3D graph]

Arquillian Unit Tests

I have contributed an article to DZone about Arquillian using this source code.