R Reference classes

A pure OO approach and a functional representation of it are at loggerheads. That is evident when one tries to adopt an OO approach using a powerful functional language. That is my personal opinion.

R has many Object-oriented features built into it.

R has three object-oriented (OO) systems: S3, S4 and Reference classes (RC, informally called R5).

Reference classes are the feature explored here.

 OO

Let us consider this data. The id is that of a Subject who is in
a room where monitoring equipment gathers some data. There are several visits to gather this data.

id  visit  room     value  timepoint
14  0      bedroom  6      53
14  0      bedroom  6      54
15  0      bedroom  2.75   56

The idea that this code is based on is from Martin Fowler’s book Analysis Patterns: Reusable Object Models. The chapter on Observations and Measurements has a diagram roughly equivalent to the
one shown at the top.

The code has been run by hand several times but has no unit tests.

library(plyr)
library(dplyr)
library(purrr)

CompoundUnit <- setRefClass("CompoundUnit",
                       fields = list(micrograms = 'numeric',
                                     cubicmeter = 'numeric'))
Location <- setRefClass("Location",
                       fields = list( room = 'character'),
                       methods=list(getlocation = function(){
                                        room
                                    },
                                    summary = function(){
                                        paste('Room [' , room , ']')
                      }))
library(objectProperties)
# An Enum which could have behaviour associated with it.
# This is convoluted but the only way I know to represent constants and validate them.
#
###############################################################################

MeasurementVisitEnum.gen <- setSingleEnum("MeasurementVisit", levels = c('0', '1', '2'))
par.gen <- setRefClass("Visit",
                       properties(fields = list(visit = "MeasurementVisitSingleEnum"),
                                  prototype = list(visit = new("MeasurementVisitSingleEnum", '0'))))

What is the significance of this convoluted code?

It restricts the values that can be set to 0, 1 and 2. It is like the Java enum.

But this is not strictly a requirement here. It is just that there is a facility to identify erroneous data if we need it.

> MeasurementVisitEnum.gen <- setSingleEnum("MeasurementVisit",levels = c('0', '1', '2'))
> par.gen <- setRefClass("Visit",properties(fields = list(visit = "MeasurementVisitSingleEnum"),prototype = list(visit = new("MeasurementVisitSingleEnum",'0'))))
> visits <- par.gen$new()
'MeasurementVisitSingleEnum'
> visits$visit <- as.character(0)
> visits$visit <- as.character(1)
> visits$visit <- as.character(2)
> visits$visit <- as.character(3)
Error in (function (val)  : 
  Attempt to set invalid value on 'visit': value '3' does not belong to level set
( 0, 1, 2 )


TimePoint <- setRefClass("TimePoint",
		fields = list(time = 'numeric'))
Quantity <- setRefClass("Quantity",
		fields = list(amount = "numeric",
				units = "CompoundUnit")) # field classes are given by name, like the other fields

Measurement encapsulates the quantity, the time point and the visit number. So, for example, during visit 0, at this time point the quantity was observed. This type of encapsulation in the true spirit of OO has its
disadvantages as we will see later.

Measurement <- setRefClass("Measurement",
		fields = list(
				quantity = "Quantity",
				timepoint = "TimePoint",
				visit = "Visit"),
		methods=list(getvisit = function(){
					visit$visit
				},getquantity = function(){
					quantity
				})
)
Subject <- setRefClass("Subject",
		fields = list( id = "numeric",
				measurement = "Measurement",
				location = "Location"),
		methods=list(getmeasurement = function()
				{
					measurement
				},
				getid = function()
				{
					id
				},
				getlocation = function()
				{
					location
				},
				summary = function()#Implement other summary methods in appropriate objects as per their responsibilities
				{
					paste("Subject summary ID [",id,"] Location [",location$summary(),"]")
				},show = function(){
					cat("Subject summary ID [",id,"] Location [",location$summary(),"]\n")
				})
)

LongitudinalDatum is the class LongitudinalData inherits from. This inheritance is shown as an example. Not all methods that should belong in the super class are properly added. There are many methods in the sub class that can be moved a level up.

Methods in the super class, such as subject and subsummary, can be called from the sub class. The line if( subject(x) == id){ in the sub class LongitudinalData calls the super class method subject.

LongitudinalDatum <- setRefClass("LongitudinalDatum",
		methods=list(subject = function(sub){
						sub$getid()
					},subsummary = function(sub){
						if(is.character(sub) && sub == 'NA'){
							sub
						}else{
							sub$summary()		
						}
					}
				)
)

setwd("D:/eclipse/workspace/pollutantAnalysis")
library(plyr)
library(dplyr)

LongitudinalData <- setRefClass("LongitudinalData",
    contains = "LongitudinalDatum",
    fields = list(measurements = "list"),
    methods = list(
        getmeasurements = function(){
            measurements
        },
        # Read the CSV file, keep the columns we need and build the objects.
        read = function(){
            data <- read.csv("MIE - Copy.csv", header = TRUE)
            data %>%
                select(visit, room, id, timepoint, value) -> datum
            make_LD(datum)
        },
        # Reset the state and load the data frame passed in by read().
        make_LD = function(datum){
            measurements <<- list()
            load(datum)
        },
        # Create one Subject, with its Measurement and Location, per row.
        load = function(df){
            by(df, 1:nrow(df), function(row){
                visits <- par.gen$new()
                visits$visit <- as.character(row$visit)

                u <- CompoundUnit$new(micrograms = 1,
                                      cubicmeter = 1)

                q <- Quantity$new(amount = row$value,
                                  units = u)

                t <- TimePoint$new(time = row$timepoint)

                m <- Measurement$new(quantity = q,
                                     timepoint = t,
                                     visit = visits)

                l <- Location$new(room = as.character(row$room))

                s <- Subject$new(id = row$id,
                                 measurement = m,
                                 location = l)
                measurements <<- c(measurements, s)
            })
        },
        getmeasurementslength = function(){
            length(measurements)
        },
        # Return a Subject with the given id, or 'NA' if there is none.
        findsubject = function(id){
            result <- 'NA'
            measurements %>% map(., function(x){
                if(subject(x) == id){
                    result <<- x # Warning message is benign for this example. result
                    # cannot be a class state. It is really local.
                }
            })
            result
        },
        # Collect the measurements of this subject for visit v.
        visit = function(sub, v){
            measurementsvisit <- c()
            if(is.character(sub) && sub == 'NA'){
                measurementsvisit
            }else{
                measurements %>% map(., function(x){
                    m <- x$getmeasurement()
                    if(m$getvisit() == v && x$getid() == sub$getid()){
                        measurementsvisit <<- c(measurementsvisit, x)
                    }
                })
                list(visit = measurementsvisit)
            }
        },
        # Narrow the result of visit() down to one room.
        room = function(t, room){
            if(length(t) == 0){
                c('NA')
            }else{
                measurementsvisitroom <- c()
                t$visit %>% map(., function(x){
                    if(x$getlocation()$getlocation() == room)
                        measurementsvisitroom <<- c(measurementsvisitroom, x)
                })
                if(length(measurementsvisitroom) == 0){
                    c('NA')
                }else{
                    measurementsvisitroom
                }
            }
        },
        summaries = function(subjects){
            summaries <- c()
            if(is.character(subjects) && subjects == 'NA'){
                subjects
            }else{
                measurements %>% map(., function(x){
                    subjects %>% map(., function(y){
                        if(x$getid() == y$getid()){
                            m <- x$getmeasurement() # local; <<- would leak m into the global environment
                            summaries <<- c(summaries, m$getquantity()$amount)
                        }
                    })
                })
                summaries %>% summary
            }
        },
        # Flatten the objects of one subject into a data frame and sum by visit and location.
        subjectsummary = function(subject){
            filteredmeasurements <- keep(measurements, function(x){
                x$getid() == subject$getid()
            })
            groupedmeasurements <- filteredmeasurements %>% lapply(function(x){
                m <- x$getmeasurement()
                as.data.frame(list('visit' = m$getvisit(),
                                   'location' = x$getlocation()$getlocation(),
                                   'amount' = m$getquantity()$amount))
            }) %>% bind_rows() # bind_rows() replaces the deprecated rbind_all()
            dataColumns <- c('amount')

            ddply(groupedmeasurements, c('visit', 'location'), function(x)
                colSums(x[dataColumns]))
        }
    )
)

How does this work ?

The data is loaded into an object hierarchy in the load function. I did observe that it was slow most probably because my Eclipse StatET for R setup needs more memory.

Since the methods are all encapsulated by the class I am using the reference to call methods. The result of findsubject is passed to subjectsummary because I am piping the result of one method to the next.

ld <- LongitudinalData$new()
ld$read() # load the CSV data into the object hierarchy before querying it

out <- ld$findsubject(14) %>% ld$subjectsummary()
print(out)

In the next pipeline the result of findsubject(14) is passed as the first parameter when visit(0) is called; 0 becomes the second parameter.

out <- ld$findsubject(14) %>% ld$visit(0) %>% ld$room("bedroom")

The final result from this pipeline is whatever is returned by the last method room("bedroom").
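The way the pipe hands a result to the next method can also be sketched outside R. The Python fragment below is only an illustration of the idea, not how the R pipe is implemented; pipe, add and times are made-up names used for this example.

```python
def pipe(value, *steps):
    """Feed value through the steps in order.

    Each step is a tuple (function, extra arguments...). The running
    value is always passed as the first argument, the way the pipe
    passes the result of findsubject(14) to visit(0).
    """
    for func, *args in steps:
        value = func(value, *args)
    return value


def add(x, y):
    return x + y


def times(x, y):
    return x * y


# 2 is passed to add as the first argument, 3 becomes the second one.
# The result 5 is then passed to times together with 4.
result = pipe(2, (add, 3), (times, 4))
print(result)  # 20
```

The value returned by the last step is the value of the whole pipeline, just as with room("bedroom") above.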

I would like to reassert that this is just one way of combining multiple methods using Reference classes. There are much more powerful functional approaches that don’t require this many lines of code. This example illustrates an Object-oriented approach.

Flattening the Reference classes

The OO hierarchy here does not seem to be malleable when used with some R packages like dplyr. Try as I may, I cannot coerce the Reference classes into an R data frame and pipe it through stages using dplyr. Remember, I want to use functions like map and filter to get the data out of these Reference classes in the shape that I want.

So I abandon my OO approach and flatten the objects and create a data frame. Now I get back the data in the shape I want.

groupedmeasurements <- filteredmeasurements %>%
    lapply(function(x){
        m <- x$getmeasurement() # local; <<- would leak m into the global environment
        as.data.frame(list('visit' = m$getvisit(),
                           'location' = x$getlocation()$getlocation(),
                           'amount' = m$getquantity()$amount))
    }) %>% bind_rows() # bind_rows() replaces the deprecated rbind_all()

This is how one gets the following output.

out <- ld$findsubject(14) %>% ld$subjectsummary()
print(out)
visit  location     amount
0      bedroom      12.00
0      dining room   2.75
0      living room   2.75
0      room          5.50
0      tv room       2.75
1      room          2.75
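The same flatten-then-group idea can be sketched in a few lines of Python. This is a simplified illustration, not a translation of the R classes: Subject and Measurement here are cut-down stand-ins I made up, and only the three sample rows from the top of the post are used.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Measurement:
    visit: str
    amount: float


@dataclass
class Subject:
    id: int
    room: str
    measurement: Measurement


def flatten(subjects):
    """Turn each nested Subject into one flat (visit, location, amount) row."""
    return [(s.measurement.visit, s.room, s.measurement.amount) for s in subjects]


def sum_by_visit_and_location(rows):
    """Add up the amounts for every (visit, location) pair."""
    totals = defaultdict(float)
    for visit, location, amount in rows:
        totals[(visit, location)] += amount
    return dict(totals)


# The three sample rows shown at the top of the post.
subjects = [
    Subject(14, "bedroom", Measurement("0", 6.0)),
    Subject(14, "bedroom", Measurement("0", 6.0)),
    Subject(15, "bedroom", Measurement("0", 2.75)),
]

# Keep subject 14 only, flatten the objects, then group and sum.
rows = flatten([s for s in subjects if s.id == 14])
totals = sum_by_visit_and_location(rows)
print(totals)  # {('0', 'bedroom'): 12.0}
```

Once the objects are flat rows, the grouping is a plain loop over tuples, and the nested objects no longer take part in it.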

Conclusion

This exercise has not helped me determine in which context R’s Reference classes are specifically used. The other OO systems like S3 and S4 may be more useful but this article is about RCs. Why should I flatten my object hierarchy to reshape my data in a convenient way? There may be specialized R packages that use the OO approach and expose APIs but I am not aware of them. So at this time I understand that there is a dichotomy between RCs and the powerful functional approach. I personally like to use the functional programming paradigm when dealing with data.

Joy of OCaml


I have spent most of last week with my Emacs editor and the OCaml development environment. Since I have some OCaml code to complete I will add more details soon.

Suffice it to say that this setup taxed me a great deal. OPAM does not seem to install easily on Windows. As is my wont in such cases I started with Cygwin and after two days switched to an Ubuntu VM. I didn’t think I was gaining much by reporting Cygwin permission issues to owners of OPAM Windows installers.

Emacs company mode for autocompletion

The toolchain includes company as well as Merlin
and Tuareg.


Utop is a toplevel for OCaml


Emacs elisp

It looks like this at this time and I use Gist because WordPress does not support Lisp or OCaml or Haskell yet. Filed a support ticket.

More about OCaml code later. This creates an associative list of tuples containing characters and the number of times they occur in a String. MultiSet is a module that is not shown either but as I mentioned I have more to write about this wonderful programming language.
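Since the OCaml code is not shown here, the idea of a list of (character, count) pairs can at least be sketched in Python. This is my own minimal stand-in, not the OCaml MultiSet module; char_counts is a made-up name.

```python
from collections import Counter


def char_counts(text):
    """Return (character, count) pairs in order of first appearance."""
    # Counter keeps keys in insertion order, so the pairs follow the text.
    return list(Counter(text).items())


pairs = char_counts("hello")
print(pairs)  # [('h', 1), ('e', 1), ('l', 2), ('o', 1)]
```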

Polyglot programming using Jenkins

Facility for languages develops when one does not squander existing opportunities to code. That is what I think.

Jenkins, the CI enabler, supports a few languages like Python and Groovy. The Python package I used to make the REST API calls is ‘Python Jenkins’. It is interesting to note that run_script executes Groovy code.

I didn’t test it when the Unix server actually runs out of disk space but assumed the text from the console output will match. Moreover, the encryption routine works as expected but the decryption function doesn’t. Since I call the REST API, there could be an encryption/decryption key mismatch.


'''
Created on Oct 12, 2016

@author: Mohan Radhakrishnan

This python module gets the console output of the latest
build and if the text 'No space left on device' is found in
the output it sends a mail.
I've taken liberties with the 'functional paradigm'

'''
import os
import smtplib

import jenkins

# Text that appears in the console output when the disk is full.
NO_SPACE_MESSAGE = "Caused by: java.io.IOException: No space left on device"


def main():
    overrideenvironmentvariables()
    server = jenkins.Jenkins('http://localhost:8080', username='Mohan', password='Welcome1')
    notifydisaster(server)


def notifydisaster(server):
    '''
    Notify
    '''
    result = getconsoleoutput(server)
    print(result)
    if result is None:
        # The job we filter for was not found, so there is nothing to check.
        return
    name, buildnumber, consoleoutput = result
    if NO_SPACE_MESSAGE in consoleoutput:
        print(NO_SPACE_MESSAGE)
        sendmail(name, buildnumber)


def sendmail(name, buildnumber):
    '''
    Notify
    Password Encryption/decryption code has to be tested and used
    '''
    smtp = smtplib.SMTP('smtp.gmail.com', 587)
    smtp.ehlo()
    smtp.starttls()
    smtp.login("x.y@z.com", "Password")
    # A blank line separates the Subject header from the body.
    message = ('Subject: No space left on device\n\n'
               'Job ' + name + ' Build ' + str(buildnumber) +
               ' fails due to lack of disk space')
    smtp.sendmail('x.y@z.com', 'x.y@z.com', message)
    smtp.quit()


def getconsoleoutput(server):
    '''
    Get the console output of the particular
    Job's build
    '''
    information = getJobName(server)
    if information:
        name = information[0]['name']
        buildnumber = getlastjobDetails(server)
        return name, buildnumber, server.get_build_console_output(name, buildnumber)
    return None


def getJobName(server):
    '''
    Get Job and other details
    and filter the Job we are interested in
    '''
    jobs = server.get_all_jobs(0)
    filtercriterion = ['CITestPipeline']
    return list(filter(lambda d: d['fullname'] in filtercriterion, jobs))


def getlastjobDetails(server):
    '''
    Get Job and other details
    Return '0' as the build number assuming
    it signifies that there is no such build number
    '''
    information = getJobName(server)
    if information:
        return server.get_job_info(information[0]['name'])['lastCompletedBuild']['number']
    else:
        return 0


def encrypt(server):
    '''
    Attempt here to encrypt Passwords using Jenkins' key
    Not tested properly
    '''
    value = server.run_script("""
secret = hudson.util.Secret.fromString("Password")
println secret.getEncryptedValue()
println secret.getPlainText()
""")
    print(value)


def decrypt(server):
    decryptedvalue = server.run_script("""
secret = hudson.util.Secret.fromString("aiJREkuBjWHX9UWIyhEzwnnAJReuZnQVEtUr0KgvXKg")
println hudson.util.Secret.toString(secret)
""")
    print(decryptedvalue)
    return decryptedvalue


def overrideenvironmentvariables():
    '''
    Override this proxy setting as we don't
    need it and it causes an error.
    '''
    os.environ["HTTP_PROXY"] = ''


if __name__ == "__main__":
    main()

Spacemacs

I will update this post soon as my day job leaves little time for fun aspects like this.

Spacemacs’ new Haskell layer is what I like now, even though the Haskell editor setup is not easy for the novice.

After installing Spacemacs these are the basic steps I followed.

Added the Haskell layer

;; ----------------------------------------------------------------
;; Example of useful layers you may want to use right away.
;; Uncomment some layer names and press <SPC f e R> (Vim style) or
;; <M-m f e R> (Emacs style) to install them.
;; ----------------------------------------------------------------
;; auto-completion
;; better-defaults
emacs-lisp
(haskell :variables haskell-enable-shm-support t)
;; git
;; markdown
;; org
;; (shell :variables
;;        shell-default-height 30
;;        shell-default-position 'bottom)
;; spell-checking
;; syntax-checking
;; version-control

Added this to .spacemacs

(defun dotspacemacs/user-init ()
  "Initialization function for user code.
It is called immediately after `dotspacemacs/init', before layer configuration
executes.
This function is mostly useful for variables that need to be set
before packages are loaded. If you are unsure, you should try in setting them in
`dotspacemacs/user-config' first."
  (add-hook 'haskell-mode-hook 'turn-on-haskell-indentation)
  (add-to-list 'exec-path "C:/Users/476458/AppData/Roaming/local/bin/")
  )

C:/Users/476458/AppData/Roaming/local/bin/ contains other tools installed by Stack.

Stack is a cross-platform program for developing Haskell projects. It is aimed at Haskellers both new and experienced.


Suggested by spacemacs Reddit group

(defun dotspacemacs/user-config ()
  "Configuration function for user code.
This function is called at the very end of Spacemacs initialization after
layers configuration.
This is the place where most of your configurations should be done. Unless it is
explicitly specified that a variable should be set before a package is loaded,
you should place your code here."
  (spacemacs/set-leader-keys-for-major-mode 'haskell-mode
    "cx" 'inferior-haskell-load-and-run
    )
  )

This helps me compile and execute the Haskell program by using the keystrokes

SPC m c x


Getting introduced to Matlab

There was a time when I thought Matlab was a tool used only by engineers. Is it even possible for a student of the humanities to get access to such tools? It is, and there are academic licenses.

Matrix

matrix = zeros(10,10);

matrix(1,2) = 3
matrix(2,2) = 30
matrix(1,3) = 2
matrix(4,3) = 14
matrix(5,2) = 199
matrix(6,2) = 733

\begin{bmatrix} 0 & 3 & 2 & 0 & 0\\ 0 & 30 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 14 & 0 & 0\\ 0 & 199 & 0 & 0 & 0\\ 0 & 733 & 0 & 0 & 0\\ \end{bmatrix}

Maximum value from a matrix
[max_value, node] = max(matrix(:));

fprintf ('Maximum value is %d and node is %d\n', max_value, node);

node is the linear index of the maximum element: Matlab numbers the elements of a matrix column by column, starting at 1. It is used below to get the row and column using ind2sub.


[i, j] = ind2sub(size(matrix), node);

fprintf ('Row is %d and column is %d\n', i, j);

sub2ind gives the linear index of the element when we have the row and column of the element.


linearindice = sub2ind(size(matrix), 1, 2);

fprintf ('Linear Indice is %d \n', linearindice);
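The arithmetic behind the two conversions is short enough to write out. The Python sketch below mirrors the column-by-column, 1-based numbering that Matlab uses. The function names are borrowed from Matlab for readability, but these are my own simplified versions that take only the number of rows.

```python
def sub2ind(n_rows, row, col):
    """1-based (row, col) to 1-based column-major linear index."""
    return (col - 1) * n_rows + row


def ind2sub(n_rows, index):
    """1-based column-major linear index back to 1-based (row, col)."""
    col, row = divmod(index - 1, n_rows)
    return row + 1, col + 1


n_rows = 10  # the matrix in the post is created with zeros(10,10)

print(sub2ind(n_rows, 1, 2))  # 11
print(sub2ind(n_rows, 6, 2))  # 16, the position of the maximum value 733
print(ind2sub(n_rows, 16))    # (6, 2)
```

Only the number of rows matters for the conversion, because a full column is counted before moving on to the next one.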

Sort

sort works on each column separately. I get the sorted matrix and also the matrix of indices of the sorted elements, that is, the original row of each sorted value. Very useful.


[values, indices] = sort( matrix );
Sorted values

\begin{bmatrix} 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 3 & 0 & 0 & 0\\ 0 & 30 & 0 & 0 & 0\\ 0 & 199 & 2 & 0 & 0\\ 0 & 733 & 14 & 0 & 0\\ \end{bmatrix}

Indices of sorted values

\begin{bmatrix} 1 & 3 & 2 & 1 & 1\\ 2 & 4 & 3 & 2 & 2\\ 3 & 1 & 5 & 3 & 3\\ 4 & 2 & 6 & 4 & 4\\ 5 & 5 & 1 & 5 & 5\\ 6 & 6 & 4 & 6 & 6\\ \end{bmatrix}
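What the two outputs contain can be reproduced with a small Python sketch. It uses the 6 by 5 matrix displayed above and sorts each column on its own, keeping tied values in their original order. sort_columns is a made-up helper for this illustration and not a Matlab function.

```python
def sort_columns(matrix):
    """Sort every column in ascending order.

    Returns the sorted values and, for each sorted value, the 1-based
    row it came from in the original matrix.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    values = [[0] * n_cols for _ in range(n_rows)]
    indices = [[0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        column = [matrix[r][c] for r in range(n_rows)]
        # sorted() is stable, so equal values keep their original row order.
        order = sorted(range(n_rows), key=lambda r: column[r])
        for new_r, old_r in enumerate(order):
            values[new_r][c] = column[old_r]
            indices[new_r][c] = old_r + 1  # 1-based, as in Matlab
    return values, indices


matrix = [
    [0, 3, 2, 0, 0],
    [0, 30, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 14, 0, 0],
    [0, 199, 0, 0, 0],
    [0, 733, 0, 0, 0],
]

values, indices = sort_columns(matrix)
print([row[1] for row in values])   # [0, 0, 3, 30, 199, 733]
print([row[1] for row in indices])  # [3, 4, 1, 2, 5, 6]
```

The second column of indices says that the two zeros came from rows 3 and 4, followed by the values from rows 1, 2, 5 and 6.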

Java 8 Optional

I think there are more elegant ways to check if Optional is empty or not but here I have to collect everything in an ArrayList. So I wasn’t able to include isPresent() in the lambda pipeline.

package com.test;

import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public class OptionalTest {

    private static List<String> newImports = new ArrayList<>();

    public static Optional<List<String>> getOptionalNewImports() {

        return Optional.ofNullable(newImports);
    }

    public static void main( String... argv ){

        newImports.add("org.apache.struts");

        if( getOptionalNewImports().isPresent() ) {
            List<String> imports = new ArrayList<>();
                    getOptionalNewImports().get().stream()
                    .map(p -> "import " + p + ";")
                    .collect(Collectors.toCollection(() -> imports));
            imports.forEach( System.out::println);
        }

    }
}

This is the relevant method.

        public Optional<List<String>> getOptionalNewImports() {

            return Optional.ofNullable(newImports);
        }

This is a proper usage of ifPresent. I assign a value to a variable if the value is present.

                        rules.getOptionalClassIdentifier().ifPresent( a -> {this.classIdentifier = a;});


Notes about Machine Learning fundamentals

I have decided to try a different tack in this post. Gradually as I learn some basic ideas about statistics and Machine Learning I will update this post with code, graphs or procedures used to configure tools. So in a few weeks I will have charted a simple course through the basic Machine Learning terrain. I hope. But these are just basic ideas to prepare oneself to read a more advanced math text.

To be updated …   I will add more details in subsequent posts.

Tools

Anaconda

GraphLab

ipython

The installation process was tortuous because I work in a corporate environment.

Install GraphLab Create with Command Line

The installation is based on dato’s instructions.

Step 1: Ensure Python 2.7.x

The Anaconda installation with Python 2.x didn’t complete on my Windows 7 machine due to some access restriction. It couldn’t set this version of Python as the default.
So I installed Anaconda with Python 3.x. GraphLab works only with Python 2.x.

In order to create a Python 2.7 environment the command used is

conda create -n dato-env python=2.7 anaconda

This was blocked by my Virus Scanner and I had to coax our security team to update my policy settings to allow this.

Traceback (most recent call last):
  File "D:\Continuum\Anaconda3.4\Scripts\conda-script.py", line 4, in <module>
    sys.exit(main())
  File "D:\Continuum\Anaconda3.4\lib\site-packages\conda\cli\main.py", line 202, in main
    args_func(args, p)
  File "D:\Continuum\Anaconda3.4\lib\site-packages\conda\cli\main.py", line 207, in args_func
    args.func(args, p)
  File "D:\Continuum\Anaconda3.4\lib\site-packages\conda\cli\main_create.py", line 50, in execute
    install.install(args, parser, 'create')
  File "D:\Continuum\Anaconda3.4\lib\site-packages\conda\cli\install.py", line 420, in install
    plan.execute_actions(actions, index, verbose=not args.quiet)
  File "D:\Continuum\Anaconda3.4\lib\site-packages\conda\plan.py", line 502, in execute_actions
    inst.execute_instructions(plan, index, verbose)
  File "D:\Continuum\Anaconda3.4\lib\site-packages\conda\instructions.py", line 140, in execute_instructions
    cmd(state, arg)
  File "D:\Continuum\Anaconda3.4\lib\site-packages\conda\instructions.py", line 55, in EXTRACT_CMD
    install.extract(config.pkgs_dirs[0], arg)
  File "D:\Continuum\Anaconda3.4\lib\site-packages\conda\install.py", line 448, in extract
    t.extractall(path=path)
  File "D:\Continuum\Anaconda3.4\lib\tarfile.py", line 1980, in extractall
    self.extract(tarinfo, path, set_attrs=not tarinfo.isdir())
  File "D:\Continuum\Anaconda3.4\lib\tarfile.py", line 2019, in extract
    set_attrs=set_attrs)
  File "D:\Continuum\Anaconda3.4\lib\tarfile.py", line 2088, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "D:\Continuum\Anaconda3.4\lib\tarfile.py", line 2128, in makefile
    with bltn_open(targetpath, "wb") as target:
PermissionError: [Errno 13] Permission denied: 'D:\\Continuum\\Anaconda3.4\\pkgs\\python-2.7.11-4\\Lib\\pdb.doc'

The last line shown above is what I presume was blocked by the virus scanner. The logs were shown to the security team, who updated the scanner rules.

Learning some of these topics may be difficult if we don’t read a more advanced book. So I am constrained by the lack of deep knowledge of a related math subject.

But a question like this one must be simple. Right ?

Identify which model performs better when you have the intercept, slope and Residual Sum of squares.

No data point is given. One can plot the lines when their intercepts and slopes are known but I don’t know how that helps.

Plot some lines when we know their intercepts and slopes. The data points are random, though, and are irrelevant at this time.
from ggplot import *
import pandas as pd

data = {'x': [0, 2, 3, 4, 5, 4, 3.2, 3.3, 2.6, 8.4],
        'y': [4.2, 2.6, 1.2, 23, 23, 42, 1.2, 63, 2.3, 2.1],
        }
df = pd.DataFrame(data)
g = ggplot(df, aes(x='x', y='y')) + \
    geom_point() + \
    geom_abline(intercept=0, slope=1.4, colour="red") \
    + geom_abline(intercept=3.1, slope=1.4, colour="blue") \
    + geom_abline(intercept=2.7, slope=1.9, colour="green") \
    + geom_abline(intercept=0, slope=2.3, colour="black")
print(g)
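As for the question quoted earlier: when two lines are fitted to the same data, the one with the smaller Residual Sum of Squares (RSS) follows the points more closely, so the RSS alone is enough to compare them. The sketch below computes the RSS of the four lines plotted above against the same random points. It is only meant to show the calculation; rss is a helper I wrote for this example, and the numbers mean nothing because the points are random.

```python
def rss(intercept, slope, xs, ys):
    """Residual sum of squares of the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# The same random points used in the plot above.
xs = [0, 2, 3, 4, 5, 4, 3.2, 3.3, 2.6, 8.4]
ys = [4.2, 2.6, 1.2, 23, 23, 42, 1.2, 63, 2.3, 2.1]

# (intercept, slope) of each plotted line, keyed by its colour.
lines = {
    "red": (0, 1.4),
    "blue": (3.1, 1.4),
    "green": (2.7, 1.9),
    "black": (0, 2.3),
}

scores = {name: rss(b, m, xs, ys) for name, (b, m) in lines.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(name, round(score, 2))
print("lowest RSS:", best)
```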

 

Q2