Dune is a Ocaml build system

Here is my attempt to properly build a toy Ocaml project using Dune.

Since this is the learning phase the Ocaml code may not be idiomatic.

My unit test framework is Alcotest

As is the case with other Ocaml tools and techniques information about this is sketchy. I wish there were more articles and examples as it is fun to work with this language..

I will add more details as I research this further. But for now here is a brief description of the dune build file.

• dune runtest executes  all the tests
• Dune does not install dependencies automatically . So, for example, I have to execute ‘opam install alcotest’. That is how one installs any opam package we need generally.

graph.opam

This looks like the file in which one specifies the framework versions and build instructions.

opam-version: "2.0"
authors: [ "Mohan" ]
synopsis: "Learning Dune"
description: """
Learning Dune
"""
tags: []
depends: [
"ocaml" {
>= "4.02.3"}
"dune"
"alcotest" {with-test}
]
build: [
["dune" "subst"] {pinned}
["dune" "build" "-p" name "-j" jobs]
["dune" "runtest" "-p" name "-j" jobs] {with-test}
]


Dune dependencies

This specifies the dependencies and the modules. My code module is ‘graph‘ and my test module is ‘kruskaltest‘.

•

Test

mirage@mirage:~/theorem$dune runtest kruskaltest alias runtest Testing Weights. This run has ID D7DAB7A8-A60A-4522-9732-54FAE2331A72. [OK] test compare weights of edges 0 Compare weights. The full test results are available in /home/mirage/theorem/_build/default/_build/_tests/D7DAB7A8-A60A-4522-9732-54FAE2331A72. Test Successful in 0.000s. 1 test run. R Reference classes A pure OO approach and a functional representation of it are at loggerheads. That is evident when one tries to adopt an OO approach using a powerful functional language. That is my personal opinion. R has many Object-oriented features built into it. R has three object oriented (OO) systems: [[S3]], [[S4]] and [[R5]]. Reference classes are one such feature. Let us consider this data. The id is that of a Subject who is in a room where monitoring equipment gathers some data. There are several visits to gather this data. id visit room value timepoint 14 0 bedroom 6 53 14 0 bedroom 6 54 15 0 bedroom 2.75 56 The idea that this code is based on is from Martin Fowler’s book Analysis Patterns Reusable Object Models. The chapter on Observations and Measurements has a diagram roughly equivalent to the one shown at the top. The code is lightly tested several times but without unit tests. library(plyr) library(dplyr) library(purrr)  CompoundUnit <- setRefClass("CompoundUnit", fields = list(micrograms = 'numeric', cubicmeter = 'numeric')) Location <- setRefClass("Location", fields = list( room = 'character'), methods=list(getlocation = function(){ room }, summary = function(){ paste('Room [' , room , ']') })) library(objectProperties) # An Enum which could have behaviour associated with it. # This is convoluted but the only way I know to represent constants and validate them. # ############################################################################### MeasurementVisitEnum.gen <- setSingleEnum("MeasurementVisit",levels = c('0', '1', '2')) par.gen <- setRefClass("Visit", properties(fields = list(visit = "MeasurementVisitSingleEnum"), prototype = list(visit = new("MeasurementVisitSingleEnum", '0'))))  What is the significance of this convoluted code ? It restricts the values that are set to 0.1 and 2. It is like the Java enum But this is not strictly a requirement here. It is just that there is a facility to identify erroneous data if we need it. > MeasurementVisitEnum.gen par.gen visits visits$visit visits$visit visits$visit visits$visit <- as.character(3) Error in (function (val) : Attempt to set invalid value on 'visit': value '3' does not belong to level set ( 0, 1, 2 ) TimePoint <- setRefClass("TimePoint", fields = list(time = 'numeric')) Quantity <- setRefClass("Quantity", fields = list(amount = "numeric", units = CompoundUnit))  Measurement encapsulates the quantity, the time point and the visit number. So, for example, during visit 0, at this time point the quantity was observed. This type of encapsulation in the true spirit of OO has its disadvantages as we will see later. Measurement <- setRefClass("Measurement", fields = list( quantity = "Quantity", timepoint = "TimePoint", visit = "Visit"), methods=list(getvisit = function(){ visit$visit
},getquantity = function(){
quantity
})
)

Subject <- setRefClass("Subject",
fields = list( id = "numeric",
measurement = "Measurement",
location = "Location"),
methods=list(getmeasurement = function()
{
measurement
},
getid = function()
{
id
},
getlocation = function()
{
location
},
summary = function()#Implement other summary methods in appropriate objects as per their responsibilities
{
paste("Subject summary ID [",id,"] Location [",location$summary(),"]") },show = function(){ cat("Subject summary ID [",id,"] Location [",location$summary(),"]\n")
})
)


LongitudinalDatum is the class LongitudinalData inherits from. This inheritance is shown as an example. Not all methods that should belong in the super class are properly added. There are many methods in the sub class that can be moved a level up.

subsummary  in the super class can be called from the sub class. The line if( subject(x) == id){ in the sub class LongitudinalData calls this super class method.

LongitudinalDatum  datum

measurements <<- list()

by(df, 1:nrow(df), function(row) {
visits <- par.gen$new() visits$visit <- as.character(row$visit) u <- CompoundUnit$new( micrograms = 1,
cubicmeter = 1 )

q <- Quantity$new(amount = row$value,
units = u )

t <- TimePoint$new(time = row$timepoint)

m <- Measurement$new( quantity = q, timepoint = t, visit = visits) l <- Location$new( room = as.character(row$room)) s <- Subject$new( id = row$id, measurement = m, location = l) measurements <<- c( measurements, s ) }) }, getmeasurementslength = function(){ length(measurements) }, findsubject = function( id ){ result % map(., function(x) { if( subject(x) == id){ result <<- x # Warning message is benign for this example. result #cannot be a class state. It is really local. } } ) result }, visit = function( sub,v ){ measurementsvisit % map(., function(x) { m <- x$getmeasurement()
if (m$getvisit() == v && x$getid() == sub$getid() ){ measurementsvisit <<- c(measurementsvisit,x) } } ) list(visit = measurementsvisit ) } }, room = function( t, room ){ if( length( t) == 0 ){ c('NA') }else{ measurementsvisitroom % map(., function(x) { if( x$getlocation()$getlocation() == room ) measurementsvisitroom <% map(., function(y) { if (x$getid() == y$getid() ){ m <<- x$getmeasurement()
summaries <% summary
}
},subjectsummary = function( subject ){
filteredmeasurements <-
keep(measurements, function(x){
x$getid() == subject$getid()
})
groupedmeasurements % lapply(function(x){
m <% rbind_all()
dataColumns <- c('amount')

ddply(groupedmeasurements,c('visit','location'),function(x)
colSums(x[dataColumns]))
}

)
)


How does this work ?

The data is loaded into an object hierarchy in the load function. I did observe that it was slow most probably because my Eclipse StatET for R setup needs more memory.

Since the methods are all encapsulated by the class I am using the reference to call methods. The result of findsubject is passed to subjectsummary because I am piping the result of one method to the next.

ld <- LongitudinalData$new() out % ld$subjectsummary()
print(out)


So here the result of findsubject(14) is passed as the first parameter when visit(0) is called. 0 becomes the second parameter.

out % ld$visit(0) %>% ld$room("bedroom")


The final result from this pipeline is whatever is returned by the last method room("bedroom").

I would like to reassert that this is just one way of combining multiple methods using Reference classes. There are much more powerful functional approaches that don’t require this many lines of code. This example illustrates a particular Object-oriented approach.

Flattening the Reference classes

The OO hierarchy here does not seem to be malleable when used with some R packagea like dplyr. Try as I may, I cannot coerce the Reference classes into a R data frame and pipe it through stages using dplyr. Remember I want to use functions like map and filter to get the data out of these reference clasees in a shape that I want.

So I abandon my OO approach and flatten the objects and create a data frame. Now I get back the data in the shape I want.

groupedmeasurements %
lapply(
function(x){
m <% rbind_all()


This is how one gets the following output.

out % ld\$subjectsummary()
print(out)

visit location amount
0 bedroom 12.00
0 dining room 2.75
0 living room 2.75
0 room 5.50
0 tv room 2.75
1 room 2.75

Conclusion

This exercise has not helped me determine in which context R’s Reference classes are specifically used. The other OO systems like S3 and S4 may be more useful but this article is about RC’s. Why should I flatten my object hierarchy to reshape my data in a convenient way ? There may be specialized R packages that use the OO approach and expose API’s but I am not aware of them. So at this time I understand that there is a dichotomy between RC’s and the powerful functional approach. I personally like to use the functional programming paradigm when dealing with data.

Joy of OCaml

I have spent most of last week with my Emacs editor and the OCaml development environment. Since I have some OCaml code to complete I will add more details soon.

Suffice it to say that this setup taxed me so much. OPAM does not seem to install easily in Windows. As is my wont in such cases I started with Cygwin and after two days switched to a Ubuntu VM. I didn’t think I was gaining much by reporting Cygwin permission issues to owners of OPAM Windows installers.

Emacs company mode for autocompletion

The toolchain includes company as well as Merlin
and Tuareg.

Emacs elisp

It looks like this at this time and I use Gist because WordPress does not support Lisp or OCaml or Haskell yet. Filed a support ticket.

More about OCaml code later. This creates an associative list of tuples containing characters and the number of times they occur in a String. MultiSet is a module that is not shown either but as I mentioned I have more to write about this wonderful programming language.

Polyglot programming using Jenkins

Facility for languages develops when one does not squander existing opportunities to code. That is what I think.

Jenkins, the CI enabler supports a few languages like Python and Groovy. The Python package I used to make the Rest API calls is ‘Python Jenkins’.It is interesting to note that run_script executes Groovy code.

I didn’t test it exactly when the Unix server runs out of disk space but assumed the text from the console output will match.Moreover the encryption routine works as expected but the decryption function doesn’t work. It seems that since I call the Rest API there could be a encryption/decryption key mismatch.


'''
Created on Oct 12, 2016

This python module gets the console output of the latest
build and if the text 'No space left on device' is found in
the output it sends a mail.
I've taken liberties with the 'functional paradigm'

'''
import smtplib
import jenkins
import os
def main():

overrideenvironmentvariables()

notifydisaster(server)

'''
Notify
'''
def notifydisaster( server ):
print( getconsoleoutput(server) )
name,buildnumber,consoleoutput = getconsoleoutput(server)
if (consoleoutput.find("Caused by: java.io.IOException: No space left on device") != -1):
print("Caused by: java.io.IOException: No space left on device")
sendmail( name,buildnumber )

'''
Notify
Password Encryption/decryption code has to be tested and used
'''
def sendmail(name,buildnumber):
smtp = smtplib.SMTP('smtp.gmail.com', 587)
smtp.ehlo()
smtp.starttls()
smtp.sendmail('x.y@z.com', 'x.y@z.com', 'Subject: No space left on device\n \
Job ' + name + ' Build ' + str(buildnumber) + ' fails due to lack of disk space')

'''
Get the console output of the particular
Job's build
'''
def getconsoleoutput(server):
information = getJobName(server)
if information:
return information[ 0 ]['name'] ,getlastjobDetails(server),server.get_build_console_output(information[ 0 ]['name'], getlastjobDetails(server))

'''
Get Job and other details
and filter the Job we are interested in
'''
def getJobName(server):
jobs = server.get_all_jobs(0)
filtercriterion = ['CITestPipeline']

return list(filter( lambda d: d['fullname'] in filtercriterion, jobs))

'''
Get Job and other details
Return '0' as the build number assuming
it signifies that there is no such build number
'''

def getlastjobDetails(server):
information = getJobName(server)
if information:
last_build_number = server.get_job_info(information[ 0 ]['name'])['lastCompletedBuild']['number']
return last_build_number
else:
return 0

'''
Attempt here to encrypt Passwords using Jenkins' key
Not tested properly
'''
def encrypt(server ):
value = server.run_script("""
println secret.getEncryptedValue()
println secret.getPlainText()
""")
print (value)

def decrypt(server ):
decryptedvalue = server.run_script("""
secret = hudson.util.Secret.fromString("aiJREkuBjWHX9UWIyhEzwnnAJReuZnQVEtUr0KgvXKg")
println hudson.util.Secret.toString(secret)
""")
print (decryptedvalue)
return decryptedvalue
'''
Override this proxy setting as we don't
need it and it causes an error.
'''
def overrideenvironmentvariables():
os.environ["HTTP_PROXY"] = ''

if __name__=="__main__":
main()



Spacemacs

I will update this post soon as my day job leaves little time for fun aspects like this.

Spacemacs’ new Haskell layer is what I like now eventhough the Haskell editor setup is not easy for the novice.

After installing Spacemacs these are the basic steps I followed.

;; —————————————————————-
;; Example of useful layers you may want to use right away.
;; Uncomment some layer names and press (Vim style) or
;; (Emacs style) to install them.
;; —————————————————————-
;; auto-completion
;; better-defaults
emacs-lisp
;; git
;; markdown
;; org
;; (shell :variables
;; shell-default-height 30
;; shell-default-position ‘bottom)
;; spell-checking
;; syntax-checking
;; version-control

(defun dotspacemacs/user-init ()
“Initialization function for user code.
It is called immediately after dotspacemacs/init’, before layer configuration
executes.
This function is mostly useful for variables that need to be set
before packages are loaded. If you are unsure, you should try in setting them in
dotspacemacs/user-config’ first.”
)

C:/Users/476458/AppData/Roaming/local/bin/ contains other tools installed by Stack.

Stack is a cross-platform program for developing Haskell projects. It is aimed at Haskellers both new and experienced.

Suggested by spacemacs Reddit group

(defun dotspacemacs/user-config ()
“Configuration function for user code.
This function is called at the very end of Spacemacs initialization after
layers configuration.
This is the place where most of your configurations should be done. Unless it is
explicitly specified that a variable should be set before a package is loaded,
you should place your code here.”
)
)

This helps me compile and execute the Haskell program by using the keystrokes

SPC m c x

Getting introduced to Matlab

There was a time when I thought Matlab is a tool used by engineers. Is it even possible for a student of humanities to get access to such tools ? It is and there are academic licenses.

Matrix

matrix = zeros(10,10);

matrix(1,2) = 3
matrix(2,2) = 30
matrix(1,3) = 2
matrix(4,3) = 14
matrix(5,2) = 199
matrix(6,2) = 733


$\begin{bmatrix} 0 & 3 & 2 & 0 & 0\\ 0 & 30 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 14 & 0 & 0\\ 0 & 199 & 0 & 0 & 0\\ 0 & 733 & 0 & 0 & 0\\ \end{bmatrix}$

Maximum value from a matrix
[max_value, node] = max(matrix(:));

fprintf ('Maximum value is %d and node is %d\n', max_value, node);



node seems to be a reference to the element which is used below to get the row and column using ind2sub.


[i, j] = ind2sub(size(matrix), node);

fprintf ('Row is %d and column is %d\n', i, j);



sub2ind gives the linear indice of the element when we have the row and column of the element.


linearindice = sub2ind(size(matrix), 1, 2);

fprintf ('Linear Indice is %d \n', linearindice);


Sort

I get the sorted matrix and also the matrix of indices of the sorted elements. Very useful.


[values, indices] = sort( matrix );

Sorted values

$\begin{bmatrix} 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 3 & 0 & 0 & 0\\ 0 & 30 & 0 & 0 & 0\\ 0 & 199 & 2 & 0 & 0\\ 0 & 733 & 14 & 0 & 0\\ \end{bmatrix}$

Indices of sorted values

$\begin{bmatrix} 1 & 3 & 2 & 1 & 1\\ 2 & 4 & 3 & 2 & 2\\ 3 & 1 & 5 & 3 & 3\\ 4 & 2 & 6 & 4 & 4\\ 5 & 5 & 1 & 5 & 5\\ 6 & 6 & 4 & 6 & 6\\ \end{bmatrix}$

Java 8 Optional

I think there are more elegant ways to check if Optional is empty or not but here I have to collect everything in a ArrayList. So I wasn’t able to include isPresent() in the lambda pipeline.

package com.test;

import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public class OptionalTest {

private static List<String> newImports = new ArrayList<>();

public static Optional<List<String>> getOptionalNewImports() {

return Optional.ofNullable(newImports);
}

public static void main( String... argv ){

if( getOptionalNewImports().isPresent() ) {
List<String> imports = new ArrayList<>();
getOptionalNewImports().get().stream()
.map(p -> "import " + p + ";")
.collect(Collectors.toCollection(() -> imports));
imports.forEach( System.out::println);
}

}
}

This is the relevant method.

        public Optional<List<String>> getOptionalNewImports() {

return Optional.ofNullable(newImports);
}



This is a proper usage of ifPresent. I assign a value to a variable if the value is present.

                        rules.getOptionalClassIdentifier().ifPresent( a -> {this.classIdentifier = a;});