## sigmoid function

### What is a sigmoid function?

The sigmoid function maps any real number x to the interval (0, 1) via a = 1 / (1 + exp(-x)). This simple code is the standard way to plot it. I am using Octave.

```
x = -10:0.1:10;
a = 1.0 ./ (1.0 + exp(-x));
figure;
plot(x,a,'-','linewidth',3)
```

## The Caltech-JPL Summer School on Big Data Analytics

This treasure trove of videos teaches many Machine Learning subjects. It is not intended to be a typical Coursera course: there are no deadlines or tests.

There is so much to write about what I learn from these videos, but for now the measures used to assess the costs and benefits of a classification model are what I intend to keep as a reference.

## Frequent Itemsets

I am reading Chapter 6 on Frequent Itemsets. I hope to understand the A-priori algorithm.
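To check my understanding of the pruning idea, here is a toy sketch of the first two A-priori passes in R. The baskets and the support threshold are my own made-up example, not code from the book:

```r
# Toy A-priori: pass 1 counts single items, pass 2 counts only those pairs
# whose members are both frequent (this pruning is the A-priori trick).
baskets <- list(c("a","b","c"), c("a","c"), c("a","d"), c("b","c"))
support <- 2  # minimum number of baskets an itemset must appear in

# Pass 1: frequent single items
counts1   <- table(unlist(baskets))
frequent1 <- names(counts1[counts1 >= support])

# Pass 2: candidate pairs built only from frequent items
pairs <- list()
for (b in baskets) {
  items <- sort(intersect(b, frequent1))
  if (length(items) >= 2) {
    for (p in combn(items, 2, simplify = FALSE)) {
      key <- paste(p, collapse = ",")
      pairs[[key]] <- if (is.null(pairs[[key]])) 1 else pairs[[key]] + 1
    }
  }
}
frequent2 <- names(pairs[unlist(pairs) >= support])
print(frequent1)  # "a" "b" "c"
print(frequent2)  # "a,c" "b,c"
```

Item "d" never makes it into a candidate pair because it is infrequent on its own, which is exactly the monotonicity property the algorithm exploits.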

## Apache Mahout

I followed this tutorial. Mahout seems to be an easy way to test Machine Learning algorithms using the Java API.

But I would use this R code instead of the one shown in the tutorial to convert the MovieLens dataset to CSV format.

```
r <- file("u.data", "r")
w <- file("u1.csv", "w")

# Read the whitespace-separated file in chunks and write it out comma-separated
while (length(data <- readLines(r, n = 1000)) > 0) {
  writeLines(gsub("\\s+", ",", data), w)
}

close(r)
close(w)
```
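The conversion itself is just a regex substitution. On an in-memory example (two made-up lines in the tab-separated u.data layout of user, item, rating, timestamp) it looks like this:

```r
# Made-up sample lines in the u.data tab-separated format
lines <- c("196\t242\t3\t881250949", "186\t302\t3\t891717742")
csv <- gsub("\\s+", ",", lines)
print(csv)  # "196,242,3,881250949" "186,302,3,891717742"
```

With the real files, the loop above only adds the reading and writing around this substitution.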

## Time series forecast

This code counts how many values from the testing dataset fall within the 95% confidence interval of the forecast.

```
library(forecast)
library(lubridate)  # for year()

training <- dat[year(dat$date) < 2012, ]
testing  <- dat[year(dat$date) > 2011, ]
tstrain  <- ts(training$visitsTumblr)

fit <- bats(tstrain)
fc  <- forecast(fit, h = 235)

# fc$lower and fc$upper each have one column per level: "80%" and "95%"
count <- 0
for (i in 1:nrow(fc$upper)) {
  lo95 <- fc$lower[i, "95%"]
  hi95 <- fc$upper[i, "95%"]
  print(paste(testing$visitsTumblr[i], lo95, hi95))
  if (testing$visitsTumblr[i] > lo95 & testing$visitsTumblr[i] < hi95) {
    count <- count + 1
  }
}
print(count)
```

The forecast object contains the interval bounds (Lo 95 and Hi 95) that I use.

```
> fc
Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
366       207.4397 -124.2019 539.0813 -299.7624 714.6418
367       197.2773 -149.6631 544.2177 -333.3223 727.8769
368       235.5405 -112.0582 583.1392 -296.0658 767.1468
369       235.5405 -112.7152 583.7962 -297.0707 768.1516
370       235.5405 -113.3710 584.4520 -298.0736 769.1546
371       235.5405 -114.0256 585.1065 -299.0747 770.1556
372       235.5405 -114.6789 585.7599 -300.0739 771.1548
```
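The same count can be computed without the loop. On the real objects you would index fc$lower[, "95%"] and fc$upper[, "95%"]; here made-up matrices with the same column layout stand in so the snippet runs on its own:

```r
# Toy stand-ins for fc$lower and fc$upper (columns are the 80% and 95% bounds)
lower  <- cbind("80%" = c(-124, -150), "95%" = c(-300, -333))
upper  <- cbind("80%" = c(539, 544),  "95%" = c(715, 728))
actual <- c(210, 800)  # hypothetical test values

inside <- actual > lower[, "95%"] & actual < upper[, "95%"]
sum(inside)   # 1: only the first value falls inside its 95% band
mean(inside)  # the same count as a proportion
```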

## Lasso fit

### The code I was given

```
set.seed(3523)
library(AppliedPredictiveModeling)
library(caret)  # for createDataPartition()
data(concrete)
inTrain = createDataPartition(concrete$CompressiveStrength, p = 3/4)[[1]]
training = concrete[ inTrain,]
testing = concrete[-inTrain,]
```

### This is the data

```
> head(as.matrix(training))
Cement BlastFurnaceSlag FlyAsh Water Superplasticizer CoarseAggregate
47   349.0              0.0      0 192.0              0.0          1047.0
55   139.6            209.4      0 192.0              0.0          1047.0
56   198.6            132.4      0 192.0              0.0           978.4
58   198.6            132.4      0 192.0              0.0           978.4
63   310.0              0.0      0 192.0              0.0           971.0
115  362.6            189.0      0 164.9             11.6           944.7
FineAggregate Age CompressiveStrength
47          806.9   3               15.05
55          806.9   7               14.59
56          825.5   7               14.64
58          825.5   3                9.13
63          850.6   3                9.87
115         755.8   7               22.90
```

### Lasso fit and plot

```
library(lars)
predictors <- as.matrix(training)[,-9]
lasso.fit <- lars(predictors, training$CompressiveStrength, type="lasso", trace=TRUE)
plot(lasso.fit, breaks=FALSE)
```

According to this graph, the last coefficient to be set to zero as the penalty increases is Cement. I think this is correct but I may change this.
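The reason coefficients hit exactly zero is the soft-thresholding operator at the heart of the lasso. This little sketch (my own toy values, not the concrete data) shows smaller coefficients being zeroed first as the penalty t grows, which is why the predictor with the largest effect survives longest:

```r
# Soft-thresholding: S(z, t) = sign(z) * max(|z| - t, 0)
soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)
soft(c(2.5, -0.4, 0.1), t = 0.5)  # 2 0 0 -- only the large coefficient survives
```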

## RandomForests

I am just posting R code at this time. The explanation is missing, but I am making some progress.

```
library(ElemStatLearn)
library(randomForest)
data(vowel.train)
data(vowel.test)
```
```
> head(vowel.train)
y x.1 x.2 x.3 x.4 x.5 x.6 x.7 x.8 x.9 x.10
1 1 -3.639 0.418 -0.670 1.779 -0.168 1.627 -0.388 0.529 -0.874 -0.814
2 2 -3.327 0.496 -0.694 1.365 -0.265 1.933 -0.363 0.510 -0.621 -0.488
3 3 -2.120 0.894 -1.576 0.147 -0.707 1.559 -0.579 0.676 -0.809 -0.049
4 4 -2.287 1.809 -1.498 1.012 -1.053 1.060 -0.567 0.235 -0.091 -0.795
5 5 -2.598 1.938 -0.846 1.062 -1.633 0.764 0.394 -0.150 0.277 -0.396
6 6 -2.852 1.914 -0.755 0.825 -1.588 0.855 0.217 -0.246 0.238 -0.365
```

```
vowel.train$y <- factor(vowel.train$y)
set.seed(33833)
fit.rf <- randomForest(y ~ ., data = vowel.train)
plot(fit.rf)
varImpPlot(fit.rf)
```

I was asked to find the order of variable importance, which this graph shows.
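Extracting that order from the fitted model would start from importance(fit.rf). Since I am not reproducing the model run here, a made-up importance matrix with the same shape shows the ranking step:

```r
# Stand-in for importance(fit.rf): rows are predictors, one Gini column
imp <- matrix(c(5.2, 12.8, 3.1),
              dimnames = list(c("x.1", "x.2", "x.3"), "MeanDecreaseGini"))
ranked <- rownames(imp)[order(imp[, 1], decreasing = TRUE)]
print(ranked)  # "x.2" "x.1" "x.3"
```

This is the same ordering varImpPlot draws, just as a character vector instead of a chart.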