Decision Tree

This is a technique used to analyze data for prediction. I came across this when I was studying Machine Learning.

Tree models are computationally intensive techniques for recursively partitioning response variables into subsets based on their relationship to one or more (usually many) predictor variables.

> head(data)
  file_id time cell_id    d1    d2 fsc_small fsc_perp fsc_big    pe chl_small
1     203   12       1 25344 27968     34677    14944   32400  2216     28237
2     203   12       4 12960 22144     37275    20440   32400  1795     36755
3     203   12       6 21424 23008     31725    11253   32384  1901     26640
4     203   12       9  7712 14528     28744    10219   32416  1248     35392
5     203   12      11 30368 21440     28861     6101   32400 12989     23421
6     203   12      15 30032 22704     31221    13488   32400  1883     27323
  chl_big     pop
1    5072    pico
2   14224   ultra
3       0    pico
4   10704   ultra
5    5920 synecho
6    6560    pico
training <- createDataPartition(data$pop, times=1,p=.5,list=FALSE)
train <- data[training,]
test <- data[,training]

fol <- formula(pop ~ fsc_small + fsc_perp + fsc_big + pe + chl_big + chl_small)
model <- rpart(fol, method="class", data=train)
print(model)
n= 36172

node), split, n, loss, yval, (yprob)
* denotes terminal node

1) root 36172 25742 pico (0.0014 0.18 0.29 0.25 0.28)
2) pe=41300 5175 660 nano (0 0.87 0 0 0.13) *
11) chl_small=5001.5 9856 783 synecho (0.0052 0.054 0.0051 0.92 0.015)
6) chl_small>=38109.5 653 133 nano (0.078 0.8 0 0.055 0.07) *
7) chl_small< 38109.5 9203 166 synecho (0 0.0015 0.0054 0.98 0.011) *