Decision Tree
September 4, 2014
A decision tree is a technique for analyzing data and making predictions from it. I came across it while studying machine learning.
Tree models are techniques for recursively partitioning a data set into subsets that are increasingly homogeneous in the response variable, based on its relationship to one or more (usually many) predictor variables.
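To make the partitioning idea concrete, here is a minimal base-R sketch (not the rpart implementation itself) that scores a candidate split by Gini impurity, the criterion rpart uses by default for classification. The toy `pe`/`pop` values below are made up for illustration.

```r
# Gini impurity of a vector of class labels: 1 - sum(p_k^2)
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Weighted impurity of the two subsets induced by the split x < threshold
split_gini <- function(x, labels, threshold) {
  left  <- labels[x <  threshold]
  right <- labels[x >= threshold]
  (length(left) * gini(left) + length(right) * gini(right)) / length(labels)
}

# Toy data: low pe values are "pico", high pe values are "synecho"
pe  <- c(100, 200, 300, 9000, 9500, 10000)
pop <- c("pico", "pico", "pico", "synecho", "synecho", "synecho")

gini(pop)                  # 0.5: the unsplit labels are maximally mixed
split_gini(pe, pop, 5000)  # 0: this threshold separates the classes perfectly
```

A tree learner tries many thresholds on many predictors, keeps the split with the lowest weighted impurity, and then repeats the same search inside each resulting subset.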
> head(data)
  file_id time cell_id    d1    d2 fsc_small fsc_perp fsc_big    pe chl_small
1     203   12       1 25344 27968     34677    14944   32400  2216     28237
2     203   12       4 12960 22144     37275    20440   32400  1795     36755
3     203   12       6 21424 23008     31725    11253   32384  1901     26640
4     203   12       9  7712 14528     28744    10219   32416  1248     35392
5     203   12      11 30368 21440     28861     6101   32400 12989     23421
6     203   12      15 30032 22704     31221    13488   32400  1883     27323
  chl_big     pop
1    5072    pico
2   14224   ultra
3       0    pico
4   10704   ultra
5    5920 synecho
6    6560    pico
library(caret)
library(rpart)

# Split the data 50/50 into training and test sets, stratified on pop
training <- createDataPartition(data$pop, times=1, p=0.5, list=FALSE)
train <- data[training, ]
test  <- data[-training, ]  # rows NOT selected for training

fol <- formula(pop ~ fsc_small + fsc_perp + fsc_big + pe + chl_big + chl_small)
model <- rpart(fol, method="class", data=train)
print(model)
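If caret is not available, the same kind of half/half split can be sketched in base R with `sample()`. Note this is plain random sampling; `createDataPartition` additionally stratifies on `data$pop` so each class appears in both halves in roughly its original proportion. The `data.frame` below is a toy stand-in for the real data set.

```r
set.seed(42)
# Toy stand-in for the real flow-cytometry data
data <- data.frame(pop = rep(c("pico", "ultra", "synecho"), each = 10),
                   pe  = runif(30))

# Unstratified base-R equivalent of a 50/50 partition
training <- sample(nrow(data), size = nrow(data) %/% 2)
train <- data[ training, ]  # half the rows, for fitting
test  <- data[-training, ]  # the remaining rows, for evaluation

nrow(train)  # 15
nrow(test)   # 15
```

The key detail either way is the `-training` indexing on rows: the test set must be the complement of the training rows, not a column selection.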
n= 36172

node), split, n, loss, yval, (yprob)
      * denotes terminal node

 1) root 36172 25742 pico (0.0014 0.18 0.29 0.25 0.28)
   2) pe< 5001.5 ...
      ...
      11) chl_small>=41300 5175 660 nano (0 0.87 0 0 0.13) *
   3) pe>=5001.5 9856 783 synecho (0.0052 0.054 0.0051 0.92 0.015)
     6) chl_small>=38109.5 653 133 nano (0.078 0.8 0 0.055 0.07) *
     7) chl_small< 38109.5 9203 166 synecho (0 0.0015 0.0054 0.98 0.011) *
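Each printed node line lists the number of training cases reaching the node (`n`), the count misclassified there (`loss`), the predicted class (`yval`), and the class proportions (`yprob`). Since `loss` is simply the cases not belonging to the majority class, it can be sanity-checked from `n` and `yprob` (the printed probabilities are rounded, so only approximately). Using the synecho node above:

```r
# Node line: "3) pe>=5001.5 9856 783 synecho (0.0052 0.054 0.0051 0.92 0.015)"
n     <- 9856                                    # training cases at the node
yprob <- c(0.0052, 0.054, 0.0051, 0.92, 0.015)   # class proportions (rounded)
loss  <- 783                                     # cases not of the majority class

# loss is roughly n * (1 - max(yprob)); the rounding in yprob explains
# the small discrepancy
n * (1 - max(yprob))  # about 788, close to the printed loss of 783
```

This also explains the first split in terms of the biology: particles with pe >= 5001.5 are overwhelmingly (92%) synecho, which is exactly the threshold the assignment asks about.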