Practicing Predictive Analytics using “R”
February 7, 2016 Leave a comment
I spent a Sunday on this code to answer some questions for a Coursera course. At this time this code is the norm in more than one such course. So I am just building muscle memory. I type this code and look at the result and learn what I learnt earlier.
If I don’t remember how to solve it I search but the point is that I have to be constantly in touch with “R” as well the fundamentals. My day job doesn’t let me do this. The other option is a book on Machine Learning like the one by Tom Mitchell but that takes foreover.
setwd("~/Documents/PredictiveAnalytics") library(dplyr) library(ggplot2) library(rpart) library(tree) library(randomForest) library(e1071) library(caret) seaflow <- read.csv(file="seaflow_21min.csv",head=TRUE) final <-filter(seaflow, pop == "synecho") print(nrow(final)) print( summary(seaflow)) print ( nrow(seaflow)) print( head(seaflow)) set.seed(555) trainIndex <- createDataPartition( seaflow$file_id, p = 0.5, list=FALSE, times=1) train <- seaflow[ trainIndex,] test <- seaflow[ -trainIndex,] print(mean(train$time)) p <- ggplot( seaflow, aes( pe, chl_small, color = pop)) + geom_point() dev.new(width=15, height=14) print(p) ggsave("~/predictiveanalytics.png", width=4, height=4, dpi=100) fol <- formula(pop ~ fsc_small + fsc_perp + fsc_big + pe + chl_big + chl_small) model <- rpart(fol, method="class", data=train) print(model) #plot(model) #text(model, use.n = TRUE, all=TRUE, cex=0.9) testprediction <- predict( model, newdata=test, type="class") comparisonofpredictions <- testprediction == test$pop accuracy <- sum(comparisonofpredictions) / length(comparisonofpredictions) print( accuracy ) randomforestmodel <- randomForest( fol, data = train) print(randomforestmodel) testpredictionusingrandomforest <- predict( randomforestmodel, newdata=test, type="class") comparisonofpredictions <- testpredictionusingrandomforest == test$pop accuracy <- sum(comparisonofpredictions) / length(comparisonofpredictions) print( accuracy ) print(importance(randomforestmodel)) svmmodel <- svm( fol, data = train) testpredictionusingsvm <- predict( svmmodel, newdata=test, type="class") comparisonofpredictions <- testpredictionusingsvm == test$pop accuracy <- sum(comparisonofpredictions) / length(comparisonofpredictions) print( accuracy )