Principal Component Analysis
September 18, 2014
library(caret)
library(AppliedPredictiveModeling)
set.seed(3433)
data(AlzheimerDisease)
adData = data.frame(diagnosis, predictors)
inTrain = createDataPartition(adData$diagnosis, p = 3/4)[[1]]
training = adData[ inTrain, ]
testing = adData[-inTrain, ]
I recently studied predictive analytics techniques as part of a course and was given the setup code shown above. Using only the predictors whose names start with "IL", I built two predictive models, one on the raw predictors and one on their principal components, to compare their accuracy. This may be easy for experts, but I found it tricky, so I am posting the code here for my own reference.
Non-PCA
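# Keep the diagnosis column plus every predictor whose name starts with "IL"
training1 <- training[, grepl("^IL|^diagnosis", names(training))]
test1 <- testing[, grepl("^IL|^diagnosis", names(testing))]
# Fit a logistic regression (glm) on the raw IL predictors
modelFit <- train(diagnosis ~ ., method = "glm", data = training1)
# Note: caret's signature is confusionMatrix(data, reference), predictions first.
# This call passes the true labels first, so in the output that follows the rows
# hold the true diagnoses and the columns hold the predictions.
confusionMatrix(test1$diagnosis, predict(modelFit, test1))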
Confusion Matrix and Statistics
          Reference
Prediction Impaired Control
  Impaired        2      20
  Control         9      51
Accuracy : 0.6463
95% CI : (0.533, 0.7488)
No Information Rate : 0.8659
P-Value [Acc > NIR] : 1.00000
Kappa : -0.0702
Mcnemar’s Test P-Value : 0.06332
Sensitivity : 0.18182
Specificity : 0.71831
Pos Pred Value : 0.09091
Neg Pred Value : 0.85000
Prevalence : 0.13415
Detection Rate : 0.02439
Detection Prevalence : 0.26829
Balanced Accuracy : 0.45006
‘Positive’ Class : Impaired
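As a rough sanity check, the headline figures can be recomputed from the four counts in the table above. The sketch below uses base R only. The names tab, accuracy, sens_as_reported, sens_true_reference and spec_true_reference are my own, not caret's. It assumes that, because the confusionMatrix call passed the true labels as its first argument, the rows of the table are the true diagnoses and the columns are the predictions.

```r
# Counts copied from the non-PCA table above.
# Assumption: rows = true test-set diagnosis, columns = model prediction.
tab <- matrix(c(2, 9, 20, 51), nrow = 2,
              dimnames = list(actual    = c("Impaired", "Control"),
                              predicted = c("Impaired", "Control")))

# Accuracy: correct predictions (the diagonal) over all test cases
accuracy <- sum(diag(tab)) / sum(tab)

# Sensitivity as reported: predictions were treated as the reference (columns)
sens_as_reported <- tab["Impaired", "Impaired"] / sum(tab[, "Impaired"])

# Sensitivity and specificity with the true diagnosis as the reference (rows)
sens_true_reference <- tab["Impaired", "Impaired"] / sum(tab["Impaired", ])
spec_true_reference <- tab["Control", "Control"] / sum(tab["Control", ])

round(c(accuracy, sens_as_reported, sens_true_reference, spec_true_reference), 4)
# 0.6463 0.1818 0.0909 0.8500
```

Accuracy does not depend on which way round the table is read, but sensitivity and specificity do. Treating the true diagnoses as the reference gives 2/22 and 51/60, which are the values reported above as Pos Pred Value and Neg Pred Value.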
PCA
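# Keep only the IL predictors (no diagnosis column) for the PCA step
training2 <- training[, grepl("^IL", names(training))]
# Learn the PCA rotation on the training set, keeping enough components
# to explain 80% of the variance
preProc <- preProcess(training2, method = "pca", thresh = 0.8)
test2 <- testing[, grepl("^IL", names(testing))]
# Apply the same training-set rotation to both training and test data
trainpca <- predict(preProc, training2)
testpca <- predict(preProc, test2)
# training1 and test1 come from the Non-PCA section and supply the diagnosis labels
modelFitpca <- train(training1$diagnosis ~ ., method = "glm", data = trainpca)
# As before, the true labels are passed first, so rows are true diagnoses
confusionMatrix(test1$diagnosis, predict(modelFitpca, testpca))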
Confusion Matrix and Statistics
          Reference
Prediction Impaired Control
  Impaired        3      19
  Control         4      56
Accuracy : 0.7195
95% CI : (0.6094, 0.8132)
No Information Rate : 0.9146
P-Value [Acc > NIR] : 1.000000
Kappa : 0.0889
Mcnemar’s Test P-Value : 0.003509
Sensitivity : 0.42857
Specificity : 0.74667
Pos Pred Value : 0.13636
Neg Pred Value : 0.93333
Prevalence : 0.08537
Detection Rate : 0.03659
Detection Prevalence : 0.26829
Balanced Accuracy : 0.58762
‘Positive’ Class : Impaired
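To make the thresh=0.8 setting less of a black box, here is a minimal sketch of the idea using base R's prcomp: centre and scale the predictors, then keep the smallest number of leading components whose cumulative variance reaches 80%. This is not caret's internal code, and caret may pick the component count slightly differently. The data frame x, its columns a to e, and the seed are all made up for illustration and are not the Alzheimer data.

```r
set.seed(1)                        # hypothetical seed, for reproducibility only
n <- 200
latent <- rnorm(n)                 # shared signal so some columns are correlated

# Synthetic predictors (NOT the Alzheimer data): three correlated, two pure noise
x <- data.frame(
  a = latent + rnorm(n, sd = 0.3),
  b = latent + rnorm(n, sd = 0.3),
  c = latent + rnorm(n, sd = 0.3),
  d = rnorm(n),
  e = rnorm(n)
)

# PCA on centred and scaled predictors
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Cumulative proportion of variance explained by the leading components
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of components that reaches the 80% threshold
k <- which(cum_var >= 0.8)[1]

# Component scores that would be passed on to the model in place of x
scores <- pca$x[, seq_len(k), drop = FALSE]
```

The point of the design is that the rotation is learned once, on the training predictors, and the same rotation is then applied to the test predictors, which is what the two predict(preProc, ...) calls do in the PCA model above.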