statlearning class

This will be very useful for people like me who want to apply statistical learning to Capacity Planning.

Rob Tibshirani and I are offering a MOOC in January on Statistical Learning.
This "massive open online course" is free, and is based entirely on our new book
"An Introduction to Statistical Learning with Applications in R"
(James, Witten, Hastie, Tibshirani 2013, Springer). http://www-bcf.usc.edu/~gareth/ISL/
The PDF of the book will also be free.

The course, hosted on Open edX, consists of video lecture segments, quizzes, video R sessions, interviews with famous statisticians,
lecture notes, and more. The course starts on January 22 and runs for 10 weeks.

Please consult the course webpage http://statlearning.class.stanford.edu/ to enroll and for further details.
----------------------------------------------------------------------------------------
Trevor Hastie  hastie@stanford.edu
Professor, Department of Statistics, Stanford University
URL: http://www.stanford.edu/~hastie
----------------------------------------------------------------------------------------

Statistics of agreement

I found this formula for calculating the percentage of agreement between two raters quite interesting, and coded the following simple steps in R. The statistic is called Cohen's kappa, and even though there is nothing original about this entry, it is very useful. I wrote the simple R code myself because I am learning R.

It was also surprising that I didn't know about it, and that our teams are not technical enough to use even these foundational principles. As is evident, it has wide application wherever percentage agreement has to be calculated, such as when two teams disagree or when auditors disagree with each other. Whither will our antagonistic attitude towards good calculations in technical and project management drive us?

The other point worth highlighting is that I found the description of this formula in a paper dealing with the Architecture Trade-off Analysis Method (ATAM).

The matrix created below records the points on which two raters agree with each other and those on which they disagree. The formula for the level of agreement is

        Observed percentage of agreement - Expected percentage of agreement
kappa = --------------------------------------------------------------------
                      1 - Expected percentage of agreement
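
Before the step-by-step walkthrough, here is the whole calculation wrapped in a small R function. This is my own sketch (the name cohens_kappa is mine, for illustration; base R has no built-in kappa function), and the steps below unpack it one piece at a time.

# Sketch: Cohen's kappa for a square contingency table of counts
cohens_kappa <- function(counts) {
  p  <- counts / sum(counts)          # cell proportions
  po <- sum(diag(p))                  # observed agreement (diagonal)
  pe <- sum(rowSums(p) * colSums(p))  # agreement expected by chance
  (po - pe) / (1 - pe)
}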

R code

# 2x2 table of counts: rows are one rater, columns the other
kappa <- matrix(c(5, 2, 1, 2), ncol = 2)
colnames(kappa) <- c("Disagree", "Agree")
rownames(kappa) <- c("Disagree", "Agree")
kappa

(I have formatted the R output as a table.)



           Disagree   Agree
Disagree          5       1
Agree             2       2

# Convert the counts to proportions of the grand total
# (margin.table(kappa) with no margin argument returns the total count;
# prop.table(kappa) is the idiomatic equivalent)
kappamargin <- kappa / margin.table(kappa)
kappamargin

(I have formatted the R output, the matrix of proportions, as a table.)



           Disagree   Agree
Disagree        0.5     0.1
Agree           0.2     0.2

Observed percentage of agreement = 0.5 + 0.2 = 0.7
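
In R, the observed agreement is just the sum of the diagonal of the proportion matrix; this one-liner is my own shorthand, using the kappamargin object defined above.

sum(diag(kappamargin))   # 0.7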

Now we want the row and column totals, as this table shows. The expected agreement comes from multiplying each row total by the matching column total.



           Disagree   Agree   Total
Disagree        0.5     0.1     0.6
Agree           0.2     0.2     0.4
Total           0.7     0.3     1.0

For illustration, I have used this line of code to put the totals into a matrix.

# Column 1 holds the row totals, column 2 the column totals
marginals <- matrix(c(margin.table(kappamargin, 1), margin.table(kappamargin, 2)), ncol = 2)
marginals

(I have formatted the R output as a table.)


           Row total   Column total
Disagree         0.6            0.7
Agree            0.4            0.3

Expected percentage of agreement = (0.6 * 0.7) + (0.4 * 0.3) = 0.42 + 0.12 = 0.54
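
Again as a one-liner of my own, using the marginals object above (first column the row totals, second the column totals):

sum(marginals[, 1] * marginals[, 2])   # 0.54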

So the final kappa value is

(0.7 - ((marginals[1, 1] * marginals[1, 2]) + (marginals[2, 1] * marginals[2, 2]))) /
  (1 - ((marginals[1, 1] * marginals[1, 2]) + (marginals[2, 1] * marginals[2, 2])))

0.3478261

i.e.

0.7 - ((0.6 * 0.7) + (0.4 * 0.3))     0.16
---------------------------------  =  ----  =  0.35 (approximately)
 1 - ((0.6 * 0.7) + (0.4 * 0.3))      0.46
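
As a sanity check, the cohens_kappa sketch from the top of this post gives the same value directly from the table of counts; if you have the psych package installed, its cohen.kappa function should report the same unweighted kappa (I have commented it out since it needs the extra package).

cohens_kappa(kappa)           # 0.3478261
# psych::cohen.kappa(kappa)   # should agree, if psych is installed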