Principal Component Analysis
September 28, 2015
This post records what I think I have understood about Principal Component Analysis (PCA). I will update it later.
The code is on GitHub and it runs, but I think the eigenvalues could be wrong. I still have to test it further.
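These are the two main functions.
import numpy as np
from numpy.linalg import eigh  # assumption: imports are not shown in the post; scipy.linalg.eigh would also work

"""Compute the covariance matrix for a given dataset.
"""
def estimateCovariance( data ):
    print(data)
    # getmean is a helper that is not shown in this post
    mean = getmean( data )
    print(mean)
    # Center every data point by subtracting the mean
    dataZeroMean = [x - mean for x in data]
    print(dataZeroMean)
    # The covariance estimate is the mean of the outer products of the centered points
    covar = [np.outer(x, x) for x in dataZeroMean]
    covariance = getmean( covar )
    print(covariance)
    return covariance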
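One way to check the covariance estimate on its own is to compare it against `np.cov`. The sketch below is a self-contained approximation, not the code from the repository: `covariance_sketch` is a hypothetical name, the sample data is made up for illustration, and it assumes `getmean` behaves like a mean over the first axis (`np.mean(..., axis=0)`). Because the estimator averages the outer products, it divides by n rather than n - 1, so the comparison uses `bias=True`.

```python
import numpy as np

def covariance_sketch(data):
    """Mean of the outer products of mean-centered rows (divides by n, not n - 1)."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)                  # assumed equivalent of getmean(data)
    centered = data - mean                    # subtract the mean from every row
    outers = [np.outer(x, x) for x in centered]
    return np.mean(outers, axis=0)            # assumed equivalent of getmean(covar)

# Made-up sample: 4 observations (rows) of 2 features (columns)
sample = np.array([[1.0, 2.0], [3.0, 5.0], [5.0, 6.0], [7.0, 11.0]])
cov = covariance_sketch(sample)

# rowvar=False treats columns as variables; bias=True divides by n
reference = np.cov(sample, rowvar=False, bias=True)
print(np.allclose(cov, reference))
```

If the comparison is made against the default `np.cov`, which divides by n - 1, the matrices (and therefore the eigenvalues) differ by a factor of n / (n - 1). That is one possible reason the eigenvalues might look wrong when checked against another tool.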
"""Computes the top `k` principal components, corresponding scores, and all eigenvalues.
"""
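def pca(data, k=2):
    d = estimateCovariance( data )
    # eigh is meant for symmetric matrices and returns the eigenvalues in ascending order
    eigVals, eigVecs = eigh(d)
    # validate is a helper that is not shown in this post
    validate( eigVals, eigVecs )
    # Indices that sort the eigenvalues from largest to smallest
    inds = np.argsort(eigVals)[::-1]
    # Eigenvectors are the columns of eigVecs; keep the k with the largest eigenvalues
    topComponent = eigVecs[:,inds[:k]]
    print('\nTop Component: \n{0}'.format(topComponent))
    # Project each original (uncentered) data point onto the top components
    correlatedDataScores = [np.dot(x, topComponent) for x in data]
    print('\nScores : \n{0}'
          .format('\n'.join(map(str, correlatedDataScores))))
    print('\n eigenvalues: \n{0}'.format(eigVals[inds]))
    return topComponent,correlatedDataScores,eigVals[inds]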
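Since the eigenvalues are the part still in doubt, here is a minimal sketch of some sanity checks that could be run. It is self-contained and does not use the repository code: `pca_sketch` is a hypothetical name, the random sample is made up, and `np.cov(..., bias=True)` stands in for the mean-of-outer-products estimator. The checks rely on standard properties: the eigenvalues sum to the trace of the covariance matrix, the variance of the scores along each component equals the matching eigenvalue, and the components are orthonormal.

```python
import numpy as np

def pca_sketch(data, k=2):
    """Top-k components, scores, and all eigenvalues (largest first)."""
    data = np.asarray(data, dtype=float)
    cov = np.cov(data, rowvar=False, bias=True)    # 1/n covariance estimate
    eig_vals, eig_vecs = np.linalg.eigh(cov)       # eigenvalues come back ascending
    order = np.argsort(eig_vals)[::-1]             # reorder to descending
    components = eig_vecs[:, order[:k]]            # eigenvectors are columns
    scores = data.dot(components)                  # project the rows
    return components, scores, eig_vals[order]

# Made-up correlated sample: 200 observations of 3 features
rng = np.random.RandomState(0)
mixing = np.array([[2.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.3, 0.2]])
sample = rng.randn(200, 3).dot(mixing)

components, scores, eig_vals = pca_sketch(sample, k=2)
cov = np.cov(sample, rowvar=False, bias=True)

# Check 1: the eigenvalues add up to the total variance (trace of the covariance)
print(np.isclose(eig_vals.sum(), np.trace(cov)))
# Check 2: the variance of each score column matches its eigenvalue
print(np.allclose(np.var(scores, axis=0), eig_vals[:2]))
# Check 3: the components are orthonormal
print(np.allclose(components.T.dot(components), np.eye(2)))
```

Check 2 works even though the scores are computed from uncentered data, because a variance is unaffected by shifting the data by a constant.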