# Principal Component Analysis

September 28, 2015

This post records what I think I have understood about *Principal Component Analysis* so far. I will update it later.

The code is on *GitHub* and it runs, but I think the **eigenvalues** could be wrong. I still have to test it further.

These are the two main functions.

"""Compute the covariance matrix for a given dataset. """ def estimateCovariance( data ): print data mean = getmean( data ) print mean dataZeroMean = map(lambda x : x - mean, data ) print dataZeroMean covar = map( lambda x : np.outer(x,x) , dataZeroMean ) print getmean( covar ) return getmean( covar ) """Computes the top `k` principal components, corresponding scores, and all eigenvalues. """ def pca(data, k=2): d = estimateCovariance( data ) eigVals, eigVecs = eigh(d) validate( eigVals, eigVecs ) inds = np.argsort(eigVals)[::-1] topComponent = eigVecs[:,inds[:k]] print '\nTop Component: \n{0}'.format(topComponent) correlatedDataScores = map(lambda x : np.dot( x ,topComponent), data ) print ('\nScores : \n{0}' .format('\n'.join(map(str, correlatedDataScores)))) print '\n eigenvalues: \n{0}'.format(eigVals[inds]) return topComponent,correlatedDataScores,eigVals[inds]