Gradient Descent
October 6, 2015
I ported the Gradient Descent code from Octave to Python. The base Octave code is the one from Andrew Ng’s Machine Learning MOOC.
I mistakenly believed that the Octave code for matrix multiplication would translate directly to Python.
But the Octave code is this
Octave code
theta = theta - ( ( alpha * ( (( theta' * X' )' - y)' * X ))/length(y) )'
and the Python code is this.
Python
import numpy as np

def gradientDescent( X, y, theta, alpha = 0.01, num_iters = 1500):
    r,c = X.shape  # r is the number of training examples
    for iter in range( num_iters ):  # Octave's for iter = 1:num_iters performs num_iters updates
        theta = theta - ( ( alpha * np.dot( X.T, ( np.dot( X , theta ).T - np.asarray(y) ).T ) ) / r )
    return theta
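In equation form, both the Octave line and the Python update are computing the standard batch gradient descent step for linear regression:

\[ \theta := \theta - \frac{\alpha}{m} X^{T}\left(X\theta - y\right), \qquad m = \mathrm{length}(y) \]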
This line is not a direct translation.
theta = theta - ( ( alpha * np.dot( X.T, ( np.dot( X , theta ).T - np.asarray(y) ).T ) ) / r )
Yet only this version of the Python code gives me a theta that matches the value produced by the Octave code.
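I think the reason a literal transcription fails is NumPy broadcasting: in Octave, y is an m x 1 column vector, but in Python y often arrives as a 1-D array (or a pandas Series), so X*theta - y silently broadcasts into an m x m matrix instead of the vector of per-example errors. A minimal sketch, with made-up shapes (97 examples and 2 features are assumptions for illustration, not the actual data set):

import numpy as np

# Assumed shapes for illustration only: 97 examples, 2 features.
X = np.ones((97, 2))
theta = np.zeros((2, 1))
y = np.ones(97)                      # 1-D array, as np.asarray(y) typically gives

naive = np.dot(X, theta) - y         # (97, 1) - (97,) broadcasts to (97, 97) -- wrong
working = np.dot(X, theta).T - y     # (1, 97) - (97,) stays (1, 97) -- the per-example errors
print(naive.shape, working.shape)    # (97, 97) (1, 97)

That is why the working line transposes the prediction before the subtraction and transposes back afterwards.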
(Figure: Linear Regression fit)
The gradient descent, however, does not give me the same value after a certain number of iterations, although the cost values are similar.
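The cost values printed below are, as far as I can tell, the usual squared-error cost from the course, J(theta) = 1/(2m) * sum((X*theta - y)^2). A minimal sketch of how it can be computed in Python, assuming the same shape handling as in gradientDescent (the name computeCost is mine):

import numpy as np

def computeCost(X, y, theta):
    # J(theta) = 1/(2m) * sum((X*theta - y)^2), the usual linear regression cost
    m = len(y)
    err = np.dot(X, theta).T - np.asarray(y)   # same shape handling as in gradientDescent
    return float(np.sum(err ** 2) / (2 * m))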
Gradient Descent from Octave Code that converges
Minimization of cost
Initial cost is 640.125590
J = 656.25
Initial cost is 656.250475
J = 672.58
Initial cost is 672.583001
J = 689.12
Initial cost is 689.123170
J = 705.87
Initial cost is 705.870980
J = 722.83
Initial cost is 722.826433
J = 739.99
Initial cost is 739.989527
Gradient Descent from my Python Code that does not converge to the optimal value
Minimization of cost
635.81837438
651.963633303
668.316534159
684.877076945
701.645261664
718.621088313
735.804556895
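To reproduce a cost trace like the one above, the cost can be printed inside the descent loop. A sketch combining the two functions on made-up data (the data here is random, so these numbers will not match the output above):

import numpy as np

np.random.seed(0)
X = np.column_stack([np.ones(50), np.random.rand(50)])   # bias column plus one feature
y = 3 + 2 * X[:, 1] + 0.1 * np.random.randn(50)
theta = np.zeros((2, 1))

for i in range(10):
    print(computeCost(X, y, theta))                        # cost before this update
    theta = gradientDescent(X, y, theta, alpha=0.1, num_iters=1)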