Gradient Descent
October 6, 2015
I ported the Gradient Descent code from Octave to Python. The base Octave code is the one from Andrew Ng’s Machine Learning MOOC.
I mistakenly believed that the Octave code for matrix multiplication would translate directly to Python.
But the Octave code is this
Octave code
theta = theta - ( ( alpha * ( (( theta' * X' )' - y)' * X ))/length(y) )'
and the Python code is this.
Python
import numpy as np

def gradientDescent( X, y, theta, alpha = 0.01, num_iters = 1500):
    r,c = X.shape  # r is the number of training examples
    for iter in range( num_iters ):  # Octave's for iter = 1:num_iters performs num_iters updates
        theta = theta - ( ( alpha * np.dot( X.T, ( np.dot( X , theta ).T - np.asarray(y) ).T ) ) / r )
    return theta
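In equation form, both the Octave line and the Python update are computing the standard batch gradient descent step for linear regression:

\[ \theta := \theta - \frac{\alpha}{m} X^{T}\left(X\theta - y\right), \qquad m = \mathrm{length}(y) \]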
This line is not a direct translation.
theta = theta - ( ( alpha * np.dot( X.T, ( np.dot( X , theta ).T - np.asarray(y) ).T ) ) / r )
Yet only this version of the Python code gives me a theta that matches the value produced by the Octave code.
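I think the reason a literal transcription fails is NumPy broadcasting: in Octave, y is an m x 1 column vector, but in Python y often arrives as a 1-D array (or a pandas Series), so X*theta - y silently broadcasts into an m x m matrix instead of the vector of per-example errors. A minimal sketch, with made-up shapes (97 examples and 2 features are assumptions for illustration, not the actual data set):

import numpy as np

# Assumed shapes for illustration only: 97 examples, 2 features.
X = np.ones((97, 2))
theta = np.zeros((2, 1))
y = np.ones(97)                      # 1-D array, as np.asarray(y) typically gives

naive = np.dot(X, theta) - y         # (97, 1) - (97,) broadcasts to (97, 97) -- wrong
working = np.dot(X, theta).T - y     # (1, 97) - (97,) stays (1, 97) -- the per-example errors
print(naive.shape, working.shape)    # (97, 97) (1, 97)

That is why the working line transposes the prediction before the subtraction and transposes back afterwards.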
(Figure: Linear Regression fit)
The gradient descent, however, does not give me the same value after a certain number of iterations, although the cost values are similar.
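The cost values printed below are, as far as I can tell, the usual squared-error cost from the course, J(theta) = 1/(2m) * sum((X*theta - y)^2). A minimal sketch of how it can be computed in Python, assuming the same shape handling as in gradientDescent (the name computeCost is mine):

import numpy as np

def computeCost(X, y, theta):
    # J(theta) = 1/(2m) * sum((X*theta - y)^2), the usual linear regression cost
    m = len(y)
    err = np.dot(X, theta).T - np.asarray(y)   # same shape handling as in gradientDescent
    return float(np.sum(err ** 2) / (2 * m))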
Gradient Descent from Octave Code that converges
Minimization of cost
Initial cost is 640.125590
J = 656.25
Initial cost is 656.250475
J = 672.58
Initial cost is 672.583001
J = 689.12
Initial cost is 689.123170
J = 705.87
Initial cost is 705.870980
J = 722.83
Initial cost is 722.826433
J = 739.99
Initial cost is 739.989527
Gradient Descent from my Python Code that does not converge to the optimal value
Minimization of cost
635.81837438
651.963633303
668.316534159
684.877076945
701.645261664
718.621088313
735.804556895
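To reproduce a cost trace like the one above, the cost can be printed inside the descent loop. A sketch combining the two functions on made-up data (the data here is random, so these numbers will not match the output above):

import numpy as np

np.random.seed(0)
X = np.column_stack([np.ones(50), np.random.rand(50)])   # bias column plus one feature
y = 3 + 2 * X[:, 1] + 0.1 * np.random.randn(50)
theta = np.zeros((2, 1))

for i in range(10):
    print(computeCost(X, y, theta))                        # cost before this update
    theta = gradientDescent(X, y, theta, alpha=0.1, num_iters=1)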