I ported the Gradient Descent code from Octave to Python. The base Octave code is the one from Andrew Ng’s Machine Learning MOOC.

I mistakenly believed that the Octave code for matrix multiplication would translate directly to Python.

But the Octave code is this:

### Octave code

```
theta = theta - ( (  alpha * ( (( theta' * X' )' - y)' * X ))/length(y) )'
```

and the Python code is this:

### Python

```
import numpy as np

def gradientDescent(X, y, theta, alpha=0.01, num_iters=1500):
    r, c = X.shape  # r = number of training examples

    # Note: range(1, num_iters) performs num_iters - 1 updates,
    # one fewer than Octave's "for iter = 1:num_iters".
    for iter in range(1, num_iters):
        theta = theta - ((alpha * np.dot(X.T, (np.dot(X, theta).T - np.asarray(y)).T)) / r)
    return theta
```

This line is not a direct translation.

```
theta = theta - ( ( alpha * np.dot( X.T, ( np.dot( X , theta ).T - np.asarray(y) ).T ) ) / r )
```
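Even though it is not a literal transcription, the ported line is mathematically equivalent to the Octave expression via the transpose identity (A'B)' = B'A. A small sketch checking this numerically (the random shapes here are illustrative, not from the original data):

```
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))      # 5 samples, 3 features
y = rng.normal(size=(5, 1))
theta = rng.normal(size=(3, 1))

# Literal rendering of the Octave gradient ((theta' * X')' - y)' * X, transposed:
literal = ((X @ theta - y).T @ X).T
# The form used in the Python port:
ported = X.T @ (X @ theta - y)

assert np.allclose(literal, ported)
```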

But only the above Python code gives me a theta that matches the value produced by the Octave code.

### Linear Regression

However, the gradient descent still does not give me the correct value after a certain number of iterations, even though the cost values are similar.
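The cost values below come from the course's least-squares cost. The original post does not show that function, so this is my assumed sketch of it, with J(theta) = (1/2m) * sum((X @ theta - y)^2):

```
import numpy as np

def compute_cost(X, y, theta):
    # Least-squares cost for linear regression; assumed to match the
    # course's computeCost, which is not shown in this post.
    m = len(y)
    residual = X @ theta - y
    return float(residual.T @ residual) / (2 * m)
```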

### Gradient Descent from Octave Code that converges

### Minimization of cost

```
Initial cost is 640.125590
J = 656.25
Initial cost is 656.250475
J = 672.58
Initial cost is 672.583001
J = 689.12
Initial cost is 689.123170
J = 705.87
Initial cost is 705.870980
J = 722.83
Initial cost is 722.826433
J = 739.99
Initial cost is 739.989527
```

### Gradient Descent from my Python Code that does not converge to the optimal value

```
635.81837438
651.963633303
668.316534159
684.877076945
701.645261664
718.621088313
735.804556895
```
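One likely cause of the mismatch, which I would verify first: Octave's `for iter = 1:num_iters` executes the loop body `num_iters` times, while Python's `range(1, num_iters)` stops before `num_iters` and so executes it only `num_iters - 1` times, leaving the Python run one update short:

```
# Octave: "for iter = 1:1500" runs the body 1500 times.
# Python: range(1, 1500) yields 1, 2, ..., 1499 -- only 1499 iterations.
print(len(range(1, 1500)))   # 1499
print(len(range(1500)))      # 1500; range(num_iters) matches Octave's loop count
```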