Parameters do not converge in TensorFlow multivariate linear regression

When using TensorFlow for multivariate linear regression, I ran into a problem: the parameters do not converge. The behavior depends on the choice of optimizer. With tf.train.AdamOptimizer, training converges and the loss looks reasonable, but the learned weights and bias do not match the original values; that is the first thing I don't understand. With tf.train.GradientDescentOptimizer, the loss keeps increasing and I can't find the cause. As a beginner I have not been able to work this out, so if anyone understands what is going on, I would appreciate an explanation. The code is short. Here it is:

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

# Generate the input data
X1 = np.matrix(np.random.uniform(-10, 10, 100)).T
X2 = np.matrix(np.linspace(-10, 10, 100)).T
X3 = np.matrix(np.linspace(-10, 10, 100)).T
X_input = np.concatenate((X1, X2, X3), axis=1)
# True weights: 20, -35, 4.3; true bias: 25
Y_input = 20 * X1 - 35 * X2 + 4.3 * X3 + 25 * np.ones((100, 1))

# Trainable parameters: weights W and bias b
W = tf.Variable(tf.random_uniform(shape=[3, 1]))
b = tf.Variable(tf.random_uniform(shape=[1, 1]))

# Placeholders for the inputs and the targets
X = tf.placeholder(dtype=tf.float32, shape=[None, 3])
Y = tf.placeholder(dtype=tf.float32, shape=[None, 1])

# Predicted values
Y_pred = tf.matmul(X, W) + b * np.ones((100, 1))

# Loss: mean squared error over the 100 samples
loss = tf.reduce_sum(tf.square(Y_pred - Y)) / 100

# Adam optimizer with learning rate 0.01
opt = tf.train.AdamOptimizer(0.01).minimize(loss)
# Gradient descent optimizer with learning rate 0.01 (alternative)
# opt = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# Lists used to plot the loss curve
x_axis = []
y_axis = []

with tf.Session() as sess:
    # Initialize the variables
    sess.run(tf.global_variables_initializer())
    print("training,please wait...")
    for i in range(20000):
        sess.run(opt, feed_dict={Y: Y_input, X: X_input})
        x_axis.append(i)
        y_axis.append(sess.run(loss, feed_dict={Y: Y_input, X: X_input}))
    print("finish training!")
    print("W:", sess.run(W), "\nb:", sess.run(b))
    print(sess.run(loss, feed_dict={Y: Y_input, X: X_input}))
    plt.plot(x_axis, y_axis)
    plt.show()
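
One thing that may be worth checking is whether the three input columns are linearly independent. In the code above, X2 and X3 are both np.linspace(-10, 10, 100), so they are the same column, and only the sum of their weights (-35 + 4.3 = -30.7) affects Y_input. The NumPy-only sketch below rebuilds the same data with plain arrays and shows that a different weight vector reproduces the targets. The seed and the name W_other are illustrative assumptions, not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility (assumption)

# Same construction as in the question, using plain arrays instead of np.matrix
X1 = rng.uniform(-10, 10, 100).reshape(-1, 1)
X2 = np.linspace(-10, 10, 100).reshape(-1, 1)
X3 = np.linspace(-10, 10, 100).reshape(-1, 1)
X_input = np.concatenate((X1, X2, X3), axis=1)
Y_input = 20 * X1 - 35 * X2 + 4.3 * X3 + 25

# X2 and X3 are the same column, so the rank is lower than the column count
rank = np.linalg.matrix_rank(X_input)
print("columns:", X_input.shape[1], "rank:", rank)

# Weights whose last two entries sum to -30.7 give the same predictions
W_true = np.array([[20.0], [-35.0], [4.3]])
W_other = np.array([[20.0], [-30.7], [0.0]])  # hypothetical alternative weights
same = np.allclose(X_input @ W_true + 25, X_input @ W_other + 25)
print("same predictions:", same)
```

If the rank is lower than the number of columns, a low loss does not pin down the individual weights, so comparing the learned W entry by entry with 20, -35 and 4.3 may not be meaningful for the second and third entries.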
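
For the gradient descent case, one possible diagnostic is the step-size bound for a quadratic loss: with a fixed learning rate, plain gradient descent can only be stable if the rate is below 2 divided by the largest eigenvalue of the Hessian of the loss. The sketch below computes that bound in NumPy for data built the same way as in the question. The seed, the name A for the inputs with an appended column of ones, and the other variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an arbitrary choice (assumption)

# Rebuild the inputs from the question and append a column of ones for the bias
X1 = rng.uniform(-10, 10, 100)
X2 = np.linspace(-10, 10, 100)
X3 = np.linspace(-10, 10, 100)
A = np.column_stack((X1, X2, X3, np.ones(100)))

# Hessian of loss = sum((A @ theta - Y)**2) / N with respect to theta = [W; b]
N = A.shape[0]
hessian = 2.0 * A.T @ A / N

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
eigenvalues = np.linalg.eigvalsh(hessian)
lam_max = eigenvalues[-1]
step_bound = 2.0 / lam_max  # a fixed step must stay below this to be stable

print("largest Hessian eigenvalue:", lam_max)
print("step-size bound 2 / lam_max:", step_bound)
```

Comparing the printed bound with the 0.01 passed to the optimizer is one way to check whether the step size is a plausible cause. Because the inputs span [-10, 10], the largest eigenvalue is much bigger than it would be for inputs scaled to roughly unit range, so scaling the features is another thing to try.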