What is the difference between Keras and TensorFlow?

Hello, everyone. I have just learned about neural networks, and I want to run a regression experiment to test whether a neural network can learn this special case. It is a simple experiment: the training data is 10,000 randomly generated samples, each with a 10×1 feature vector, and the label is the first element of the feature vector.

from numpy.random import RandomState
rdm=RandomState(1)
data_size=10000
xdim=10
X=rdm.rand(data_size,xdim)
Y = [x1[0] for x1 in X]

I train a single-layer network, and the expected result is Weights = [1, 0, 0, ..., 0] and bias = 0.

I wrote a TensorFlow version and a Keras version. Strangely, Keras gets the correct result, but the TensorFlow training does not converge, and the losses of the two versions are very different.

TensorFlow version:

import tensorflow as tf
x=tf.placeholder(tf.float64,shape=(None,xdim))
y=tf.placeholder(tf.float64,shape=(None))
Weights = tf.Variable(tf.random_normal([xdim, 1],dtype=tf.float64))
biases = tf.Variable(0.1,dtype=tf.float64)
y_predict = tf.matmul(x, Weights) + biases
loss = tf.reduce_mean(tf.square(y_predict - y))
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

batch_size=100
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(10001):
        start = i * batch_size % data_size
        end = min(start + batch_size,data_size)
        sess.run(optimizer,feed_dict={x:X[start:end],y:Y[start:end]})
        if i % 1000 == 0:
            ypred,training_loss= sess.run([y_predict,loss],feed_dict={x:X,y:Y})
            print("Epoch %d: loss=%g"%(i,training_loss))

Output of the TensorFlow version:

Epoch 0: loss=1.0679
Epoch 1000: loss=0.11685
Epoch 2000: loss=0.0842979
Epoch 3000: loss=0.0827121
Epoch 4000: loss=0.0824983
Epoch 5000: loss=0.0824296
Epoch 6000: loss=0.0824021
Epoch 7000: loss=0.0823903
Epoch 8000: loss=0.0823851
Epoch 9000: loss=0.0823826
Epoch 10000: loss=0.0823814

Keras version:

from keras.models import Sequential
from keras.layers import Dense
import numpy as np

model = Sequential()
model.add(Dense(units=1, input_dim=xdim)) 
model.compile(loss="mse", optimizer="sgd")

batch_size=100
for i in range(10001):
    start = i * batch_size % data_size
    end = min(start + batch_size,data_size)
    cost = model.train_on_batch(X[start:end], np.array(Y[start:end]))
    if i % 1000 == 0:
        print("Epoch %d: loss=%g"%(i,cost))

Output of the Keras version:

Epoch 0: loss=0.261707
Epoch 1000: loss=0.00811771
Epoch 2000: loss=0.000325865
Epoch 3000: loss=2.21623e-05
Epoch 4000: loss=4.63907e-06
Epoch 5000: loss=1.66684e-06
Epoch 6000: loss=6.55329e-07
Epoch 7000: loss=2.61024e-07
Epoch 8000: loss=1.04213e-07
Epoch 9000: loss=4.16416e-08
Epoch 10000: loss=1.66369e-08

I think the two pieces of code should be equivalent; the only difference is that I don't know how to set the learning rate in Keras. Why does Keras get the right result but TensorFlow does not? Where did I make a mistake?


There is something wrong with the TF part of the code:

  • The shape of y is not the same as the shape of y_predict: the former is (batch_size,) and the latter is (batch_size, 1), so y_predict - y broadcasts to a (batch_size, batch_size) matrix (see the NumPy sketch after this list).
  • After computing wx + b, you can fix the shape of y_predict with tf.squeeze().
...
y_predict = tf.matmul(x, Weights)+biases
# squeeze y_predict from (batch_size, 1) to (batch_size,)
y_predict = tf.squeeze(y_predict)
loss = tf.reduce_mean(tf.square(y_predict - y))
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
...
  • Addendum 1: The MSE can be computed directly with tf.losses.mean_squared_error() instead of tf.reduce_mean(tf.square(y_predict - y)) (see the TF sketch below).
  • Addendum 2: Keras's sgd optimizer defaults to a learning rate of 0.01, which can be found in the documentation and source code. To keep it consistent with the TF version, you can also set the learning rate explicitly with optimizer=keras.optimizers.SGD(lr=0.01) (see the Keras sketch below).
  • Addendum 3: Your Keras Dense layer's initialization may differ from the TF version's, although both converge to the same result in the end. You can specify it with model.add(Dense(units=1, input_dim=xdim, kernel_initializer='<weights initializer>', bias_initializer='<biases initializer>')), also shown in the Keras sketch below.
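
A minimal NumPy sketch (not from the original answer) of what that shape mismatch does: subtracting a (batch_size,) vector from a (batch_size, 1) column broadcasts to a full (batch_size, batch_size) matrix, so the "MSE" is averaged over all sample pairs. Minimizing that pushes every prediction toward the mean of y, and the loss bottoms out near the variance of the labels, roughly 1/12 ≈ 0.083 for uniform [0, 1] labels, which matches the plateau in the TensorFlow output above.

import numpy as np

y_pred = np.full((4, 1), 0.5)            # column vector, shape (4, 1), like tf.matmul(x, Weights) + biases
y_true = np.array([0.1, 0.4, 0.6, 0.9])  # flat vector, shape (4,), like the y placeholder

diff = y_pred - y_true
print(diff.shape)                                  # (4, 4): broadcasting builds all pairwise differences
print(np.mean(np.square(diff)))                    # the loss the TF code actually minimized
print(np.mean(np.square(y_pred[:, 0] - y_true)))   # the per-sample MSE the question intended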
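
For addendum 1, here is a sketch of the corrected TF 1.x graph with the squeeze applied and the built-in MSE loss; it is only an illustration of the answer's suggestions and reuses xdim from the question's code:

import tensorflow as tf

x = tf.placeholder(tf.float64, shape=(None, xdim))
y = tf.placeholder(tf.float64, shape=(None,))
Weights = tf.Variable(tf.random_normal([xdim, 1], dtype=tf.float64))
biases = tf.Variable(0.1, dtype=tf.float64)
y_predict = tf.squeeze(tf.matmul(x, Weights) + biases)                 # now shape (batch_size,), same as y
loss = tf.losses.mean_squared_error(labels=y, predictions=y_predict)   # built-in MSE
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)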
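
And for addenda 2 and 3, a sketch of the Keras model with the learning rate and the initializers spelled out (old standalone Keras API, as in the question; the particular initializers below are just one way to mirror the TF code, not the answer's exact recommendation):

from keras.models import Sequential
from keras.layers import Dense
from keras import initializers, optimizers

model = Sequential()
model.add(Dense(units=1, input_dim=xdim,
                kernel_initializer=initializers.RandomNormal(stddev=1.0),  # roughly mirrors tf.random_normal
                bias_initializer=initializers.Constant(0.1)))              # mirrors biases = tf.Variable(0.1)
model.compile(loss="mse", optimizer=optimizers.SGD(lr=0.01))               # explicit learning rate, same as the TF code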