I want to write a neural network that learns the x^2 distribution without a predefined model. Precisely, it is given some points in [-1,1] together with their squares for training, and then it has to reproduce and predict similar data, e.g. on [-10,10]. I had this working, more or less, without using a dataset. But then I tried to modify the program to use a tf.data.Dataset, in order to learn how to work with it. Now the program runs, but the output is worse than before: it is mostly a constant 0.
The previous version behaved like x^2 on [-1,1] with a linear extension outside that interval, which was better. The blue line (the network's output), which used to follow the parabola, is now flat; the goal is for it to coincide with the red one (y = x^2).
Here is my code (comments translated from Polish):
# square2.py - second attempt at training a network with TensorFlow
# goal: teach the network to recognize the x**2 distribution
# analysis of the script from:
# https://stackoverflow.com/questions/43140591/neural-network-to-predict-nth-square
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from tensorflow.python.framework.ops import reset_default_graph
# definition of the training data
# x_train = (np.random.rand(10**3)*4-2).reshape(-1,1)
# y_train = x_train**2
square2_dane = np.load("square2_dane.npz")
x_train = square2_dane['x_tren'].reshape(-1,1)
y_train = square2_dane['y_tren'].reshape(-1,1)
# to do: optimize the splitting of the data
# x_train = square2_dane['x_tren'].reshape(-1,1)
# ds_x = tf.data.Dataset.from_tensor_slices(x_train)
# batch_x = ds_x.batch(rozm_paczki)
# iterator = ds_x.make_one_shot_iterator()
# network hyperparameters
wymiary = [50,50,50,1]
epoki = 500
rozm_paczki = 200
reset_default_graph()
X = tf.placeholder(tf.float32, shape=[None,1])
Y = tf.placeholder(tf.float32, shape=[None,1])
weights = []
biases = []
n_inputs = 1
# initialize the variables
for i, n_outputs in enumerate(wymiary):
    with tf.variable_scope("layer_{}".format(i)):
        w = tf.get_variable(name="W", shape=[n_inputs, n_outputs],
                            initializer=tf.random_normal_initializer(mean=0.0, stddev=0.02, seed=42))
        b = tf.get_variable(name="b", shape=[n_outputs], initializer=tf.zeros_initializer)
        weights.append(w)
        biases.append(b)
        n_inputs = n_outputs

def forward_pass(X, weights, biases):
    h = X
    for i in range(len(weights)):
        h = tf.add(tf.matmul(h, weights[i]), biases[i])
        h = tf.nn.relu(h)
    return h
output_layer = forward_pass(X,weights,biases)
f_strat = tf.reduce_mean(tf.squared_difference(output_layer,Y),1)
f_strat = tf.reduce_sum(f_strat)
# alternative loss function
#f_strat2 = tf.reduce_sum(tf.abs(Y-y_train)/y_train)
optimizer = tf.train.AdamOptimizer(learning_rate=0.003).minimize(f_strat)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # training
    dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    dataset = dataset.batch(rozm_paczki)
    dataset = dataset.repeat(epoki)
    iterator = dataset.make_one_shot_iterator()
    ds_x, ds_y = iterator.get_next()
    sess.run(optimizer, {X: sess.run(ds_x), Y: sess.run(ds_y)})
    saver = tf.train.Saver()
    save = saver.save(sess, "./model.ckpt")
    print("Model saved as: %s" % save)
    # run the network on test data
    x_test = np.linspace(-1, 1, 600)
    network_outputs = sess.run(output_layer, feed_dict={X: x_test.reshape(-1, 1)})
    plt.plot(x_test, x_test**2, color='r', label='y=x^2')
    plt.plot(x_test, network_outputs, color='b', label='NN')
    plt.legend(loc='right')
    plt.show()
I think the problem lies in how the training data is fed in, i.e. sess.run(optimizer, {X: sess.run(ds_x), Y: sess.run(ds_y)}), or in the definition of ds_x and ds_y. This is my first program of this kind. For comparison, here is the training loop from the previous version (used in place of the Session block above):
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # training
    for i in range(epoki):
        idx = np.arange(len(x_train))
        np.random.shuffle(idx)
        for j in range(len(x_train)//rozm_paczki):
            cur_idx = idx[rozm_paczki*j:rozm_paczki*(j+1)]
            sess.run(optimizer, feed_dict={X: x_train[cur_idx], Y: y_train[cur_idx]})
    saver = tf.train.Saver()
    save = saver.save(sess, "./model.ckpt")
    print("Model saved as: %s" % save)
Thanks!
P.S. I was highly inspired by Neural network to predict nth square (linked above).
There are two problems that conspire to give your model poor accuracy, and both involve this line:

sess.run(optimizer, {X: sess.run(ds_x), Y: sess.run(ds_y)})

1. Only one training step executes, because this code is not in a loop. Your original code ran len(x_train)//rozm_paczki steps per epoch, which should make much more progress.

2. The two calls sess.run(ds_x) and sess.run(ds_y) run as separate steps, which means they will contain values from different, unrelated batches. Every call to sess.run(ds_x) or sess.run(ds_y) advances the iterator to the next batch and discards any part of the input element that you did not explicitly request in that sess.run() call. Essentially, you get X from batch i and Y from batch i+1 (or vice versa), and the model trains on invalid data. If you want values from the same batch, you need to fetch them in a single sess.run([ds_x, ds_y]) call, as sketched below.
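For illustration, here is a minimal sketch of such a loop (it belongs inside the Session block and reuses the names from your script; it still round-trips the values through feed_dict, which point 4 below improves on):

try:
    while True:
        # Fetch both tensors in ONE sess.run() call, so they come from the
        # same batch and the iterator advances exactly once per step.
        batch_x, batch_y = sess.run([ds_x, ds_y])
        sess.run(optimizer, feed_dict={X: batch_x, Y: batch_y})
except tf.errors.OutOfRangeError:
    pass  # raised once the batched, repeated dataset is exhausted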
There are two further issues that may hurt efficiency (see the sketch after this list):

3. The dataset is not shuffled. Your original code called np.random.shuffle() at the beginning of each epoch. You should add a dataset = dataset.shuffle(len(x_train)) before dataset = dataset.repeat().

4. It is inefficient to fetch the values from the iterator back to Python (e.g. with sess.run(ds_x)) only to feed them back into the training step. It is more efficient to pass the tensors returned by the iterator.get_next() operation directly into the network as its input.
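For instance, a sketch of the rearranged input pipeline, using the names from the original script:

dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
dataset = dataset.shuffle(len(x_train))  # point 3: shuffle, before repeat()
dataset = dataset.repeat(epoki)
dataset = dataset.batch(rozm_paczki)
# point 4: use the iterator's output tensors directly as the network input,
# instead of fetching them to Python and feeding them back in.
X, Y = dataset.make_one_shot_iterator().get_next()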
Putting all of this together, here is a rewritten version of your program that addresses these four points and obtains correct results. (Unfortunately, my Polish isn't good enough to preserve the comments, so I have translated them into English.)
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
# Generate training data.
x_train = np.random.rand(10**3, 1).astype(np.float32) * 4 - 2
y_train = x_train ** 2
# Define hyperparameters.
DIMENSIONS = [50,50,50,1]
NUM_EPOCHS = 500
BATCH_SIZE = 200
dataset = tf.data.Dataset.from_tensor_slices((x_train,y_train))
dataset = dataset.shuffle(len(x_train)) # (Point 3.) Shuffle each epoch.
dataset = dataset.repeat(NUM_EPOCHS)
dataset = dataset.batch(BATCH_SIZE)
iterator = dataset.make_one_shot_iterator()
# (Point 2.) Ensure that `X` and `Y` correspond to the same batch of data.
# (Point 4.) Pass the tensors returned from `iterator.get_next()`
# directly as the input of the network.
X, Y = iterator.get_next()
# Initialize variables.
weights = []
biases = []
n_inputs = 1
for i, n_outputs in enumerate(DIMENSIONS):
    with tf.variable_scope("layer_{}".format(i)):
        w = tf.get_variable(name="W", shape=[n_inputs, n_outputs],
                            initializer=tf.random_normal_initializer(
                                mean=0.0, stddev=0.02, seed=42))
        b = tf.get_variable(name="b", shape=[n_outputs],
                            initializer=tf.zeros_initializer)
        weights.append(w)
        biases.append(b)
        n_inputs = n_outputs

def forward_pass(X, weights, biases):
    h = X
    for i in range(len(weights)):
        h = tf.add(tf.matmul(h, weights[i]), biases[i])
        h = tf.nn.relu(h)
    return h

output_layer = forward_pass(X, weights, biases)
loss = tf.reduce_sum(tf.reduce_mean(
    tf.squared_difference(output_layer, Y), 1))
optimizer = tf.train.AdamOptimizer(learning_rate=0.003).minimize(loss)
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # (Point 1.) Run the `optimizer` in a loop. Use try-while-except to iterate
    # until all elements in `dataset` have been consumed.
    try:
        while True:
            sess.run(optimizer)
    except tf.errors.OutOfRangeError:
        pass
    save = saver.save(sess, "./model.ckpt")
    print("Model saved to path: %s" % save)
    # Evaluate network.
    x_test = np.linspace(-1, 1, 600)
    network_outputs = sess.run(output_layer, feed_dict={X: x_test.reshape(-1, 1)})
    plt.plot(x_test, x_test**2, color='r', label='y=x^2')
    plt.plot(x_test, network_outputs, color='b', label='NN prediction')
    plt.legend(loc='right')
    plt.show()
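One detail worth noting: the final sess.run(output_layer, feed_dict={X: ...}) feeds a value for X even though X is produced by iterator.get_next(). This works because a TensorFlow 1.x feed_dict can substitute a value for almost any tensor in the graph, temporarily overriding the iterator for that one sess.run() call.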