Hot questions for Using Neural networks in approximation
I'm new to TensorFlow and data science. I made a simple model that should figure out the relationship between input and output numbers, in this case x and x squared. The code in Python:
```python
import numpy as np
import tensorflow as tf

# TensorFlow: only log error messages (tf.logging no longer exists in TF 2.x).
tf.get_logger().setLevel('ERROR')

features = np.array([-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
labels = np.array([100, 81, 64, 49, 36, 25, 16, 9, 4, 1, 0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100], dtype=float)

# A single Dense unit taking one input feature
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1, input_shape=[1])
])
model.compile(loss="mean_squared_error", optimizer=tf.keras.optimizers.Adam(0.0001))
model.fit(features, labels, epochs=50000, verbose=False)
print(model.predict(np.array([4, 11, 20], dtype=float)))
```
I tried different numbers of units, adding more layers, and even using the relu activation function, but the results were always wrong.
It works with other relationships like x and 2x. What is the problem here?
You are making two very basic mistakes:
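- Your ultra-simple model (a single-layer network with a single unit and no activation) hardly qualifies as a neural network at all, let alone a "deep learning" one (as your question is tagged). It computes just w*x + b, a straight line, which is why it can learn x -> 2*x but never x*x.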
- Similarly, your dataset (just 21 samples) is also ultra-small.
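To make the first point concrete: whatever the optimizer does, a single linear unit can at best reach the least-squares line through the data. The sketch below uses plain NumPy (not Keras) to compute that optimum for the question's 21 points. Because the inputs are symmetric around 0, the best slope is 0 and the best intercept is the mean of the labels, 770/21 ≈ 36.67, so the "best" linear model predicts the same constant for every input. This is the idealized optimum; what Adam actually reaches with a learning rate of 0.0001 may not have fully converged to it.

```python
import numpy as np

# Same 21 training points as in the question
x = np.arange(-10, 11, dtype=float)
y = x ** 2

# Dense(units=1) without activation computes y_hat = w * x + b,
# so the best it can ever do is the ordinary least-squares line.
A = np.column_stack([x, np.ones_like(x)])
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"w = {w:.4f}, b = {b:.4f}")
print("best possible predictions for 4, 11, 20:", w * np.array([4.0, 11.0, 20.0]) + b)
```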
It is well understood that neural networks need some complexity if they are to solve problems even as "simple" as x*x, and that they really shine when fed with large training datasets.
The methodology for such function approximations is not to just list the (few possible) inputs and feed them to the model along with the desired outputs; remember, NNs learn through examples, not through symbolic reasoning, and the more examples the better. What we usually do in similar cases is generate a large number of examples, which we subsequently feed to the model for training.
Having said that, here is a rather simple demonstration of a 3-layer neural network in Keras for approximating the function x*x, using as input 10,000 random numbers generated in [-50, 50]:
```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras import regularizers
import matplotlib.pyplot as plt

# Two hidden ReLU layers of 8 units each, with L2 weight regularization
model = Sequential()
model.add(Dense(8, activation='relu', kernel_regularizer=regularizers.l2(0.001), input_shape=(1,)))
model.add(Dense(8, activation='relu', kernel_regularizer=regularizers.l2(0.001)))
model.add(Dense(1))
model.compile(optimizer=Adam(), loss='mse')

# generate 10,000 random numbers in [-50, 50], along with their squares
x = np.random.random((10000, 1)) * 100 - 50
y = x ** 2

# fit the model, keeping 2,000 samples as validation set
hist = model.fit(x, y, validation_split=0.2, epochs=15000, batch_size=256)

# check some predictions (inputs shaped as (n_samples, 1)):
print(model.predict(np.array([[4], [-4], [11], [20], [8], [-5]], dtype=float)))
# result:
# [[ 16.633354]
#  [ 15.031291]
#  [121.26833 ]
#  [397.78638 ]
#  [ 65.70035 ]
#  [ 27.040245]]
```
Well, not that bad! Remember that NNs are function approximators: we should expect them neither to reproduce the functional relationship exactly nor to "know" that the results for 4 and -4 should be identical.
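To put "not that bad" in numbers, here is a small sketch that computes the errors of the six predictions reported in the output above. The relative errors range from about 0.2% (x = 11) to about 8% (x = -5), and the predictions for 4 and -4 differ by about 1.6. These figures come from that single training run; since the weights and data are random, another run will give different numbers.

```python
import numpy as np

# Inputs and the predictions reported in the output comment above
x = np.array([4, -4, 11, 20, 8, -5], dtype=float)
pred = np.array([16.633354, 15.031291, 121.26833, 397.78638, 65.70035, 27.040245])
target = x ** 2

rel_err = np.abs(pred - target) / target

for xi, p, t, r in zip(x, pred, target, rel_err):
    print(f"x={xi:5.0f}  pred={p:10.4f}  true={t:6.0f}  rel.err={r:6.2%}")

# x*x is symmetric, but nothing forces the network's outputs to be
print("f(4) - f(-4) =", pred[0] - pred[1])
```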
Let's generate some new random data in [-50, 50] (remember, for all practical purposes, these are unseen data for the model) and plot them, along with the original ones, to get a more general picture:
```python
plt.figure(figsize=(14, 5))

# Left panel: predictions on fresh, unseen inputs
plt.subplot(1, 2, 1)
p = np.random.random((1000, 1)) * 100 - 50  # new random data in [-50, 50]
plt.plot(p, model.predict(p), '.')
plt.xlabel('x')
plt.ylabel('prediction')
plt.title('Predictions on NEW data in [-50,50]')

# Right panel: the original training data
plt.subplot(1, 2, 2)
plt.xlabel('x')
plt.ylabel('y')
plt.plot(x, y, '.')
plt.title('Original data')

plt.show()
```
Well, it arguably does look like a good approximation indeed...
You could also take a look at this thread for a sine approximation.
The last thing to keep in mind is that, although we did get a decent approximation even with our relatively simple model, we should not expect extrapolation, i.e. good performance outside [-50, 50]; for details, see my answer in "Is deep learning bad at fitting simple non linear functions outside training scope?"
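Part of the reason can be seen from the architecture itself: a stack of Dense layers with ReLU activations computes a piecewise-linear function of its input, so beyond its last "kink" it continues as a straight line, while x*x keeps curving. The NumPy sketch below illustrates this under a stated assumption: it uses random, untrained weights (hypothetical values) for a network with the same 1-8-8-1 layer sizes as above, since training moves the kinks but does not change the piecewise-linear shape. The second finite difference f(x+2) - 2f(x+1) + f(x) is 0 on any linear piece, but equals 2 everywhere for x*x.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical random weights for a 1-8-8-1 ReLU network
# (same layer sizes as the Keras model above, but untrained)
W1, b1 = rng.normal(size=(1, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 8)), rng.normal(size=8)
W3, b3 = rng.normal(size=(8, 1)), rng.normal(size=1)

def relu(z):
    return np.maximum(z, 0.0)

def mlp(x):
    """Forward pass of Dense(8, relu) -> Dense(8, relu) -> Dense(1)."""
    h = relu(x.reshape(-1, 1) @ W1 + b1)
    h = relu(h @ W2 + b2)
    return (h @ W3 + b3).ravel()

def second_difference(f, x0):
    """f(x0+2) - 2*f(x0+1) + f(x0): 0 on a linear piece, 2 for x**2."""
    x = np.array([x0, x0 + 1.0, x0 + 2.0])
    y = f(x)
    return y[2] - 2.0 * y[1] + y[0]

for x0 in [1e3, 1e6]:
    print(f"x0={x0:g}  network: {second_difference(mlp, x0):.3g}  "
          f"x**2: {second_difference(np.square, x0):.3g}")
```

Far from the training range, the network's curvature vanishes, so its predictions grow at most linearly while the true function grows quadratically.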
After spending days failing to use a neural network for Q-learning, I decided to go back to basics and do a simple function approximation, to check whether everything was working correctly and to see how some parameters affected the learning process. Here is the code that I came up with:
```python
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt
import numpy
from sklearn.preprocessing import MinMaxScaler
# from sklearn.ensemble import ExtraTreesRegressor

regressor = Sequential()
regressor.add(Dense(units=20, activation='sigmoid', kernel_initializer='uniform', input_dim=1))
regressor.add(Dense(units=20, activation='sigmoid', kernel_initializer='uniform'))
regressor.add(Dense(units=20, activation='sigmoid', kernel_initializer='uniform'))
regressor.add(Dense(units=1))
regressor.compile(loss='mean_squared_error', optimizer='sgd')
#regressor = ExtraTreesRegressor()

# N sorted random points in [-10, 10] and their sines
N = 5000
X = numpy.sort(numpy.random.uniform(-10, 10, N)).reshape(-1, 1)
Y = numpy.sin(X)

# Scale inputs and targets to [0, 1]
X_scaler = MinMaxScaler()
Y_scaler = MinMaxScaler()
X = X_scaler.fit_transform(X)
Y = Y_scaler.fit_transform(Y)

regressor.fit(X, Y, epochs=2, verbose=1, batch_size=32)
#regressor.fit(X, Y.reshape(5000,))

# Predict on a regular grid over the same range,
# reusing the scaling fitted on the training data (transform, not fit_transform)
x = numpy.linspace(-10, 10, 100).reshape(-1, 1)
x = X_scaler.transform(x)
y = regressor.predict(x).reshape(-1, 1)

plt.figure()
plt.plot(X_scaler.inverse_transform(x), Y_scaler.inverse_transform(y))
plt.plot(X_scaler.inverse_transform(X), Y_scaler.inverse_transform(Y))
plt.show()
```
The problem is that all my predictions are around 0. As you can see, I used an ExtraTreesRegressor from sklearn (commented lines) to check that the protocol is actually correct. So what is wrong with my neural network? Why is it not working?
(The actual problem that I'm trying to solve is to compute the Q function for the mountain car problem using a neural network. How is it different from this function approximator?)
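Checking the pipeline with a non-neural regressor, as done here with ExtraTreesRegressor, is a sound idea: if a simple model fits well through the same scaling steps, the data and scaling are probably fine and the problem lies in the network or its training. Below is a NumPy-only sketch of the same kind of check, with two substitutions stated plainly: the min-max scaling is written out by hand (the formula MinMaxScaler applies with its default [0, 1] range) instead of calling sklearn, and linear interpolation via np.interp stands in for the ExtraTreesRegressor. The grid is scaled with the parameters fitted on the training data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data as in the question: sorted uniform samples of sin on [-10, 10]
X = np.sort(rng.uniform(-10, 10, 5000))
Y = np.sin(X)

# Hand-rolled min-max scaling to [0, 1], fitted on the TRAINING data only
x_min, x_max = X.min(), X.max()
y_min, y_max = Y.min(), Y.max()
Xs = (X - x_min) / (x_max - x_min)
Ys = (Y - y_min) / (y_max - y_min)

def reference_predict(xs):
    """Stand-in regressor (not ExtraTrees): interpolate between training points."""
    return np.interp(xs, Xs, Ys)

# Scale a grid with the training parameters, predict, then map back
grid = np.linspace(-10, 10, 100)
grid_s = (grid - x_min) / (x_max - x_min)
pred = reference_predict(grid_s) * (y_max - y_min) + y_min

max_err = np.abs(pred - np.sin(grid)).max()
print("max abs error of the reference regressor:", max_err)
```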
With these changes:
- Activations to relu
- No kernel_initializer (i.e. leave the default, glorot_uniform)
- Adam optimizer
- 100 epochs
```python
regressor = Sequential()
regressor.add(Dense(units=20, activation='relu', input_dim=1))
regressor.add(Dense(units=20, activation='relu'))
regressor.add(Dense(units=20, activation='relu'))
regressor.add(Dense(units=1))
regressor.compile(loss='mean_squared_error', optimizer='adam')
regressor.fit(X, Y, epochs=100, verbose=1, batch_size=32)
```
and the rest of your code unchanged, here is the result:
Tinker, again and again...
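The changes above are given without a detailed explanation. One plausible contributing factor (an interpretation, not something stated in the answer), alongside training for only 2 epochs with plain SGD, is that the sigmoid's derivative s(z)(1 - s(z)) never exceeds 0.25. Ignoring the weights, the product of the activation derivatives through three sigmoid hidden layers is therefore at most 0.25^3 ≈ 0.016, whereas ReLU has derivative 1 wherever it is active. A quick numerical check of those bounds:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # derivative of the logistic sigmoid: s * (1 - s)
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.linspace(-10, 10, 20001)
max_grad = sigmoid_grad(z).max()  # attained at z = 0

n_hidden = 3  # sigmoid hidden layers in the question's network
sigmoid_bound = max_grad ** n_hidden  # upper bound from the activations alone
relu_active_grad = 1.0 ** n_hidden    # ReLU's derivative is 1 wherever z > 0

print(f"max sigmoid'(z) = {max_grad:.4f}")
print(f"bound through {n_hidden} sigmoid layers = {sigmoid_bound:.6f}")
print(f"through {n_hidden} active ReLU layers = {relu_active_grad:.1f}")
```

The weights also enter the backpropagated gradient, so this covers only the activation part of the story.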