Hot questions for using neural networks in Gekko

Question:

I am learning to use Gekko's brain module for deep learning applications.

I have been setting up a neural network to learn the numpy.cos() function and then produce similar results.

I get a good fit when the bounds on my training are:

x = np.linspace(0,2*np.pi,100)

But the model falls apart when I try to extend the bounds to:

x = np.linspace(0,3*np.pi,100)

What do I need to change in my neural network to increase the flexibility of my model so that it works for other bounds?

This is my code:

from gekko import brain
import numpy as np
import matplotlib.pyplot as plt

#Set up neural network 
b = brain.Brain()
b.input_layer(1)
b.layer(linear=2)
b.layer(tanh=2)
b.layer(linear=2)
b.output_layer(1)

#Train neural network
x = np.linspace(0,2*np.pi,100)
y = np.cos(x)
b.learn(x,y)

#Calculate using trained neural network
xp = np.linspace(-2*np.pi,4*np.pi,100)
yp = b.think(xp)

#Plot results
plt.figure()
plt.plot(x,y,'bo')
plt.plot(xp,yp[0],'r-')
plt.show()

These are the results with training bounds to 2π (plot omitted; the fit is good):

These are the results with training bounds to 3π (plot omitted; the fit falls apart):


Answer:

I get a good fit over the wider range if I increase the number of nodes in the tanh layer to 5 with b.layer(tanh=5).

There are probably multiple answers to this question, though. Increasing the number of layers or changing the activation function may also help, and you can always try different solvers. Finding the best network architecture is an optimization problem of its own; some people have tried to solve it with genetic algorithms, for example:

https://arxiv.org/pdf/1808.03818.pdf
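
For reference, here is the script from the question with the answer's single change applied (b.layer(tanh=5)) and the training range extended to 3π as in the question; everything else is kept as in the original.

from gekko import brain
import numpy as np
import matplotlib.pyplot as plt

#Set up neural network with 5 tanh nodes instead of 2
b = brain.Brain()
b.input_layer(1)
b.layer(linear=2)
b.layer(tanh=5)   # increased from 2 to 5 nodes
b.layer(linear=2)
b.output_layer(1)

#Train neural network on the wider range
x = np.linspace(0,3*np.pi,100)
y = np.cos(x)
b.learn(x,y)

#Calculate using trained neural network
xp = np.linspace(-2*np.pi,4*np.pi,100)
yp = b.think(xp)

#Plot results
plt.figure()
plt.plot(x,y,'bo')
plt.plot(xp,yp[0],'r-')
plt.show()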

Question:

In the ANN example for TCLab B on the Dynamic Optimization course webpage (https://apmonitor.com/do/index.php/Main/TCLabB), is a bias node specified for every layer in the script? Please let me know which lines represent the bias nodes. If they are not necessary, please explain why. Thank you.

from gekko import GEKKO

# -------------------------------------
# build neural network
# -------------------------------------

nin = 2  # inputs
n1 = 2   # hidden layer 1 (linear)
n2 = 2   # hidden layer 2 (nonlinear)
n3 = 2   # hidden layer 3 (linear)
nout = 2 # outputs

# Initialize gekko models
train = GEKKO() 
dyn   = GEKKO()
model = [train,dyn]

for m in model:
    # use APOPT solver
    m.options.SOLVER = 1

    # input(s)
    m.inpt = [m.Param() for i in range(nin)]

    # layer 1 (linear)
    m.w1 = m.Array(m.FV, (nout,nin,n1))
    m.l1 = [[m.Intermediate(sum([m.w1[k,j,i]*m.inpt[j] \
            for j in range(nin)])) for i in range(n1)] \
            for k in range(nout)]

    # layer 2 (tanh)
    m.w2 = m.Array(m.FV, (nout,n1,n2))
    m.l2 = [[m.Intermediate(sum([m.tanh(m.w2[k,j,i]*m.l1[k][j]) \
            for j in range(n1)])) for i in range(n2)] \
            for k in range(nout)]

    # layer 3 (linear)
    m.w3 = m.Array(m.FV, (nout,n2,n3))
    m.l3 = [[m.Intermediate(sum([m.w3[k,j,i]*m.l2[k][j] \
            for j in range(n2)])) for i in range(n3)] \
            for k in range(nout)]

    # outputs
    m.outpt = [m.CV() for i in range(nout)]
    m.Equations([m.outpt[k]==sum([m.l3[k][i] for i in range(n3)]) \
                 for k in range(nout)])

    # flatten matrices
    m.w1 = m.w1.flatten()
    m.w2 = m.w2.flatten()
    m.w3 = m.w3.flatten()

Answer:

Here are some reasons why you may consider adding bias nodes:

  • A bias is like an intercept term in linear regression and is useful for adjusting the inputs or internal nodes to achieve a better fit.
  • Bias terms are extra parameters that the solver can use to minimize the loss function (objective function).

Some of the reasons that you may not want to add bias nodes:

  • They add parameters that can cause extrapolation problems due to over-parameterization and over-fitting.
  • A bias can shift the inputs or internal nodes up or down to the point that there are vanishing gradients as the solver iterates. This leads to parts of the model that may no longer contribute to differentiating the predictions.
  • Deep learning networks may be able to compensate for the lack of bias terms by adjusting the average output.

It can also help to scale the inputs and outputs to between 0 and 1, especially if zero for the input would then equate to zero for the output. With this transformation, you've scaled the variables so that the bias term is zero and you are trying to model the change from zero with activation functions. This method is used in dynamic modeling where you transform equations into "deviation variable" form where the nominal or steady state values are set to zero. The equations track a deviation from that nominal zero starting point.
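
As a minimal sketch of that scaling idea (the data and variable names here are illustrative, not from the original script):

import numpy as np

# illustrative raw data; replace with your own measurements
x_raw = np.linspace(0.0, 10.0, 50)
y_raw = 3.0*x_raw + 5.0

# min-max scale inputs and outputs to [0,1]
x = (x_raw - x_raw.min())/(x_raw.max() - x_raw.min())
y = (y_raw - y_raw.min())/(y_raw.max() - y_raw.min())

# ... train the network on the scaled (x, y) ...

# un-scale a scaled prediction yp back to the original units:
# yp_raw = yp*(y_raw.max() - y_raw.min()) + y_raw.min()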

Here are a few additional suggestions on this topic with discussion 1 and discussion 2.

You can add bias terms to your Gekko model, as shown in example 7 of the 18 Gekko tutorials; the bias term there is w2b. You could similarly add bias terms for the problem you mentioned, although you may want to try them only for the first (input) layer, as w1a and w1b.

for m in model:
    # use APOPT solver
    m.options.SOLVER = 1

    # input(s)
    m.inpt = [m.Param() for i in range(nin)]

    # layer 1 (linear)
    m.w1a = m.Array(m.FV, (nout,nin,n1))
    m.w1b = m.Array(m.FV, (nout,nin,n1))
    m.l1 = [[m.Intermediate(sum([m.w1a[k,j,i]*m.inpt[j] + m.w1b[k,j,i] \
            for j in range(nin)])) for i in range(n1)] \
            for k in range(nout)]

    # layer 2 (tanh)
    m.w2a = m.Array(m.FV, (nout,n1,n2))
    m.w2b = m.Array(m.FV, (nout,n1,n2))
    m.l2 = [[m.Intermediate(sum([m.tanh(m.w2a[k,j,i]*m.l1[k][j]) + m.w2b[k,j,i] \
            for j in range(n1)])) for i in range(n2)] \
            for k in range(nout)]

    # layer 3 (linear)
    m.w3a = m.Array(m.FV, (nout,n2,n3))
    m.w3b = m.Array(m.FV, (nout,n2,n3))
    m.l3 = [[m.Intermediate(sum([m.w3a[k,j,i]*m.l2[k][j] + m.w3b[k,j,i] \
            for j in range(n2)])) for i in range(n3)] \
            for k in range(nout)]

    # outputs
    m.outpt = [m.CV() for i in range(nout)]
    m.Equations([m.outpt[k]==sum([m.l3[k][i] for i in range(n3)]) \
                 for k in range(nout)])

    # flatten matrices
    m.w1a = m.w1a.flatten()
    m.w2a = m.w2a.flatten()
    m.w3a = m.w3a.flatten()
    m.w1b = m.w1b.flatten()
    m.w2b = m.w2b.flatten()
    m.w3b = m.w3b.flatten()

Question:

I am trying to pass the activation function argument to b.layer() from a list of strings.

I have tried eval('b.layer(parameters[1] = 3)')

#parameters = [layers, index_activation_function, nodes]
parameters = [2,2,2]
layers = parameters[0]
nodes = parameters[2]

#Activation Functions
a_functions = ['softmax','relu','tanh','sigmoid','linear']
function = a_functions[parameters[1]]

#NN
b = brain.Brain(1)
b.input_layer(1)
b.layer(linear = 2)
for i in range(layers):
    eval('b.layer(function=nodes)')
b.layer(linear = 2)


Answer:

Below is a complete example that shows how to construct a string for the eval() function with Gekko. This isn't unique to Gekko; the same approach works for any string that you want to evaluate as an expression.

from gekko import brain
import numpy as np
import matplotlib.pyplot as plt  

# generate training data
x = np.linspace(0.0,2*np.pi)
y = np.sin(x)

parameters = [2,2,2]
a_functions = ['softmax','relu','tanh','sigmoid','linear'] 
function = a_functions[parameters[1]]
s = 'b.layer('+function+'=2)'

b = brain.Brain()
b.input_layer(1)
b.layer(linear=2)
eval(s)
b.layer(linear=2)
b.output_layer(1)
# train
b.learn(x,y)      

# validate
xp = np.linspace(-2*np.pi,4*np.pi,100)
yp = b.think(xp)  

plt.figure()
plt.plot(x,y,'bo')
plt.plot(xp,yp[0],'r-')
plt.show()

This evaluates the string b.layer(tanh=2), selecting the tanh activation function from your list of options; running the script shows the training data and the network's prediction.
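
As a side note, the same thing can be done without eval() by using Python keyword-argument unpacking, which avoids evaluating arbitrary strings:

# equivalent to b.layer(tanh=2): build the keyword argument from the string
b.layer(**{function: 2})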