model.get_weights() returning array of NaNs after training due to NaN masking

I'm trying to train an LSTM to classify sequences of various lengths. I want to get the weights of this model so I can use them in a stateful version of the model. Before training, the weights are normal. The training also seems to run successfully, with a gradually decreasing error. However, when I change the mask value from -10 to np.nan, mod.get_weights() starts returning arrays of NaNs and the validation error suddenly drops to a value close to zero. Why is this occurring?

from keras import models
from keras.layers import Dense, Masking, LSTM
from keras.optimizers import RMSprop
from keras.losses import categorical_crossentropy
from keras.preprocessing.sequence import pad_sequences

import numpy as np
import matplotlib.pyplot as plt


def gen_noise(noise_len, mag):
    return np.random.uniform(size=noise_len) * mag


def gen_sin(t_val, freq):
    return 2 * np.sin(2 * np.pi * t_val * freq)


def train_rnn(x_train, y_train, max_len, mask, number_of_categories):
    epochs = 3
    batch_size = 100

    # three hidden layers of 256 each
    vec_dims = 1
    hidden_units = 256
    in_shape = (max_len, vec_dims)

    model = models.Sequential()

    model.add(Masking(mask_value=mask, name="in_layer", input_shape=in_shape))
    model.add(LSTM(hidden_units, return_sequences=False))
    model.add(Dense(number_of_categories, input_shape=(number_of_categories,),
              activation='softmax', name='output'))

    model.compile(loss=categorical_crossentropy, optimizer=RMSprop())

    model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
              validation_split=0.05)

    return model


def gen_sig_cls_pair(freqs, t_stops, num_examples, noise_magnitude, mask, dt=0.01):
    x = []
    y = []

    num_cat = len(freqs)

    max_t = int(np.max(t_stops) / dt)

    for f_i, f in enumerate(freqs):
        for t_stop in t_stops:
            t_range = np.arange(0, t_stop, dt)
            t_len = t_range.size

            for _ in range(num_examples):
                sig = gen_sin(t_range, f) + gen_noise(t_len, noise_magnitude)
                x.append(sig)

                one_hot = np.zeros(num_cat, dtype=bool)
                one_hot[f_i] = 1
                y.append(one_hot)

    pad_kwargs = dict(padding='post', maxlen=max_t, value=mask, dtype=np.float32)
    return pad_sequences(x, **pad_kwargs), np.array(y)


if __name__ == '__main__':
    noise_mag = 0.01
    mask_val = -10
    frequencies = (5, 7, 10)
    signal_lengths = (0.8, 0.9, 1)
    dt_val = 0.01

    x_in, y_in = gen_sig_cls_pair(frequencies, signal_lengths, 50, noise_mag, mask_val)
    mod = train_rnn(x_in[:, :, None], y_in, int(np.max(signal_lengths) / dt_val), mask_val, len(frequencies))

This persists even if I change the network architecture to use return_sequences=True and wrap the Dense layer in TimeDistributed; removing the LSTM layer does not help either.

I had the same problem. In your case it was probably something different, but someone might hit the same issue and land here from Google. In my case I was passing the sample_weight parameter to fit(), and when the sample weights contained some zeros, get_weights() returned an array with NaNs. When I omitted the samples where sample_weight == 0 (they were useless anyway), it started to work.
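A minimal sketch of that workaround, assuming x_train, y_train and sample_weight are NumPy arrays (these names are illustrative, not from the original post):

import numpy as np

# Drop the samples whose weight is zero; per the answer above they were
# useless anyway and caused get_weights() to return NaNs.
keep = sample_weight != 0
model.fit(x_train[keep], y_train[keep],
          sample_weight=sample_weight[keep],
          batch_size=100, epochs=3)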

The weights are indeed changing. The unchanging weights come from the edge of the image, and they may not have changed because the edge isn't helpful for classifying digits. To check, select a specific layer and inspect the result:

print(model.layers[70].get_weights()[1])

70 is the index of the last layer in my case.
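To scan every layer for NaNs rather than inspecting a single one, here is a hedged sketch, assuming model is an already-trained Keras model:

import numpy as np

# Report each layer that contains NaN entries in any of its weight arrays.
for i, layer in enumerate(model.layers):
    for w in layer.get_weights():
        if np.isnan(w).any():
            print("layer %d (%s) has NaN weights" % (i, layer.name))
            break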

I was running into my loss function suddenly returning a NaN after training had got some way along. I checked the ReLUs, the optimizer, the loss function, my dropout in combination with the ReLUs, and the size and shape of my network. I was still getting a loss that eventually turned into a NaN, and I was getting quite frustrated.
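One practical way to catch this early is Keras's built-in TerminateOnNaN callback, which stops training as soon as the loss becomes NaN. A minimal sketch, reusing the names from the question's train_rnn function:

from keras.callbacks import TerminateOnNaN

# Stop training immediately if the loss ever becomes NaN, so the weights
# are not silently overwritten with NaNs during the remaining batches.
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
          validation_split=0.05, callbacks=[TerminateOnNaN()])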

The get_weights() method of a keras.engine.training.Model instance retrieves the weights of the model.

It returns a flat list of NumPy arrays, in other words, the list of all weight tensors in the model.

mw = model.get_weights()
print(mw)

If you get NaN(s), this has a specific meaning: you are dealing with the vanishing gradient problem (in some cases even with exploding gradients).

I would first try to alter the model to reduce the chance of vanishing gradients. Try reducing hidden_units first, and normalize your activations.

Even though LSTMs are designed to address the vanishing/exploding gradient problem, you still need to keep the activations in the (-1, 1) interval.

Note that this interval is where floating-point numbers are most precise.
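A hedged sketch of those suggestions applied to the model from the question: a smaller hidden state, a finite mask value, and clipped gradients. The values hidden_units = 64, clipnorm = 1.0 and the fixed input shape are illustrative assumptions, not from the original post:

from keras import models
from keras.layers import Dense, Masking, LSTM
from keras.optimizers import RMSprop
from keras.losses import categorical_crossentropy

hidden_units = 64    # reduced from 256
mask_val = -10       # a finite sentinel instead of np.nan
in_shape = (100, 1)  # (max_len, vec_dims) as in the question

model = models.Sequential()
model.add(Masking(mask_value=mask_val, input_shape=in_shape))
model.add(LSTM(hidden_units))
model.add(Dense(3, activation='softmax'))

# clipnorm rescales each gradient so its L2 norm never exceeds 1.0,
# which helps keep training numerically stable.
model.compile(loss=categorical_crossentropy, optimizer=RMSprop(clipnorm=1.0))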

Working with np.nan under the masking layer is not a predictable operation, since you cannot compare anything against np.nan.

Try print(np.nan == np.nan): it returns False. This is a long-standing property of the IEEE 754 standard.

Or it may actually be that this is a bug in TensorFlow, rooted in this weakness of the IEEE 754 standard.
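A short demonstration of why a NaN sentinel cannot work as a mask value (pure NumPy, nothing Keras-specific):

import numpy as np

print(np.nan == np.nan)   # False: NaN never compares equal, per IEEE 754
print(np.isnan(np.nan))   # True: the only reliable way to detect NaN
print(0.0 * np.nan)       # nan: any arithmetic involving NaN stays NaN

Because equality against np.nan is always False, the masking layer cannot match the padded timesteps, and the NaN padding values then propagate through the activations, the loss and finally the weights. A finite sentinel that cannot occur in the data, such as the original -10, avoids the problem.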

Comments
  • Why would you want to use NaN as mask value? You don't seem to have NaN in your input anywhere
  • @shadi exactly. it's not in my input so I thought it would make a good mask value
  • Weights should initially be non-zero; biases may be zero. However, Keras takes care of that.
  • Why would changing the mask value change the gradient calculated?
  • Working with np.nan is unpredictable. Try print(np.nan == np.nan): it returns False, which means this masking will not work.