Hot questions on using neural networks with tf.estimator



I'm using tf.estimator in TensorFlow 1.4, and tf.estimator.train_and_evaluate is great, but I need early stopping. What's the preferred way of adding that?

I assume there is some tf.train.SessionRunHook somewhere for this. I saw that there was an old contrib package with a ValidationMonitor that seemed to have early stopping, but it doesn't seem to be around anymore in 1.4. Or will the preferred way in the future be to rely on tf.keras (with which early stopping is really easy) instead of tf.estimator/tf.layers, perhaps?


Good news! tf.estimator now has early stopping support on master and it looks like it will be in 1.10.

estimator = tf.estimator.Estimator(model_fn, model_dir)

os.makedirs(estimator.eval_dir())  # TODO This should not be expected IMO.

early_stopping = tf.contrib.estimator.stop_if_no_decrease_hook(
    estimator,
    metric_name='loss',
    max_steps_without_decrease=1000,
    min_steps=100)

tf.estimator.train_and_evaluate(
    estimator,
    train_spec=tf.estimator.TrainSpec(train_input_fn, hooks=[early_stopping]),
    eval_spec=tf.estimator.EvalSpec(eval_input_fn))


I am attempting to use TensorFlow's Estimators. In the documentation the following code is used to train and evaluate the network.

# Fit, y=y_train, steps=5000)

# Score accuracy
ev = nn.evaluate(x=x_test, y=y_test, steps=1)
loss_score = ev["loss"]
print("Loss: %s" % loss_score)

The whole training set is passed in, but we have steps=5000. Does this mean that only the first 5000 examples from the set are considered?

What does the batch_size parameter mean in this context, and how does it interact with steps?



batch_size is the number of examples processed at once. TF pushes all of those through one forward pass (in parallel) and follows with a back-propagation on the same set. This is one iteration, or step.

The steps parameter tells TF to run 5000 of these iterations to train the model.

One epoch means processing each example in the training set exactly once. For instance, if you have one million examples and a batch size of 200, then you need 5,000 steps to complete one epoch: 200 × 5,000 = 1,000,000.
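That arithmetic can be checked in a couple of lines of plain Python (the figures are just the ones from this example, not from any particular dataset):

```python
# Steps needed for one epoch: dataset size divided by batch size.
num_examples = 1_000_000
batch_size = 200

steps_per_epoch = num_examples // batch_size
print(steps_per_epoch)  # 5000

# Equivalently, running steps=5000 at batch_size=200 consumes
# 5000 * 200 = 1,000,000 examples: exactly one pass over the data.
examples_seen = steps_per_epoch * batch_size
print(examples_seen)  # 1000000
```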

Does that clear up the terminology?


(Complete novice at python, machine learning, and TensorFlow)

I am attempting to adapt the TensorFlow Linear Model Tutorial from the official documentation to the Abalone dataset from the UCI Machine Learning Repository. The intent is to predict the rings (age) of an abalone from the other given data.

When running the below program I get the following:

File "/home/lawrence/tensorflow3.5/lib/python3.5/site-packages/tensorflow/python/ops/", line 220, in lookup
(self._key_dtype, keys.dtype))
TypeError: Signature mismatch. Keys must be dtype <dtype: 'string'>, got <dtype: 'int32'>.

The error is thrown from the lookup op at line 220 and is documented as being raised when:

      TypeError: when `keys` or `default_value` doesn't match the table data types.

From debugging parse_csv() it seems to be the case that all the tensors are created with the correct type.

Could you please explain what is going wrong? I believe I am following the tutorial code logic and cannot figure this out.

Source Code:

import tensorflow as tf
import shutil

_CSV_COLUMNS = [
    'sex', 'length', 'diameter', 'height', 'whole_weight',
    'shucked_weight', 'viscera_weight', 'shell_weight', 'rings'
]

_CSV_COLUMN_DEFAULTS = [['M'], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0]]

_NUM_EXAMPLES = {
    'train': 3000,
    'validation': 1177,
}

def build_model_columns():
  """Builds a set of wide feature columns."""
  sex = tf.feature_column.categorical_column_with_hash_bucket('sex', hash_bucket_size=1000)
  # Continuous columns
  length = tf.feature_column.numeric_column('length', dtype=tf.float32)
  diameter = tf.feature_column.numeric_column('diameter', dtype=tf.float32)
  height = tf.feature_column.numeric_column('height', dtype=tf.float32)
  whole_weight = tf.feature_column.numeric_column('whole_weight', dtype=tf.float32)
  shucked_weight = tf.feature_column.numeric_column('shucked_weight', dtype=tf.float32)
  viscera_weight = tf.feature_column.numeric_column('viscera_weight', dtype=tf.float32)
  shell_weight = tf.feature_column.numeric_column('shell_weight', dtype=tf.float32)

  base_columns = [sex, length, diameter, height, whole_weight,
                  shucked_weight, viscera_weight, shell_weight]

  return base_columns

def build_estimator():
  """Build an estimator appropriate for the given model type."""
  base_columns = build_model_columns()

  return tf.estimator.LinearClassifier(
      model_dir="/home/lawrence/models/albones/",
      feature_columns=base_columns,
      label_vocabulary=['M', 'F', 'I'])

def input_fn(data_file, num_epochs, shuffle, batch_size):
  """Generate an input function for the Estimator."""
  assert tf.gfile.Exists(data_file), (
      '%s not found. Please make sure you have either run or '
      'set both arguments --train_data and --test_data.' % data_file)

  def parse_csv(value):
      print('Parsing', data_file)
      columns = tf.decode_csv(value, record_defaults=_CSV_COLUMN_DEFAULTS)
      features = dict(zip(_CSV_COLUMNS, columns))
      labels = features.pop('rings')

      return features, labels

  # Extract lines from input files using the Dataset API.
  dataset =

  if shuffle:
    dataset = dataset.shuffle(buffer_size=_NUM_EXAMPLES['train'])

  dataset =, num_parallel_calls=5)

  # We call repeat after shuffling, rather than before, to prevent separate
  # epochs from blending together.
  dataset = dataset.repeat(num_epochs)
  dataset = dataset.batch(batch_size)

  iterator = dataset.make_one_shot_iterator()
  features, labels = iterator.get_next()

  return features, labels

def main(unused_argv):
  # Clean up the model directory if present
  shutil.rmtree("/home/lawrence/models/albones/", ignore_errors=True)
  model = build_estimator()

  # Train and evaluate the model every `FLAGS.epochs_per_eval` epochs.
  for n in range(40 // 2):
    model.train(input_fn=lambda: input_fn(
        "/home/lawrence/", 2, True, 40))

    results = model.evaluate(input_fn=lambda: input_fn(
        "/home/lawrence/", 1, False, 40))

    # Display evaluation metrics
    print('Results at epoch', (n + 1) * 2)
    print('-' * 60)

    for key in sorted(results):
      print('%s: %s' % (key, results[key]))

if __name__ == '__main__':
Here is the classification of the columns of the dataset from abalone.names:

Name            Data Type   Meas.   Description
----            ---------   -----   -----------
Sex             nominal             M, F, [or] I (infant)
Length          continuous  mm      Longest shell measurement
Diameter        continuous  mm      perpendicular to length
Height          continuous  mm      with meat in shell
Whole weight    continuous  grams   whole abalone
Shucked weight  continuous  grams   weight of meat
Viscera weight  continuous  grams   gut weight (after bleeding)
Shell weight    continuous  grams   after being dried
Rings           integer             +1.5 gives the age in years

Dataset entries appear in this order as comma-separated values, with a new line for each entry.


You've done almost everything right. The problem is with the definition of an estimator.

The task is to predict the Rings column, which is an integer, so it looks like a regression problem. But you've decided to do a classification task, which is also valid:

def build_estimator():
  """Build an estimator appropriate for the given model type."""
  base_columns = build_model_columns()

  return tf.estimator.LinearClassifier(
      model_dir="/home/lawrence/models/albones/",
      feature_columns=base_columns,
      label_vocabulary=['M', 'F', 'I'])

By default, tf.estimator.LinearClassifier assumes binary classification, i.e. n_classes=2. In your case that's obviously not true, and it's the first bug. You've also set label_vocabulary, which TensorFlow interprets as the set of possible values in the label column. That's why it expects tf.string dtype, and since Rings is an integer you get the signature mismatch. You simply don't need label_vocabulary at all.

Combining it all together:

def build_estimator():
  """Build an estimator appropriate for the given model type."""
  base_columns = build_model_columns()

  return tf.estimator.LinearClassifier(
      model_dir="/home/lawrence/models/albones/",
      feature_columns=base_columns,
      n_classes=30)  # rings run from 1 to 29, so allow 30 classes

I suggest you also try tf.estimator.LinearRegressor, which will probably be more accurate.


I'm trying to create a simple one-layer/one-unit NN with TensorFlow custom estimators that will be able to compute the logical AND operation, but I've got trouble with the sigmoid activation: I want to set a threshold.

Here is my code:

x = np.array([
    [0, 0],
    [1, 0],
    [0, 1],
    [1, 1]
], dtype=np.float32)

y = np.array([
    [0],
    [0],
    [0],
    [1]
], dtype=np.float32)

def sigmoid(val):
    res = tf.nn.sigmoid(val)
    isGreater = tf.greater(res, tf.constant(0.5))
    return tf.cast(isGreater, dtype=tf.float32)

def model_fn(features, labels, mode, params):
    predictions = tf.layers.dense(inputs=features, units=1, activation=sigmoid)

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

    loss = tf.losses.sigmoid_cross_entropy(labels, predictions)
    optimizer = tf.train.GradientDescentOptimizer(0.5)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())

    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

nn = tf.estimator.Estimator(model_fn=model_fn)
input_fn = tf.estimator.inputs.numpy_input_fn(x=x, y=y, shuffle=False, num_epochs=None)

nn.train(input_fn=input_fn, steps=500)

But this throws an error:

ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ["<tf.Variable 'dense/kernel:0' shape=(2, 1) dtype=float32_ref>", "<tf.Variable 'dense/bias:0' shape=(1,) dtype=float32_ref>"] and loss Tensor("sigmoid_cross_entropy_loss/value:0", shape=(), dtype=float32).

How can I fix this? Please help..

Another question I've got: why doesn't TensorFlow have a built-in threshold for the sigmoid activation? Isn't it one of the most needed things for binary classification (with sigmoid/tanh)?


There is a built-in sigmoid activation, which is tf.nn.sigmoid.

However, when you create a network you should not apply an activation on the last layer at training time: tf.losses.sigmoid_cross_entropy applies the sigmoid internally, so you need to provide unscaled logits to it, like this:

predictions = tf.layers.dense(inputs=features, units=1, activation=None)

loss = tf.losses.sigmoid_cross_entropy(labels, predictions)

Otherwise, with your custom sigmoid, your predictions will be either 0 or 1, and there is no gradient available for this: the hard threshold is flat almost everywhere, so its derivative, and hence the backpropagated gradient, is zero.
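To make the "no gradient" point concrete, here is a small NumPy sketch (illustrative only, not TensorFlow code): central finite differences show the smooth sigmoid has a nonzero slope everywhere, while the thresholded version is flat away from the cutoff.

```python
import numpy as np

def smooth_sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def thresholded(z):
    # Like the custom activation above: sigmoid followed by a 0.5 cutoff.
    return (smooth_sigmoid(z) > 0.5).astype(np.float32)

eps = 1e-3
z = np.array([-1.0, -0.25, 0.25, 1.0])

# Central finite differences approximate the derivative at each point.
grad_smooth = (smooth_sigmoid(z + eps) - smooth_sigmoid(z - eps)) / (2 * eps)
grad_thresh = (thresholded(z + eps) - thresholded(z - eps)) / (2 * eps)

print(grad_smooth)  # strictly positive everywhere
print(grad_thresh)  # all zeros: the step is flat away from z = 0
```

With a zero derivative everywhere, backpropagation has nothing to propagate, which is exactly the "No gradients provided for any variable" error.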


I'm doing a neural network in TensorFlow and I'm using softmax_cross_entropy to calculate the loss. Running tests, I noticed that it never gives a value of zero, even when I compare identical values. This is my code:


import numpy as np
import tensorflow as tf

labels = np.array([[0., 1.],
                   [1., 0.],
                   [0., 1.],
                   [0., 1.]], dtype=np.float32)
logits = labels  # comparing identical values
loss = tf.losses.softmax_cross_entropy(onehot_labels=labels, logits=logits)

with tf.Session() as sess:
    print(

I obtain this

[[0. 1.]
 [1. 0.]
 [0. 1.]
 [0. 1.]]

Why is it not zero?


Matias's post is correct. The following code gives the same result as your code:

probabilities = tf.nn.softmax(logits=logits)
# cross entropy
loss = -tf.reduce_sum(labels * tf.log(probabilities)) / len(labels)

with tf.Session() as sess:
    print(
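For intuition, the same computation in plain NumPy (a sketch, with values mirroring the one-hot rows above) shows why the loss is strictly positive for any finite logits: softmax never outputs an exact 1.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[0., 1.],
                   [1., 0.]])
labels = np.array([[0., 1.],
                   [1., 0.]])  # one-hot targets identical to the logits

p = softmax(logits)
# softmax([0, 1]) gives about [0.269, 0.731]: the correct class gets
# probability 0.731, never exactly 1, so -log(0.731) ~ 0.313, not 0.
loss = -np.sum(labels * np.log(p)) / len(labels)
print(loss)
```

The loss only approaches zero as the logit of the correct class grows arbitrarily large relative to the others.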