TensorFlow's Estimator stops training

I am training a model using TensorFlow's Estimator, and it suddenly stops training after 2600 steps, right after performing an evaluation. Isn't it supposed to continue training until the end of the last epoch?

from pathlib import Path
import tensorflow as tf

def train():
    train_input_func = lambda: input_fn(mode='train')
    eval_input_func = lambda: input_fn(mode='eval')

    est_conf = tf.estimator.RunConfig(cfg.model_dir, save_checkpoints_secs=120)
    estimator = tf.estimator.Estimator(model_fn, cfg.model_dir, est_conf)

    Path(estimator.eval_dir()).mkdir(parents=True, exist_ok=True)
    train_spec = tf.estimator.TrainSpec(input_fn=train_input_func)
    eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_func, throttle_secs=120)
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

if __name__ == '__main__':
    train()

And this is the input_fn function:

def input_fn(mode=None):
        data_generator = lambda: data_loader.data_generator(mode=mode)

        dataset = tf.data.Dataset.from_generator(data_generator,
                                                 output_types=(tf.int32, tf.int32),
                                                 output_shapes=([None], [None]))

        if mode == 'train':
            dataset.shuffle(cfg.shuffle_buffer).repeat(1000)

        dataset = dataset.padded_batch(cfg.batch_size, padded_shapes=([None],[None])).prefetch(1)

        return dataset

When you use tf.estimator.train_and_evaluate and want max_steps to take effect, do not use repeat(1000). Use repeat() with no argument instead: it repeats the input indefinitely, so the input function never raises OutOfRangeError.
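
As a rough illustration of why a finite input caps training, the number of steps available before the input runs out can be estimated from the dataset size, the batch size, and the repeat count. This is only a sketch: the helper name `steps_until_exhausted` and the sizes below are hypothetical (the question does not state the real dataset size), and it assumes repeat is applied before batching with no remainder dropped.

```python
def steps_until_exhausted(num_examples, batch_size, repeat_count=1):
    """Number of batches a finite dataset yields before it is exhausted.

    Assumes repeat(repeat_count) is applied before batching and that the
    final partial batch is kept (the default for padded_batch).
    """
    total_examples = num_examples * repeat_count
    # Integer ceiling division: a partial last batch still counts as a step.
    return (total_examples + batch_size - 1) // batch_size


# Hypothetical sizes, chosen only for illustration.
one_pass = steps_until_exhausted(num_examples=83_200, batch_size=32)
thousand_passes = steps_until_exhausted(83_200, 32, repeat_count=1000)
print(one_pass, thousand_passes)
```

With these made-up numbers, a single pass is used up after 2600 steps, while 1000 passes would allow 2,600,000. With repeat() and no count there is no such limit, so max_steps alone decides when training ends.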

First, you need to specify max_steps in the TrainSpec definition, like the following:

train_spec = tf.estimator.TrainSpec(input_fn=train_input_func, max_steps=num_steps_you_specify)

Second, training stops when the input_fn raises OutOfRangeError, in which case max_steps will not work as designed. So, to make training run through all the epochs, define the dataset in input_fn like the following:

dataset = dataset.repeat()  # don't pass a count to repeat()
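
The difference between a counted and an uncounted repeat can be checked on a toy dataset. The snippet below is a minimal sketch, assuming TensorFlow 2.x with eager execution (the question itself uses the 1.x-style Estimator API); the toy `range(3)` data and the variable names are illustrative only.

```python
import tensorflow as tf

# A counted repeat yields a finite stream: 3 elements x 2 passes = 6 elements.
finite = tf.data.Dataset.range(3).repeat(2)
# An uncounted repeat yields an endless stream.
endless = tf.data.Dataset.range(3).repeat()

finite_iter = iter(finite)
for _ in range(6):
    finite_iter.get_next()  # consume all six elements

# One more request signals the end of the input.
try:
    finite_iter.get_next()
    exhausted = False
except tf.errors.OutOfRangeError:
    exhausted = True

# take(10) is used here only to bound the endless stream for inspection.
first_ten = [int(x.numpy()) for x in endless.take(10)]

print(exhausted)   # True
print(first_ten)   # [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
```

The OutOfRangeError caught here is the same end-of-input signal that makes train_and_evaluate stop early.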

Hope this will help you.

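The problem was that I did not assign the result of dataset.shuffle(cfg.shuffle_buffer).repeat(1000) back to dataset, so the shuffle and repeat were never applied and the input ended after a single pass. This fixes the problem: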

dataset = dataset.shuffle(cfg.shuffle_buffer).repeat(1000)
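
The underlying behaviour is that tf.data transformations return a new dataset rather than modifying the one they are called on. A small sketch of this, assuming TensorFlow 2.x with eager execution; the helper `count_elements` and the toy data are hypothetical and only there for illustration:

```python
import tensorflow as tf


def count_elements(dataset):
    """Count elements by iterating over a finite dataset (eager mode)."""
    return sum(1 for _ in dataset)


base = tf.data.Dataset.range(5)

# Return value discarded: `base` itself is left unchanged.
base.repeat(3)
unassigned_count = count_elements(base)

# Return value kept: the new dataset contains three passes over the data.
repeated = base.repeat(3)
assigned_count = count_elements(repeated)

print(unassigned_count, assigned_count)  # 5 15
```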
