Hot questions for using neural networks in TensorFlow 2.0


I tried to write a custom implementation of a basic neural network with two hidden layers on the MNIST dataset using *TensorFlow 2.0 beta*, but I'm not sure what went wrong: my training loss and accuracy seem to be stuck at around 1.5 and 85% respectively. If I build the same model using Keras, I get a very low training loss and accuracy above 95% with just 8-10 epochs.

I believe that maybe I'm not updating my weights or something? Do I need to assign the new weights that I compute in the backprop function back to their respective weight/bias variables?

I would really appreciate it if someone could help me out with this and with the few more questions listed below.

A few more questions:

1) How do I add Dropout and Batch Normalization layers to this custom implementation (i.e. making them work for both train and test time)?

2) How can I use callbacks in this code (i.e. making use of the EarlyStopping and ModelCheckpoint callbacks)?

3) Is there anything else in my code below that I can optimize further, like maybe making use of the TensorFlow 2.x @tf.function decorator?

4) I would also like to extract the final weights for plotting and checking their distributions, to investigate issues like vanishing or exploding gradients (e.g. maybe with TensorBoard).

5) I also want help writing this code in a more generalized way, so I can easily implement other networks like ConvNets (i.e. Conv, MaxPool, etc.) based on it.

Here's my full code for easy reproducibility:

Note: I know I can use high-level API like Keras to build the model much easier but that is not my goal here. Please understand.

import numpy as np
import os
import logging
import tensorflow as tf
import tensorflow_datasets as tfds

(x_train, y_train), (x_test, y_test) = tfds.load('mnist', split=['train', 'test'], 
                                                  batch_size=-1, as_supervised=True)

# reshaping
x_train = tf.reshape(x_train, shape=(x_train.shape[0], 784))
x_test  = tf.reshape(x_test, shape=(x_test.shape[0], 784))

ds_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# rescaling
ds_train = ds_train.map(lambda x, y: (tf.cast(x, tf.float32)/255.0, y))

class Model(object):
    def __init__(self, hidden1_size, hidden2_size, device=None):
        # layer sizes along with input and output
        self.input_size, self.output_size, self.device = 784, 10, device
        self.hidden1_size, self.hidden2_size = hidden1_size, hidden2_size
        self.lr_rate = 1e-03

        # weights initialization
        self.glorot_init = tf.initializers.glorot_uniform(seed=42)
        # weights b/w input to hidden1 --> 1
        self.w_h1 = tf.Variable(self.glorot_init((self.input_size, self.hidden1_size)))
        # weights b/w hidden1 to hidden2 ---> 2
        self.w_h2 = tf.Variable(self.glorot_init((self.hidden1_size, self.hidden2_size)))
        # weights b/w hidden2 to output ---> 3
        self.w_out = tf.Variable(self.glorot_init((self.hidden2_size, self.output_size)))

        # bias initialization
        self.b1 = tf.Variable(self.glorot_init((self.hidden1_size,)))
        self.b2 = tf.Variable(self.glorot_init((self.hidden2_size,)))
        self.b_out = tf.Variable(self.glorot_init((self.output_size,)))

        self.variables = [self.w_h1, self.b1, self.w_h2, self.b2, self.w_out, self.b_out]

    def feed_forward(self, x):
        if self.device is not None:
            with tf.device('gpu:0' if self.device=='gpu' else 'cpu'):
                # layer1
                self.layer1 = tf.nn.sigmoid(tf.add(tf.matmul(x, self.w_h1), self.b1))
                # layer2
                self.layer2 = tf.nn.sigmoid(tf.add(tf.matmul(self.layer1,
                                                             self.w_h2), self.b2))
                # output layer
                self.output = tf.nn.softmax(tf.add(tf.matmul(self.layer2,
                                                             self.w_out), self.b_out))
        return self.output

    def loss_fn(self, y_pred, y_true):
        self.loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_true, 
                                                                   logits=y_pred)
        return tf.reduce_mean(self.loss)

    def acc_fn(self, y_pred, y_true):
        y_pred = tf.cast(tf.argmax(y_pred, axis=1), tf.int32)
        y_true = tf.cast(y_true, tf.int32)
        predictions = tf.cast(tf.equal(y_true, y_pred), tf.float32)
        return tf.reduce_mean(predictions)

    def backward_prop(self, batch_xs, batch_ys):
        optimizer = tf.keras.optimizers.Adam(learning_rate=self.lr_rate)
        with tf.GradientTape() as tape:
            predicted = self.feed_forward(batch_xs)
            step_loss = self.loss_fn(predicted, batch_ys)
        grads = tape.gradient(step_loss, self.variables)
        optimizer.apply_gradients(zip(grads, self.variables))

n_shape = x_train.shape[0]
epochs = 20
batch_size = 128

ds_train = ds_train.repeat().shuffle(n_shape).batch(batch_size).prefetch(batch_size)

neural_net = Model(512, 256, 'gpu')

for epoch in range(epochs):
    no_steps = n_shape//batch_size
    avg_loss = 0.
    avg_acc = 0.
    for (batch_xs, batch_ys) in ds_train.take(no_steps):
        preds = neural_net.feed_forward(batch_xs)
        avg_loss += float(neural_net.loss_fn(preds, batch_ys)/no_steps) 
        avg_acc += float(neural_net.acc_fn(preds, batch_ys) /no_steps)
        neural_net.backward_prop(batch_xs, batch_ys)
    print(f'Epoch: {epoch}, Training Loss: {avg_loss}, Training ACC: {avg_acc}')

# output for 10 epochs:
Epoch: 0, Training Loss: 1.7005115111824125, Training ACC: 0.7603832868262543
Epoch: 1, Training Loss: 1.6052448933478445, Training ACC: 0.8524806404020637
Epoch: 2, Training Loss: 1.5905528008006513, Training ACC: 0.8664196092868224
Epoch: 3, Training Loss: 1.584107405738905, Training ACC: 0.8727630912326276
Epoch: 4, Training Loss: 1.5792385798413306, Training ACC: 0.8773203844903037
Epoch: 5, Training Loss: 1.5759121985174716, Training ACC: 0.8804754322627559
Epoch: 6, Training Loss: 1.5739163148682564, Training ACC: 0.8826455712551251
Epoch: 7, Training Loss: 1.5722616605926305, Training ACC: 0.8840812018606812
Epoch: 8, Training Loss: 1.569699136307463, Training ACC: 0.8867688354803249
Epoch: 9, Training Loss: 1.5679460542742163, Training ACC: 0.8885049475356936
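A likely explanation for the plateau (an editor's note, not part of the original answer): feed_forward already applies tf.nn.softmax, and loss_fn then hands those probabilities to tf.nn.sparse_softmax_cross_entropy_with_logits, which applies softmax a second time. Because probabilities lie in [0, 1], even a perfect one-hot prediction can only reach a loss of log(e + 9) - 1, about 1.46 for 10 classes, which is consistent with the ~1.5 values in the log above. Separately, backward_prop creates a new Adam optimizer on every call, which discards Adam's running moment estimates; creating it once in __init__ avoids that. The NumPy sketch below (no TensorFlow; the helper names are hypothetical) computes that floor:

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy_from_logits(logits, label):
    # What sparse_softmax_cross_entropy_with_logits computes for one example:
    # softmax of the inputs, then the negative log-probability of the true class
    return -np.log(softmax(logits)[label])


num_classes = 10
# Best case for the buggy code: a perfect one-hot "probability" vector
# that is then passed to the loss as if it were logits
perfect_probs = np.eye(num_classes)[3]
floor = cross_entropy_from_logits(perfect_probs, label=3)
print(round(float(floor), 3))  # 1.461, i.e. log(e + 9) - 1

# Passing real (unbounded) logits lets the loss approach 0
good_logits = 20.0 * np.eye(num_classes)[3]
print(cross_entropy_from_logits(good_logits, label=3) < 1e-6)  # True
```

In other words, returning the pre-softmax values from feed_forward (and applying softmax only for accuracy or prediction) removes the artificial lower bound on the loss.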


I wondered where to start with your multi-part question, and I decided to do so with a statement:

Your code definitely should not look like that and is nowhere near current TensorFlow best practices.

Sorry, but debugging it step by step is a waste of everyone's time and would not benefit either of us.

Now, moving to the third point:

3) Is there anything else in my code below that I can optimize further in this code like maybe making use of tensorflow 2.x @tf.function decorator etc.)

Yes, you can use TensorFlow 2.0 functionality, and it seems like you are running away from it (the tf.function decorator is actually of no use here, leave it for the time being).

Following the new guidelines would alleviate your problems with your 5th point as well, namely:

5) I also want help in writing this code in a more generalized way so I can easily implement other networks like ConvNets (i.e Conv, MaxPool etc.) based on this code easily.

as it's designed specifically for that. After a little introduction I will try to introduce you to those concepts in a few steps:

1. Divide your program into logical parts

TensorFlow did much harm when it comes to code readability; everything in tf1.x was usually crunched into one place: globals followed by function definitions followed by more globals or maybe data loading, all in all a mess. It's not really the developers' fault, as the system's design encouraged those practices.

Now, in tf2.0, programmers are encouraged to divide their work similarly to the structure one can see in PyTorch, Chainer and other more user-friendly frameworks.

1.1 Data loading

You were on a good path with TensorFlow Datasets, but you turned away from it for no apparent reason.

Here is your code with commentary on what's going on:

# You already have objects after load
(x_train, y_train), (x_test, y_test) = tfds.load('mnist', split=['train', 'test'], 
                                                  batch_size=-1, as_supervised=True)

# But you are reshaping them in a strange manner...
x_train = tf.reshape(x_train, shape=(x_train.shape[0], 784))
x_test  = tf.reshape(x_test, shape=(x_test.shape[0], 784))

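# And building from slices...
ds_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# Unreadable rescaling (there are built-ins for that)
ds_train = ds_train.map(lambda x, y: (tf.cast(x, tf.float32)/255.0, y))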

You can easily generalize this idea to any dataset; place it in a separate module, say datasets.py:

import tensorflow as tf
import tensorflow_datasets as tfds


class ImageDatasetCreator:
    # More portable and readable than dividing by 255
    @classmethod
    def _convert_image_dtype(cls, dataset):
        return dataset.map(
            lambda image, label: (
                tf.image.convert_image_dtype(image, tf.float32),
                label,
            )
        )

    def __init__(self, name: str, batch: int, cache: bool = True, split=None):
        # Load dataset; with split=None tfds returns a dict of all available splits
        dataset = tfds.load(name, as_supervised=True, split=split)
        # Convert to float range
        try:
            self.train = ImageDatasetCreator._convert_image_dtype(dataset["train"])
            self.test = ImageDatasetCreator._convert_image_dtype(dataset["test"])
        except KeyError as exception:
            raise ValueError(
                f"Dataset {name} does not have train and test, write your own custom dataset handler."
            ) from exception

        if cache:
            self.train = self.train.cache()  # speed things up considerably
            self.test = self.test.cache()

        self.batch: int = batch

    def get_train(self):
        # shuffle() requires a buffer size; 10000 is an arbitrary choice here
        return self.train.shuffle(10000).batch(self.batch).repeat()

    def get_test(self):
        return self.test.batch(self.batch).repeat()

So now you can load more than MNIST using a simple command:

from datasets import ImageDatasetCreator

if __name__ == "__main__":
    dataloader = ImageDatasetCreator("mnist", batch=64, cache=True)
    train, test = dataloader.get_train(), dataloader.get_test()

From now on you can pass the name of any other dataset with train and test splits instead of mnist.

Please stop writing everything deep-learning-related as one-off scripts; you are a programmer as well.

1.2 Model creation

Since tf2.0 there are two advised ways to proceed, depending on the model's complexity:

  • tensorflow.keras.models.Sequential - this way was shown by @Stewart_R, no need to reiterate his points. Used for the simplest models (you should use this one for your feedforward network).
  • Inheriting tensorflow.keras.Model and writing custom model. This one should be used when you have some kind of logic inside your module or it's more complicated (things like ResNets, multipath networks etc.). All in all more readable and customizable.

Your Model class tried to resemble something like that, but it went south again: backprop is definitely not part of the model itself, and neither are loss or accuracy. Separate them into another module or function, definitely not members!

That said, let's code the network using the second approach (you should place this code in its own module). Before that, I will code YourDense, a feedforward layer, from scratch by inheriting from tf.keras.layers.Layer (this one might go into a separate layers module):

import tensorflow as tf


class YourDense(tf.keras.layers.Layer):
    def __init__(self, units):
        # It's Python 3, you don't have to specify super parents explicitly
        super().__init__()
        self.units = units

    # Use build to create variables, as shape can be inferred from previous layers
    # If you were to create layers in __init__, one would have to provide input_shape
    # (same as it occurs in PyTorch for example)
    def build(self, input_shape):
        # You could use different initializers here as well
        self.kernel = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
        )
        # You could define bias in __init__ as well as it's not input dependent
        self.bias = self.add_weight(shape=(self.units,), initializer="random_normal")
        # Oh, trainable=True is default

    def call(self, inputs):
        # Use overloaded operators instead of tf.add, better readability
        return tf.matmul(inputs, self.kernel) + self.bias

Regarding your

1) How to add a Dropout and Batch Normalization layer in this custom implementation? (i.e making it work for both train and test time)

I suppose you would like to create a custom implementation of those layers. If not, you can just use from tensorflow.keras.layers import Dropout (and BatchNormalization) and use them anywhere you want, as @Leevo pointed out. Below is inverted dropout with different behaviour during training and testing:

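import tensorflow as tf
from tensorflow.keras import layers


class CustomDropout(layers.Layer):
    def __init__(self, rate, **kwargs):
        super().__init__(**kwargs)
        self.rate = rate

    def call(self, inputs, training=None):
        if training:
            # tf.nn.dropout zeroes units with probability `rate` and scales the
            # survivors by 1 / (1 - rate) (inverted dropout).
            # You could simply create a binary mask and multiply here instead.
            return tf.nn.dropout(inputs, rate=self.rate)
        # Inverted dropout needs no rescaling at test time; with plain (non-inverted)
        # dropout you would multiply by the keep probability (1 - rate) here instead.
        return inputs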

Layers taken from here and modified to better fit the showcasing purpose.
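For question 1), the essential part of both layers is the train/test switch. As a framework-agnostic illustration (plain NumPy, not the tf.keras implementations; the class names are hypothetical), inverted dropout rescales the kept units during training and is an identity at test time, while batch normalization normalizes with the current batch's statistics during training and with running averages at test time. The momentum and epsilon defaults below mirror those of tf.keras.layers.BatchNormalization (0.99 and 0.001):

```python
import numpy as np


class InvertedDropoutSketch:
    def __init__(self, rate, seed=0):
        self.rate = rate
        self.rng = np.random.default_rng(seed)

    def __call__(self, x, training=False):
        if not training:
            return x  # identity at test time
        # Keep each unit with probability 1 - rate and scale survivors up
        keep = self.rng.random(x.shape) >= self.rate
        return x * keep / (1.0 - self.rate)


class BatchNormSketch:
    def __init__(self, num_features, momentum=0.99, epsilon=1e-3):
        self.gamma = np.ones(num_features)   # learnable scale
        self.beta = np.zeros(num_features)   # learnable shift
        self.moving_mean = np.zeros(num_features)
        self.moving_var = np.ones(num_features)
        self.momentum = momentum
        self.epsilon = epsilon

    def __call__(self, x, training=False):
        if training:
            # Normalize with statistics of the current batch...
            mean, var = x.mean(axis=0), x.var(axis=0)
            # ...and update the running averages used later at test time
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            mean, var = self.moving_mean, self.moving_var
        x_hat = (x - mean) / np.sqrt(var + self.epsilon)
        return self.gamma * x_hat + self.beta


x = np.random.default_rng(1).normal(loc=5.0, scale=2.0, size=(256, 4))

dropout = InvertedDropoutSketch(rate=0.5)
print(np.array_equal(dropout(x, training=False), x))  # True
print(set(np.unique(dropout(np.ones((4, 4)), training=True))) <= {0.0, 2.0})  # True

bn = BatchNormSketch(num_features=4)
out = bn(x, training=True)
print(np.allclose(out.mean(axis=0), 0.0, atol=1e-7))  # True
```

In Keras you don't flip this switch by hand: fit calls the model with training=True and evaluate/predict with training=False, and in a custom tf.GradientTape loop you pass model(x, training=True) yourself.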

Now you can create your model finally (simple double feedforward):

import tensorflow as tf

from layers import YourDense


class Model(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Use Sequential here for readability
        self.network = tf.keras.Sequential(
            [YourDense(100), tf.keras.layers.ReLU(), YourDense(10)]
        )

    def call(self, inputs):
        # You can use non-parametric layers inside call as well
        flattened = tf.keras.layers.Flatten()(inputs)
        return self.network(flattened)
Of course, you should use built-ins as much as possible in general implementations.

This structure is pretty extensible, so generalizing to convolutional nets, ResNets, SENets or whatever else should be done via this subclassing approach. You can read more about it here.

I think it fulfills your 5th point:

5) I also want help in writing this code in a more generalized way so I can easily implement other networks like ConvNets (i.e Conv, MaxPool etc.) based on this code easily.

Last thing: you may have to call build in order to build your model's graph (i.e. create its weights):

model.build((None, 28, 28, 1))

This would be for MNIST's 28x28x1 input shape, where None stands for batch.
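Because build lets every YourDense infer input_shape[-1], the weight shapes follow directly from the input shape. A quick worked check in plain Python (dense_params is a hypothetical helper, not a TensorFlow function) of how many trainable parameters the two-layer Model above should have once built with (None, 28, 28, 1):

```python
def dense_params(in_features, units, bias=True):
    # Kernel of shape (in_features, units) plus an optional bias of shape (units,)
    return in_features * units + (units if bias else 0)


flattened = 28 * 28 * 1                # Flatten(): (None, 28, 28, 1) -> (None, 784)
layer1 = dense_params(flattened, 100)  # YourDense(100)
layer2 = dense_params(100, 10)         # YourDense(10); ReLU has no weights
print(flattened, layer1, layer2, layer1 + layer2)  # 784 78500 1010 79510
```

After building, model.summary() should report the same total, which is a cheap sanity check that the shapes were inferred as intended.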

1.3 Training

Once again, training could be done in two separate ways:

  • standard Keras (compile and fit) - useful in simple tasks like classification
  • tf.GradientTape - more complicated training schemes; the most prominent example would be Generative Adversarial Networks, where two models optimize opposing goals, playing a minimax game

As pointed out by @Leevo once again, if you use the second way, you won't be able to simply use the callbacks provided by Keras, hence I'd advise sticking with the first option whenever possible.

In theory you could call the callbacks' methods like on_batch_begin() manually where needed, but it would be cumbersome and I'm not sure how well it would work.
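If you do write a custom tf.GradientTape loop, the core of EarlyStopping (and of a "save only the best" checkpoint) is a few lines of bookkeeping you can do yourself. A minimal pure-Python sketch, assuming you compute one validation loss per epoch; the EarlyStopper class and its attributes are hypothetical, not a Keras API:

```python
class EarlyStopper:
    """Tracks the best validation loss and signals when to stop."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = None
        self.wait = 0

    def update(self, epoch, val_loss):
        """Returns (improved, should_stop) for this epoch."""
        if val_loss < self.best - self.min_delta:
            self.best, self.best_epoch, self.wait = val_loss, epoch, 0
            return True, False
        self.wait += 1
        return False, self.wait >= self.patience


stopper = EarlyStopper(patience=2)
for epoch, val_loss in enumerate([0.9, 0.7, 0.72, 0.71, 0.5]):
    improved, should_stop = stopper.update(epoch, val_loss)
    if improved:
        pass  # e.g. save a checkpoint or keep a copy of model.get_weights() here
    if should_stop:
        print(f"stopping at epoch {epoch}, best epoch {stopper.best_epoch}")
        break
```

To mimic restore_best_weights=True, you could keep model.get_weights() whenever improved is True and call model.set_weights(...) with that copy after the loop.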

When it comes to the first option, you can pass tf.keras.callbacks objects directly to fit. Here it is presented inside another module (preferably one dedicated to training):

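import datetime
import pathlib

import tensorflow as tf

def train(model: tf.keras.Model, path: str, train: tf.data.Dataset, epochs: int,
          steps_per_epoch: int, validation: tf.data.Dataset,
          steps_per_validation: int, stopping_epochs: int, optimizer=None):
    model.compile(
        optimizer=optimizer if optimizer is not None else tf.keras.optimizers.Adam(),
        # I used logits as output from the last layer, hence this
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
    log_dir = pathlib.Path("logs") / datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    callbacks = [
        # Tensorboard logging (histogram_freq=1 also logs weight histograms)
        tf.keras.callbacks.TensorBoard(log_dir=str(log_dir), histogram_freq=1),
        # Early stopping with best weights preserving
        tf.keras.callbacks.EarlyStopping(patience=stopping_epochs, restore_best_weights=True)]
    model.fit(train, epochs=epochs, steps_per_epoch=steps_per_epoch, validation_data=validation,
              validation_steps=steps_per_validation, callbacks=callbacks)
    model.save(path)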

The more complicated approach (tf.GradientTape) is very similar (almost copy and paste) to PyTorch training loops, so if you are familiar with those, it should not pose much of a problem.

You can find examples throughout the TF 2.0 docs, e.g. here or here.

2. Other things
2.1 Unanswered questions

3) Is there anything else in the code that I can optimize further? i.e. (making use of tensorflow 2.x @tf.function decorator etc.)

The Keras fit used above already runs the model as a graph, so I don't think you would benefit from adding @tf.function yourself in this case. And premature optimization is the root of all evil; remember to measure your code before doing this.

You would gain much more from proper caching of data (as described at the beginning of #1.1) and a good input pipeline than from those.

4) Also I need a way to extract all my final weights for all layers after training so I can plot them and check their distributions. To check issues like gradient vanishing or exploding.

As pointed out by @Leevo above,

weights = model.get_weights()

Would get you the weights as a list of NumPy arrays, so you can plot them using seaborn or matplotlib, analyze them, check them or whatever else you want.
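Since get_weights() returns kernels and biases in layer order as NumPy arrays, summarizing them needs nothing beyond NumPy. A small sketch; summarize_weights is a hypothetical helper and the random arrays only stand in for real weights. Keep in mind that weights are only an indirect signal: to look at vanishing or exploding gradients directly you would inspect what tape.gradient returns, and tf.keras.callbacks.TensorBoard can also log weight histograms per epoch via histogram_freq.

```python
import numpy as np


def summarize_weights(weights):
    """Per-array statistics that help spot collapsing or blowing-up weights."""
    rows = []
    for i, w in enumerate(weights):
        w = np.asarray(w)
        rows.append({
            "index": i,
            "shape": w.shape,
            "mean": float(w.mean()),
            "std": float(w.std()),
            "max_abs": float(np.abs(w).max()),
        })
    return rows


rng = np.random.default_rng(0)
# Stand-ins for model.get_weights(): a (784, 100) kernel and a zero bias
fake_weights = [rng.normal(0, 0.05, size=(784, 100)), np.zeros(100)]
for row in summarize_weights(fake_weights):
    print(row["index"], row["shape"], round(row["std"], 3), row["max_abs"])

# np.histogram gives the counts/bin edges you would pass to a plotting library
counts, edges = np.histogram(fake_weights[0], bins=50)
```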

2.2 Putting it all together

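All in all, your main script (or entry point, or something similar) would consist of this (more or less):

from datasets import ImageDatasetCreator
from model import Model
from train import train

# You could use argparse for things like batch, epochs etc.
if __name__ == "__main__":
    batch = 64
    dataloader = ImageDatasetCreator("mnist", batch=batch, cache=True)
    # Don't name these `train`: that would shadow the imported train function
    train_ds, test_ds = dataloader.get_train(), dataloader.get_test()
    model = Model()
    model.build((None, 28, 28, 1))
    train(
        model,
        path=path,
        train=train_ds,
        epochs=epochs,
        # The datasets are repeated (infinite), so len() won't work on them;
        # MNIST has 60000 training and 10000 test images
        steps_per_epoch=60000 // batch,
        validation=test_ds,
        steps_per_validation=10000 // batch,
        stopping_epochs=stopping_epochs,
    )  # provide necessary arguments (path, epochs, ...) appropriately
    # Do whatever you want with those
    weights = model.get_weights()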

Oh, remember that above functions are not for copy pasting and should be treated more like a guideline. Hit me up if you have any questions.

3. Questions from comments
3.1 How to initialize custom and built-in layers
3.1.1 TL;DR of what you are about to read
  • Custom Poisson initialization function, but it takes three arguments
  • The tf.keras.initializers API needs two arguments (see last point in their docs), hence one is specified via Python's lambda inside the custom layer we have written before
  • An optional bias for the layer is added, which can be turned off with a boolean

Why is it so needlessly complicated? To show that in tf2.0 you can finally use Python's functionality: no more graph hassle, plain if instead of tf.cond, etc.

3.1.2 From TLDR to implementation

Keras initializers can be found here and Tensorflow's flavor here.

Please note API inconsistencies (capital letters like classes, small letters with underscore like functions), especially in tf2.0, but that's beside the point.

You can use them by passing a string (as it's done in YourDense above) or during object creation.

To allow for custom initialization in your custom layers, you can simply add an additional argument to the constructor (a tf.keras layer or model is still a Python class, and its __init__ should be used the same way as any other Python class's).

Before that, I will show you how to create custom initialization:

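import tensorflow as tf

# Poisson custom initialization because why not.
def my_dumb_init(shape, lam, dtype=tf.float32):
    # With a scalar lam, tf.random.poisson already returns a tensor of exactly `shape`
    # (tf.squeeze would wrongly drop any dimension of size 1)
    return tf.random.poisson(shape, lam, dtype=dtype)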

Notice that its signature takes three arguments, while it should take (shape, dtype) only. Still, one can "fix" this easily while creating their own layer, like the one below (an extended YourDense):

import typing

import tensorflow as tf


class YourDense(tf.keras.layers.Layer):
    # It's still Python, use it as Python, that's the point of tf.2.0
    @classmethod
    def register_initialization(cls, initializer):
        # Set defaults if init not provided by user
        if initializer is None:
            # let's make the signature proper for init in tf.keras
            return lambda shape, dtype: my_dumb_init(shape, 1, dtype)
        return initializer

    def __init__(
        self,
        units: int,
        bias: bool = True,
        # can be string or callable, some typing info added as well...
        kernel_initializer: typing.Union[str, typing.Callable] = None,
        bias_initializer: typing.Union[str, typing.Callable] = None,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.units: int = units
        self.kernel_initializer = YourDense.register_initialization(kernel_initializer)
        if bias:
            self.bias_initializer = YourDense.register_initialization(bias_initializer)
        else:
            self.bias_initializer = None

    def build(self, input_shape):
        # Simply pass your init here
        self.kernel = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer=self.kernel_initializer,
        )
        if self.bias_initializer is not None:
            self.bias = self.add_weight(
                shape=(self.units,), initializer=self.bias_initializer
            )
        else:
            self.bias = None

    def call(self, inputs):
        weights = tf.matmul(inputs, self.kernel)
        if self.bias is not None:
            return weights + self.bias
        return weights

I have added my_dumb_init as the default (if the user does not provide one) and made the bias optional with the bias argument. Note that you can use if freely as long as it's not data dependent. If it is (i.e. the condition depends on a tf.Tensor), then in graph mode, e.g. under the @tf.function decorator, AutoGraph converts Python's control flow into its TensorFlow counterpart (e.g. if into tf.cond).

See here for more on autograph, it's very easy to follow.

If you want to incorporate above initializer changes into your model, you have to create appropriate object and that's it.

... # Previous code of Model here
        self.network = tf.keras.Sequential(
            [
                YourDense(100, bias=False, kernel_initializer="lecun_uniform"),
                tf.keras.layers.ReLU(),
                YourDense(10, bias_initializer=tf.initializers.Ones()),
            ]
        )
... # and the same afterwards

With built-in tf.keras.layers.Dense layers, one can do the same (arguments names differ, but idea holds).

3.2 Automatic Differentiation using tf.GradientTape
3.2.1 Intro

The point of tf.GradientTape is to allow normal Python control flow while computing gradients of one tensor with respect to another (e.g. of the loss with respect to the model's variables).

Example taken from here but broken into separate pieces:

def f(x, y):
  output = 1.0
  for i in range(y):
    if i > 1 and i < 5:
      output = tf.multiply(output, x)
  return output

Regular python function with for and if flow control statements

def grad(x, y):
  with tf.GradientTape() as t:
    t.watch(x)
    out = f(x, y)
  return t.gradient(out, x)

Using a gradient tape you can record all operations on Tensors (and their intermediate states as well) and "play" them backwards (perform automatic backward differentiation using the chain rule).

Trainable tf.Variables used inside the tf.GradientTape() context manager are watched automatically. Other tensors, like the constant x passed to grad, have to be watched explicitly with the tape's watch() method.

Finally, the gradient of the output with respect to x (the input) is returned.
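As a worked check of what the tape should return: for y = 6 the loop multiplies by x only for i = 2, 3 and 4, so f(x, 6) = x³ and df/dx = 3x², i.e. 12.0 at x = 2.0, which is what grad(tf.constant(2.0), 6) should give once x is watched. The plain-Python central difference below confirms the value without TensorFlow; f_plain is a hypothetical float-only copy of f:

```python
def f_plain(x, y):
    # Same control flow as f above, but with Python floats
    output = 1.0
    for i in range(y):
        if 1 < i < 5:
            output = output * x
    return output


def central_difference(fn, x, h=1e-4):
    # Numerical derivative: (fn(x + h) - fn(x - h)) / (2h)
    return (fn(x + h) - fn(x - h)) / (2 * h)


print(f_plain(2.0, 6))  # 8.0
print(round(central_difference(lambda x: f_plain(x, 6), 2.0), 6))  # 12.0
```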

3.2.2 Connection with deep learning

What was described above is the backpropagation algorithm. Gradients of the loss w.r.t. (with respect to) every trainable parameter are calculated, layer by layer, going backwards from the output. Those gradients are then used by various optimizers to make corrections, and the process repeats.

Let's continue and assume you have your tf.keras.Model, optimizer instance, and loss function already set up.

One can define a Trainer class which will perform training for us. Please read comments in the code if in doubt:

class Trainer:
    def __init__(self, model, optimizer, loss_function):
        self.model = model
        self.loss_function = loss_function
        self.optimizer = optimizer
        # You could pass custom metrics in constructor
        # and adjust train_step and test_step accordingly
        self.train_loss = tf.keras.metrics.Mean(name="train_loss")
        self.test_loss = tf.keras.metrics.Mean(name="test_loss")

    def train_step(self, x, y):
        # Setup tape
        with tf.GradientTape() as tape:
            # Get current predictions of network (training=True matters for Dropout/BatchNorm)
            y_pred = self.model(x, training=True)
            # Calculate loss generated by predictions
            loss = self.loss_function(y, y_pred)
        # Get gradients of loss w.r.t. EVERY trainable variable (iterable returned)
        gradients = tape.gradient(loss, self.model.trainable_variables)
        # Change trainable variable values according to gradient by applying optimizer policy
        self.optimizer.apply_gradients(zip(gradients, self.model.trainable_variables))
        # Record loss of current step
        self.train_loss(loss)

    def train(self, dataset):
        # Iterate over the dataset once (one epoch) and perform a train step per batch
        for x, y in dataset:
            self.train_step(x, y)

    def test_step(self, x, y):
        # Record test loss separately
        self.test_loss(self.loss_function(y, self.model(x, training=False)))

    def test(self, dataset):
        # Iterate over whole dataset
        for x, y in dataset:
            self.test_step(x, y)

    def __str__(self):
        # You need Python 3.6+ for f-string support
        # Just return metrics
        return f"Loss: {self.train_loss.result()}, Test Loss: {self.test_loss.result()}"

Now, you could use this class in your code really simply like this:

# model, optimizer, loss defined beforehand
trainer = Trainer(model, optimizer, loss)
for epoch in range(EPOCHS):
    trainer.train(train_dataset)
    trainer.test(test_dataset)
    # Note: the Mean metrics keep accumulating across epochs unless you reset them
    print(f"Epoch {epoch}: {trainer}")

The print shows training and test loss for each epoch. You can mix training and testing any way you want (e.g. 5 epochs of training and 1 of testing), you could add different metrics, etc.

See here if you want a non-OOP approach (IMO less readable, but to each their own).


In TensorFlow 1.x I had great freedom in choosing how and when to print accuracy/loss scores during training. For example, if I wanted to print the training loss every 100 epochs, in a tf.Session() I'd write:

if epoch % 100 == 0:
    print(str(epoch) + '. Training Loss: ' + str(loss))

After the release of TF 2.0 (alpha), it seems that the Keras API forces me to stick with its standard output. Is there a way to take that flexibility back?


If you don't use the Keras Model methods (.fit, .train_on_batch, ...) and you write your own training loop using eager execution (optionally wrapping it in a tf.function to convert it into its graph representation), you can control the verbosity as you're used to in 1.x:

training_epochs = 10
step = 0
for epoch in range(training_epochs):
    print("starting ",epoch)
    for features, labels in dataset:
        with tf.GradientTape() as tape:
            loss = compute_loss(model(features),labels)
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        step += 1
        if step % 10 == 0:
            # measure other metrics if needed
            print("loss: ", loss)
    print("Epoch ", epoch, " finished.")


I saw an assert statement between some layers of a neural network:

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential()
model.add(layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))

model.add(layers.Reshape((7, 7, 256)))
assert model.output_shape == (None, 7, 7, 256) # Note: None is the batch size

What does this assert do, and is it necessary?


An assertion is exactly what its name says: an assertion! In Python we use the assert statement to make sure that a condition holds exactly as we expect. Look at this simple code:

a = 2
b = 3
assert a + b == 5

This code runs without any error because a + b is exactly what we expect, 5. But if you change the code this way:

assert a + b == 6 # 6, or any number other than 5, it doesn't matter

The code will throw an AssertionError, because a + b != 6. The assertion in the code you mentioned checks that the output of the model has exactly the shape (None, 7, 7, 256), otherwise it throws an error. This is useful to catch dimension mismatches before they cause problems in later lines of code. So if you remove it, nothing will happen, but you will not be notified if the dimension is not what you expect.
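assert also accepts an optional message after a comma, which makes shape checks like the one above easier to debug. The NumPy stand-in below (not Keras: a zero array plays the role of the Dense(7*7*256) output, and reshape plays Reshape((7, 7, 256))) shows a passing and a failing check. Note that Python skips assert statements entirely when run with the -O flag, so they are a debugging aid rather than input validation:

```python
import numpy as np

batch = 8
dense_out = np.zeros((batch, 7 * 7 * 256))     # stands in for Dense(7*7*256)
reshaped = dense_out.reshape((-1, 7, 7, 256))  # stands in for Reshape((7, 7, 256))

# Passes: the per-example shape matches (the batch size is not checked here)
assert reshaped.shape[1:] == (7, 7, 256), f"unexpected shape {reshaped.shape}"

try:
    assert reshaped.shape[1:] == (14, 14, 128), f"unexpected shape {reshaped.shape}"
except AssertionError as err:
    print("AssertionError:", err)  # AssertionError: unexpected shape (8, 7, 7, 256)
```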


I wrote a ResNet block with three convolutional layers:

from tensorflow.keras.layers import Activation, Add, BatchNormalization, Conv2D

def res_net_block(input_data, filters, kernel_size):
    kernel_middle = kernel_size + 2
    filters_last_layer = filters * 2

    x = Conv2D(filters, kernel_size, activation = 'relu', padding = 'same')(input_data)   #64, 1x1 
    x = BatchNormalization()(x)

    x = Conv2D(filters, kernel_middle, activation = 'relu', padding = 'same')(x)          #64, 3x3
    x = BatchNormalization()(x)

    x = Conv2D(filters_last_layer, kernel_size, activation = None, padding = 'same')(x)   #128, 1x1 
    x = BatchNormalization()(x)

    x = Add()([x, input_data])

    x = Activation('relu')(x)
    return x

When I add it to my model, I receive this error: ValueError: Operands could not be broadcast together with shapes (54, 54, 128) (54, 54, 64)

Here is my model so far:

inputs = Input(shape = (224, 224, 3))
model = Conv2D(filters = 64, kernel_size = 7, strides = 2, activation = 'relu')(inputs)
model = BatchNormalization()(model)
model = MaxPool2D(pool_size = 3, strides = 2)(model)
for i in range(num_res_net_blocks):
    model = res_net_block(model, 64, 1)

I believe the problem comes from this line in the ResNet block:

x = Add()([x, input_data])

The input data has different dimensions from x, but I don't know how to fix this issue. I would really appreciate some help here.


The error is due to adding two tensors with different dimensions - (54, 54, 128) & (54, 54, 64). In order to perform tensor addition, the input dimensions must be the same along all axes. Here's the same note from Keras Add() doc:

Quote: "keras.layers.Add() ... It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape)"

In order to perform residual addition, you need to ensure that the two tensors, one along the identity path and one on the residual path, have the same dimensions. As a simple solution to debug the error, in the final Conv2D replace filters_last_layer with filters, so that both the residual (x) and the identity tensor (input_data) have the same shape (54, 54, 64).

Hope this helps! :)
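Reducing the filters works, but when channel counts differ, ResNets commonly use a projection shortcut instead: a 1x1 convolution on the identity path that maps 64 channels to 128 before the Add, e.g. in Keras something like Conv2D(filters_last_layer, 1, padding='same')(input_data). The NumPy sketch below (an illustration, not Keras code; the batch axis is omitted and the weights are random placeholders) shows why this fixes the shapes: a 1x1 convolution is just a matrix multiply over the channel axis at every spatial position.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w = 54, 54
input_data = rng.normal(size=(h, w, 64))   # identity path, 64 channels
residual = rng.normal(size=(h, w, 128))    # output of the last 1x1 conv, 128 channels

# A 1x1 convolution with 128 filters = one (64, 128) weight matrix (+ bias)
# applied independently at every (row, col) position
kernel = rng.normal(scale=0.1, size=(64, 128))
bias = np.zeros(128)
shortcut = input_data @ kernel + bias      # (54, 54, 64) @ (64, 128) -> (54, 54, 128)

out = np.maximum(residual + shortcut, 0.0)  # Add() followed by Activation('relu')
print(shortcut.shape, out.shape)  # (54, 54, 128) (54, 54, 128)
```

Keeping the 128-filter bottleneck and projecting the shortcut preserves the block's intended widening, whereas shrinking the last Conv2D changes the architecture.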


I'm currently studying TensorFlow 2.0 and Keras. I know that activation functions are used to calculate the output of each layer of a neural network, based on mathematical functions. However, when searching about layers, I can't find concise, easy-to-read information for a beginner in deep learning.

There's the Keras documentation, but I would like to know, concisely:

  • What are the most common layers used to create a model (Dense, Flatten, MaxPooling2D, Dropout, ...)?
  • In which case should each of them be used (classification, regression, other)?
  • What is the appropriate way to use each layer depending on the case?

I apologize in advance for my English.


Depending on the problem you want to solve, there are different activation functions and loss functions that you can use.

  1. Regression problem: You want to predict the price of a building. You have N features. The price of the building is a real number, therefore you typically use mean_squared_error as the loss function and a linear activation for your last node. In this case, you can have a couple of Dense() layers with relu activation, while your last layer is a Dense(1, activation='linear'). In between the Dense() layers, you can add Dropout() to mitigate overfitting (if present).
  2. Classification problem: You want to detect whether or not someone is diabetic, taking into account several factors/features. In this case, you can again use stacked Dense() layers, but your last layer will be a Dense(1, activation='sigmoid'), since you want to detect whether or not a patient is diabetic. The loss function in this case is 'binary_crossentropy'. In between the Dense() layers, you can add Dropout() to mitigate overfitting (if present).
  3. Image processing problems: Here you typically have stacks of [Conv2D(), MaxPool2D(), Dropout()]. MaxPooling2D is an operation typical for image processing and also some natural language processing (not going to expand upon it here). Sometimes, in convolutional neural network architectures, the Flatten() layer is used. Its purpose is to reshape the feature maps into a 1D vector whose length equals the total number of elements across all the feature maps. For example, if you had a matrix of [28,28], flattening it would reduce it to (1,784), where 784=28*28.
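The Flatten() arithmetic from the third point can be sketched with NumPy (an illustration only; like Keras's Flatten, the reshape below keeps the batch axis and collapses everything else):

```python
import numpy as np

feature_maps = np.arange(2 * 28 * 28).reshape((2, 28, 28))  # batch of 2 "images"
flat = feature_maps.reshape((feature_maps.shape[0], -1))   # like Flatten()
print(flat.shape)  # (2, 784)

# Each row holds one example's 28 * 28 values, laid out row by row
print(np.array_equal(flat[1], feature_maps[1].ravel()))  # True

# A conv stack's (batch, height, width, channels) output flattens the same way
conv_out = np.zeros((2, 5, 5, 64))
print(conv_out.reshape((2, -1)).shape)  # (2, 1600)
```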

Although the question is quite broad and some people may vote to close it, I tried to give you a short overview of what you asked. I recommend that you start by learning the basics behind neural networks and then delve deeper into using a framework such as TensorFlow or PyTorch.