Hot questions for Using Neural networks in resnet


I was writing a neural net to train ResNet on the CIFAR-10 dataset. The paper Deep Residual Learning for Image Recognition mentions training for around 60,000 epochs.

I was wondering: what exactly does an epoch refer to in this case? Is it a single pass through a minibatch of size 128 (which would mean around 150 passes through the entire 50,000-image training set)?

Also, how long is this expected to take to train (assume CPU only, 20-layer or 32-layer ResNet)? With the above definition of an epoch, it seems it would take a very long time...

I was expecting something around 2-3 hours only, which is equivalent to about 10 passes through the 50000 image training set.


The paper never mentions 60,000 epochs. An epoch is generally taken to mean one pass over the full dataset, so 60,000 epochs would be insane. They use 64,000 iterations on CIFAR-10. An iteration means processing one minibatch: computing the gradients and then applying them.

You are correct that this means more than 150 passes over the dataset (these are the epochs). Modern neural network models often take days or weeks to train. ResNets in particular are troublesome due to their massive size and depth. Note that the paper mentions training the model on two GPUs, which will be much faster than training on a CPU.
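The conversion from iterations to epochs can be checked with a few lines of arithmetic. This is only a sketch: the variable names are made up for illustration, and it assumes all 50,000 CIFAR-10 training images are used with the minibatch size of 128 from the question.

```python
# Figures taken from the discussion above; the names are illustrative only.
iterations = 64_000      # minibatch updates reported for CIFAR-10
batch_size = 128         # images per minibatch
train_images = 50_000    # assumes the full CIFAR-10 training set is used

# Each iteration consumes one minibatch of images.
images_seen = iterations * batch_size

# One epoch is one full pass over the training set.
epochs = images_seen / train_images

print(f"{images_seen} images processed, about {epochs:.2f} epochs")
```

By the same arithmetic, 10 passes over the training set would be only about 3,900 iterations, far short of the schedule in the paper.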

If you are just training some models "for fun" I would recommend scaling them down significantly. Try 8 layers or so; even this might be too much. If you are doing this for research/production use, get some GPUs.


I want to implement ResNet-50 from scratch. It is implemented in Caffe by the authors of the original paper, but I want a TensorFlow implementation based on this repository : and therefore this image : I know the TensorFlow equivalent of every layer, but I don't know the meaning of the "in-place" Scale layer after batch normalization. Can you explain what it means, and also what the "use_global_stats" parameter in BatchNorm does?


  1. An "in-place" layer in Caffe is simply a hint to save memory: instead of allocating separate memory for the input and the output of the layer, an "in-place" layer overwrites its input with its output.
  2. Setting "use_global_stats" in the "BatchNorm" layer means using the mean and variance accumulated during training and not updating these values any further. This is the "deployment" state of the BN layer.
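To make the second point concrete, here is a minimal pure-Python sketch of the two modes on a single channel. It is not Caffe's implementation: the helper names, the stored statistics, and the sample batch are all hypothetical, and the learned scale and shift (which Caffe keeps in the separate Scale layer) are left out.

```python
import math

def batch_stats(values):
    """Mean and (biased) variance of the current minibatch."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

def normalize(values, mean, var, eps=1e-5):
    """Core batch-norm step: subtract the mean, divide by the std."""
    return [(v - mean) / math.sqrt(var + eps) for v in values]

# Hypothetical statistics accumulated over training (assumed values).
running_mean, running_var = 0.0, 1.0

batch = [2.0, 4.0, 6.0]  # made-up activations for one channel

# Training behaviour: statistics come from the minibatch itself.
train_out = normalize(batch, *batch_stats(batch))

# Deployment behaviour (global stats): the stored statistics are used
# as-is and are not updated by this batch.
deploy_out = normalize(batch, running_mean, running_var)

print(train_out)
print(deploy_out)
```

With batch statistics the output is centred on zero whatever the input is. With the stored statistics the output depends only on the frozen values, which is why this mode is the one used for inference.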


I am working on ResNet and I have found an implementation that does the skip connections with a plus sign, like the following:

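import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # kernel_size and padding are assumed here; they keep the 32x32 size
        self.conv = nn.Conv2d(128, 128, kernel_size=3, padding=1)

    def forward(self, x):
        out = self.conv(x)  # line 1
        x = out + x         # skip connection, line 2
        return x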

Now I have debugged and printed the values before and after line 1. The output was the following:

after line 1 x = [1,128,32,32] out = [1,128,32,32]

After line 2 x = [1,128,32,32] // still

Reference link:

My question is: where did it add the value? I mean, after the

x = out + x

operation, where has the value been added?

PS: Tensor format is [batch, channel, height, width].


As mentioned in the comments by @UmangGupta, what you are printing seems to be the shape of your tensors (e.g. the "shape" of a 3x3 matrix is [3, 3]), not their content. In your case, you are dealing with 1x128x32x32 tensors. The addition changes the values stored in the tensor, but not its shape.

Example to hopefully clarify the difference between shape and content:

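import torch

out = torch.ones((3, 3))
x = torch.eye(3, 3)
res = out + x

# Print the shape and then the content of each tensor
# (the exact number formatting depends on the PyTorch version).
print(out.shape)
# torch.Size([3, 3])
print(out)
# tensor([[ 1.,  1.,  1.],
#         [ 1.,  1.,  1.],
#         [ 1.,  1.,  1.]])
print(x.shape)
# torch.Size([3, 3])
print(x)
# tensor([[ 1.,  0.,  0.],
#         [ 0.,  1.,  0.],
#         [ 0.,  0.,  1.]])
print(res.shape)
# torch.Size([3, 3])
print(res)
# tensor([[ 2.,  1.,  1.],
#         [ 1.,  2.,  1.],
#         [ 1.,  1.,  2.]])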
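The same check can be run on tensors of the size from the question. This is a rough sketch rather than the original implementation: the 3x3 kernel with padding 1 is an assumption chosen so that the convolution keeps the 32x32 spatial size, and the input is random data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed layer settings: kernel_size=3 with padding=1 preserves 32x32.
conv = nn.Conv2d(128, 128, kernel_size=3, padding=1)

x = torch.randn(1, 128, 32, 32)  # [batch, channel, height, width]
out = conv(x)                    # line 1 of the question
y = out + x                      # line 2: the skip connection

same_shape = y.shape == x.shape    # the shape is unchanged by the addition
same_values = torch.equal(y, x)    # the content is not

print(tuple(x.shape), tuple(y.shape))
print("same shape:", same_shape, "| same values:", same_values)
```

The addition is element-wise, so each of the 1*128*32*32 entries of y is the sum of the matching entries of out and x. Printing the shape alone cannot reveal that.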