Hot questions for Using Neural networks in torchvision


I am new to PyTorch and ran into a problem with channels in AlexNet. I am using it for a 'GTA San Andreas self-driving car' project. I collected the dataset as black-and-white images with a single channel, and I am trying to train AlexNet with this script:

from AlexNetPytorch import *
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim
import numpy as np
import torch
from IPython.core.debugger import set_trace

AlexNet = AlexNet()

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(AlexNet.parameters(), lr=0.001, momentum=0.9)

all_data = np.load('training_data.npy')
inputs= all_data[:,0]
labels= all_data[:,1]
inputs_tensors = torch.stack([torch.Tensor(i) for i in inputs])
labels_tensors = torch.stack([torch.Tensor(i) for i in labels])

data_set =, labels_tensors)
data_loader =, batch_size=3, shuffle=True, num_workers=2)

if __name__ == '__main__':
    for epoch in range(8):
        running_loss = 0.0
        for i, data in enumerate(data_loader, 0):
            inputs = data[0]
            inputs = torch.FloatTensor(inputs)
            labels = data[1]
            labels = torch.FloatTensor(labels)
            # set_trace()
            inputs = torch.unsqueeze(inputs, 1)
            outputs = AlexNet(inputs)
            loss = criterion(outputs, labels)

            running_loss += loss.item()
            if i % 2000 == 1999:    # print every 2000 mini-batches
                print('[%d, %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 2000))
                running_loss = 0.0

I am using AlexNet from the link:

But changed line 18 from:

nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2)

To:

nn.Conv2d(1, 64, kernel_size=11, stride=4, padding=2)

Because I am using only one channel in training images, but I get this error:

 File "", line 44, in <module>
    outputs = AlexNet(inputs)
  File "C:\Users\Mukhtar\Anaconda3\lib\site-packages\torch\nn\modules\", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Mukhtar\Documents\AI_projects\gta\", line 34, in forward
    x = self.features(x)
  File "C:\Users\Mukhtar\Anaconda3\lib\site-packages\torch\nn\modules\", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Mukhtar\Anaconda3\lib\site-packages\torch\nn\modules\", line 91, in forward
    input = module(input)
  File "C:\Users\Mukhtar\Anaconda3\lib\site-packages\torch\nn\modules\", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Mukhtar\Anaconda3\lib\site-packages\torch\nn\modules\", line 142, in forward
  File "C:\Users\Mukhtar\Anaconda3\lib\site-packages\torch\nn\", line 396, in max_pool2d
    ret = torch._C._nn.max_pool2d_with_indices(input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: Given input size: (256x1x1). Calculated output size: (256x0x0). Output size is too small at c:\programdata\miniconda3\conda-bld\pytorch-cpu_1532499824793\work\aten\src\thnn\generic/SpatialDilatedMaxPooling.c:67

I don't know what is wrong. Is it wrong to change the channel size like this? And if it is, can you please point me to a neural network that works with one channel? As I said, I am a newbie in PyTorch and I don't want to write the network myself.


Your error is not related to using gray-scale images instead of RGB. It is about the spatial dimensions of the input: while "forwarding" an input image through the net, its size (in feature space) shrank to zero, and this is the error you see. You can use this nice guide to see what happens to the output size of each layer (conv/pooling) as a function of kernel size, stride and padding. AlexNet expects its input images to be 224 by 224 pixels; make sure your inputs are of that size.
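To see where the spatial size collapses, you can trace each conv/pool layer by hand with the usual formula: out = floor((in + 2*padding - kernel) / stride) + 1. A quick sketch, with layer parameters taken from the standard torchvision AlexNet feature stack:

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

# AlexNet feature path for a 224x224 input
s = 224
s = conv_out(s, 11, 4, 2)  # conv1   -> 55
s = conv_out(s, 3, 2)      # maxpool -> 27
s = conv_out(s, 5, 1, 2)   # conv2   -> 27
s = conv_out(s, 3, 2)      # maxpool -> 13
s = conv_out(s, 3, 1, 1)   # conv3   -> 13
s = conv_out(s, 3, 1, 1)   # conv4   -> 13
s = conv_out(s, 3, 1, 1)   # conv5   -> 13
s = conv_out(s, 3, 2)      # maxpool -> 6
print(s)  # 6
```

Running the same trace with a smaller input shows the size hitting 1 (or 0) before the last pooling layer, which is exactly the RuntimeError above.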

Other things you overlooked:

  • You are using the AlexNet architecture, but you are initializing it with random weights instead of using pretrained weights (trained on ImageNet). To get a trained copy of AlexNet, you'll need to instantiate the net like this:

    AlexNet = alexnet(pretrained=True)
  • Once you decide to use a pretrained net, you cannot change its first layer from 3 input channels to one (the trained weights simply won't fit). The easiest fix is to make your input images "colorful" by simply repeating the single channel three times. See repeat() for more info.
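A minimal sketch of that repeat trick (the batch shape here is just an assumed example):

```python
import torch

# assumed example batch: 4 single-channel 224x224 images, shape (N, 1, H, W)
x = torch.randn(4, 1, 224, 224)

# copy the gray channel into 3 identical channels so it fits a pretrained net
x_rgb = x.repeat(1, 3, 1, 1)
print(x_rgb.shape)  # torch.Size([4, 3, 224, 224])
```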


I want to train the SqueezeNet 1.1 model using the MNIST dataset instead of the ImageNet dataset. Can I use the same model as torchvision.models.squeezenet? Thanks!


TorchVision provides only an ImageNet-pretrained model for the SqueezeNet architecture. However, you can train your own model on the MNIST dataset by taking just the architecture (not the pretrained weights) from torchvision.models.

In [10]: import torchvision as tv

# get the model architecture only; ignore `pretrained` flag
In [11]: squeezenet11 = tv.models.squeezenet1_1()


Now, you can use this architecture to train a model on MNIST data, which should not take too long.

One modification to keep in mind is the number of classes, which is 10 for MNIST. Specifically, the 1000 output channels in the classifier's Conv2d should be changed to 10, and the kernel size and stride of the pooling layer adjusted accordingly.

  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Conv2d(512, 1000, kernel_size=(1, 1), stride=(1, 1))
    (2): ReLU(inplace)
    (3): AvgPool2d(kernel_size=13, stride=1, padding=0)
  )

Here's the relevant explanation: finetuning_torchvision_models-squeezenet


I use some code similar to the following for data augmentation:

    from torchvision import transforms


    augmentation = transforms.Compose([
        transforms.RandomApply([
            transforms.RandomRotation([-30, 30])
        ], p=0.5),
    ])

During testing I want to fix the random values so that the same random parameters are reproduced each time I change the model's training settings. How can I do that?

I want to do something similar to np.random.seed(0), so that each time the transform draws its random parameters for the first time, it uses the same rotation angle and probability. In other words, if I do not change the code at all, it must reproduce the same result when I rerun it.

Alternatively, I could separate the transforms, use p=1, fix the min and max angle to particular values, and use NumPy random numbers to generate the results, but my question is whether I can do this while keeping the code above unchanged.


In the __getitem__ of your dataset class, set a random seed:

# module-level imports needed by this snippet
import random

import numpy as np
from skimage import io

def __getitem__(self, index):
    img = io.imread(self.labels.iloc[index, 0])
    target = self.labels.iloc[index, 1]

    seed = np.random.randint(2147483647)  # make a seed with numpy generator
    random.seed(seed)  # apply this seed to img transforms
    if self.transform is not None:
        img = self.transform(img)

    random.seed(seed)  # apply this seed to target transforms
    if self.target_transform is not None:
        target = self.target_transform(target)

    return img, target
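Note that this pins Python's random module, which is what older torchvision transforms draw from; newer releases draw transform parameters from torch's generator instead. For full run-to-run reproducibility it is therefore safer to seed everything once at startup. A minimal sketch:

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 0) -> None:
    # seed every generator the data pipeline might draw from
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

# same seed -> identical draws across reruns
set_seed(0)
angle_a = random.uniform(-30, 30)
set_seed(0)
angle_b = random.uniform(-30, 30)
```

Calling `set_seed` at the top of the training script makes the augmentation parameters repeat exactly as long as the code itself is unchanged.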