Hot questions for Using Neural networks in semantic segmentation



Does it make sense to combine cross-entropy loss and dice score in a weighted fashion for a binary segmentation problem?

Optimizing the dice score produces over-segmented regions, while cross-entropy loss produces under-segmented regions for my application.


I suppose there's no harm in combining the two losses, as they are quite "orthogonal" to each other: while cross-entropy treats every pixel as an independent prediction, the dice score looks at the resulting mask in a more "holistic" way. Moreover, considering that these two losses yield significantly different masks, each with its own merits and errors, combining this complementary information should be beneficial. Make sure you weight the losses such that the gradients from the two are roughly on the same scale, so you can benefit equally from both.

If you make it work, I'd be interested to hear about your experiments and conclusions ;)
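A minimal sketch of such a weighted combination for the binary case, in PyTorch (the weights `w_ce` and `w_dice` are placeholders you would tune so the two gradient contributions are comparable):

```python
import torch
import torch.nn.functional as F

def soft_dice_loss(logits, targets, eps=1e-6):
    # Soft dice on the foreground probability map (binary case)
    probs = torch.sigmoid(logits)
    inter = (probs * targets).sum()
    union = probs.sum() + targets.sum()
    return 1.0 - (2.0 * inter + eps) / (union + eps)

def combined_loss(logits, targets, w_ce=1.0, w_dice=1.0):
    # w_ce / w_dice are hypothetical weights; tune them so the
    # gradients from both terms end up on a comparable scale
    ce = F.binary_cross_entropy_with_logits(logits, targets)
    return w_ce * ce + w_dice * soft_dice_loss(logits, targets)
```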


I'm looking for a way to take an input image and a neural network and output a labeled class for each pixel in the image (sky, grass, mountain, person, car, etc.).

I've set up Caffe (the future branch) and successfully run the FCN-32s Fully Convolutional Semantic Segmentation model on PASCAL-Context. However, I'm unable to produce clearly labeled images with it.

(Images illustrating the problem: the input image, the ground truth, and my result.)

This might be a resolution issue. Any idea where I'm going wrong?


It seems like the 32s model makes large strides and thus works at a coarse resolution. Can you try the 8s model, which seems to perform less resolution reduction? Looking at J. Long, E. Shelhamer, T. Darrell, Fully Convolutional Networks for Semantic Segmentation, CVPR 2015 (especially figure 4), it seems the 32s model is not designed to capture fine details of the segmentation.


I am trying to implement a paper on Semantic Segmentation and I am confused about how to Upsample the prediction map produced by my segmentation network to match the input image size.

For example, I am using a variant of ResNet-101 as the segmentation network (as used by the paper). With this network structure, an input of size 321x321 (again, as used in the paper) produces a final prediction map of size 41x41xC (C is the number of classes). Because I have to make pixel-level predictions, I need to upsample it to 321x321xC. PyTorch provides a function to upsample to an output size that is a multiple of the prediction map size, so I cannot use that method directly here.

Because this step is involved in every semantic segmentation network, I am sure there should be a standard way to implement this.

I would appreciate any pointers. Thanks in advance.


Maybe the simplest thing you can try is:

  • upsample 8 times, so your 41x41 input turns into 328x328
  • perform center cropping to get your desired shape of 321x321 (for instance, something like input[:, :, 3:-4, 3:-4] for an NCHW tensor)
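These two steps can be sketched with `F.interpolate` (shapes here are assumptions matching the question; newer PyTorch versions also accept an exact `size=(321, 321)` argument, which sidesteps the crop entirely):

```python
import torch
import torch.nn.functional as F

pred = torch.randn(1, 21, 41, 41)  # hypothetical N x C x H x W prediction map
up = F.interpolate(pred, scale_factor=8, mode='bilinear', align_corners=False)
# up is now 1 x 21 x 328 x 328; center-crop 328 -> 321
# (3 pixels off one side, 4 off the other)
cropped = up[:, :, 3:-4, 3:-4]
```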


I am trying to use the following CNN architecture for semantic pixel classification. The code I am using is here

However, from my understanding, this type of semantic segmentation network should typically have a softmax output layer to produce the classification result.

I could not find softmax used anywhere within the script. Here is the paper I am reading on this segmentation architecture; from Figure 2, I can see softmax being used. Hence, I would like to find out why it is missing from the script. Any insight is welcome.


You are using quite a complex codebase for training/inference, but if you dig a little you'll see that the loss functions are implemented here, and your model is actually trained using the cross_entropy loss. Looking at the doc:

This criterion combines log_softmax and nll_loss in a single function.

For numerical stability, it is better to "absorb" the softmax into the loss function rather than have the model compute it explicitly. It is quite common practice to have the model output "raw" predictions (aka "logits") and let the loss (aka criterion) apply the softmax internally. If you really need the probabilities, you can add a softmax on top when deploying your model.
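A small sketch of this equivalence (the shapes are hypothetical, chosen to mimic a per-pixel classification output):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(2, 5, 4, 4)          # N x C x H x W raw model outputs
target = torch.randint(0, 5, (2, 4, 4))   # per-pixel class labels

# cross_entropy fuses log_softmax and nll_loss internally
loss = F.cross_entropy(logits, target)
manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

# at deployment, apply softmax explicitly if probabilities are needed
probs = F.softmax(logits, dim=1)
```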


I'm working on the Kaggle semantic segmentation task.

In the testing part of my code:

model = model.eval()
predictions =[]
for data in testdataloader:
    data = t.autograd.Variable(data, volatile=True).cuda()
    output = model.forward(data)
    _,preds = t.max(output, 1, keepdim = True)

When I do the preds part, the array is only filled with 0s; I was hoping it would be an array of maximum locations, and I'm not sure what is going wrong. The output part works well; I have attached a screenshot visualizing the output.

Any suggestions on what is going wrong would be really helpful.



Assuming your data is of the form MiniBatch x Dim, what you are doing now is looking at which minibatch entry has the highest value. If you are testing with a single sample (MB = 1), then you will always get 0 as your answer. Thus, you might want to try:

_,preds = t.max(output, 0, keepdim = False)
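For instance, if the batch dimension has been squeezed away so that `output` is C x H x W (an assumption here, not confirmed by the question), then dim 0 is the class dimension and the argmax over it yields the per-pixel label map:

```python
import torch

# hypothetical output with the batch dimension squeezed away: C x H x W
output = torch.randn(4, 8, 8)

# argmax over dim 0 (the class dimension) gives an H x W label map
_, preds = torch.max(output, 0, keepdim=False)
```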