Hot questions for Using Neural networks in style transfer

Question:

I'm just getting started with these topics. To the best of my knowledge, style transfer takes the content from one image and the style from another to generate or recreate the first in the style of the second, whereas a GAN generates completely new images based on a training set.

But I see a lot of places where the two have been used interchangeably, like this blog here, and other places where a GAN is used to achieve style transfer, like this paper here.

Are GANs and style transfer two different things, is a GAN the method used to implement style transfer, or are they different techniques that accomplish the same thing? Where exactly is the line between the two?


Answer:

A GAN is a neural network architecture.

Style transfer is a (set of) image-processing method(s) (it can be as simple as converting to grayscale or blurring).


So the relation is:

  • A GAN can be used to implement style transfer (and other things).

To complicate things a bit more (hopefully this actually makes something clearer): if you think of a feature vector as the style of an image, then the feature-vector -> image conversion is itself a style transfer :)

Question:

I've been going through Chollet's Deep Learning with Python, where he briefly covers L2-normalization with regard to Keras. I understand that it prevents overfitting by adding a penalty proportional to the sum of the squares of the weights to the cost function of the layer, helping to keep the weights small.

However, in the section covering artistic style transfer, the content loss as a measure is described as:

the L2 norm between the activations of an upper layer in a pretrained convnet, computed over the target image, and the activations of the same layer computed over the generated image. This guarantees that, as seen from the upper layer, the generated image will look similar.

The style loss is also related to the L2-norm, but let's focus on the content loss for now.

So, the relevant code snippet (p.292):

from keras import backend as K  # K refers to the Keras backend

def content_loss(base, combination):
    return K.sum(K.square(combination - base))


# `model` is the pretrained convnet (VGG19 in the book) built earlier in the example
outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])
content_layer = 'block5_conv2'
style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1',
                'block4_conv1',
                'block5_conv1']
total_variation_weight = 1e-4
style_weight = 1.
content_weight = 0.025

loss = K.variable(0.)

layer_features = outputs_dict[content_layer]
target_image_features = layer_features[0, :, :, :]   # activations for the target (content) image
combination_features = layer_features[2, :, :, :]    # activations for the generated (combination) image
loss += content_weight * content_loss(target_image_features,
                                      combination_features)

I don't understand why we use the outputs of each layer, which are image feature maps, as opposed to Keras's get_weights() method to fetch the weights to perform normalization. I do not follow how using L2-normalization on these feature maps penalizes anything during training, or, more to the point, what exactly it is penalizing.


Answer:

I understand that it prevents overfitting by adding a penalty proportional to the sum of the squares of the weights to the cost function of the layer, helping to keep the weights small.

What you are referring to is (weight) regularization, and in this case it is L2-regularization. The squared L2-norm of a vector is the sum of the squares of its elements, so when you apply L2-regularization to the weights (i.e. parameters) of a layer, that term is added to the loss function. Since we are minimizing the loss function, a side effect is that the L2-norm of the weights is reduced as well, which in turn means that the values of the weights shrink (i.e. small weights).
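For concreteness, here is a minimal sketch of what that kind of weight regularization looks like in Keras (the layer sizes and the 0.01 factor are arbitrary, illustrative choices):

from keras import regularizers
from keras.models import Sequential
from keras.layers import Dense

# L2 *weight* regularization: 0.01 * sum(square(kernel)) is added to the loss,
# which pushes the layer's weights (not its activations) towards small values.
regularized_model = Sequential([
    Dense(64, activation='relu', input_shape=(100,),
          kernel_regularizer=regularizers.l2(0.01)),
    Dense(1, activation='sigmoid'),
])
regularized_model.compile(optimizer='rmsprop', loss='binary_crossentropy')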

However, in the style transfer example the content loss is defined as the squared L2-norm (i.e. the L2-loss in this case) of the difference between the activations (and not the weights) of a specific layer (i.e. content_layer) computed on the target image and on the combination image (i.e. target image + style):

return K.sum(K.square(combination - base)) # that's exactly the definition of the squared L2-norm (the L2-loss)

So no weight regularization is involved here. Rather, the squared L2-norm is used as the loss, and it serves as a measure of the similarity of two arrays (i.e. the activations of the content layer). The smaller this value, the more similar the activations.

Why the activations of the layer and not its weights? Because we want to make sure that the contents (i.e. the representations given by the content_layer) of the target image and the combination image are similar. Note that the weights of a layer are fixed and do not change (after training, of course) with respect to an input image; rather, they are used to describe or represent a specific input image, and that representation is called the activations of that layer for that specific image.
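To make the distinction concrete, here is a minimal sketch that computes this content loss as a plain number for two fixed images, using the block5_conv2 activations of an ImageNet-pretrained VGG19 (the file names are placeholders and the image size is an arbitrary choice):

import numpy as np
from keras.applications import vgg19
from keras.models import Model
from keras.preprocessing import image

def layer_activations(model, img_path, size=(224, 224)):
    # Run one image through the network and return the layer's activations.
    x = image.img_to_array(image.load_img(img_path, target_size=size))
    x = vgg19.preprocess_input(np.expand_dims(x, axis=0))
    return model.predict(x)

base_model = vgg19.VGG19(weights='imagenet', include_top=False)
content_model = Model(inputs=base_model.input,
                      outputs=base_model.get_layer('block5_conv2').output)

a_target = layer_activations(content_model, 'target.jpg')              # placeholder path
a_combination = layer_activations(content_model, 'combination.jpg')    # placeholder path

# Squared L2 distance between activations, not between weights:
content_loss_value = np.sum(np.square(a_combination - a_target))
print(content_loss_value)

In the book's code the same quantity is built symbolically with the Keras backend, so that it can be differentiated with respect to the combination image during the optimization.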

Question:

I'm studying style-transfer networks and am currently working with this work; here is the network description. The problem is that even with a TV (total variation) loss added there is still visible noise that degrades the quality of the result. Can someone recommend some articles on ways of removing such noise during network training?

Thanks


Answer:

The deconvolution noise comes from the uneven overlap between the kernel and the output of the transposed convolution, which creates a checkerboard-like pattern of varying magnitudes. One fix is to use the resize-conv method mentioned in this article.

Resize-conv replaces the transposed convolution with image scaling followed by a 2D convolution. In TensorFlow, the two steps are tf.image.resize_images(...) and tf.nn.conv2d(...). Another tip from the authors is to call tf.pad(...) prior to the convolution and to use only the nearest-neighbour resize method.
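A minimal sketch of that idea, using the TF 1.x-style API to match the function names above (the kernel initialization, sizes and padding mode are illustrative choices):

import tensorflow as tf

def resize_conv(x, out_channels, kernel_size=3, scale=2):
    # Upsample with a nearest-neighbour resize followed by an ordinary
    # convolution, instead of a transposed convolution, to avoid
    # checkerboard artifacts.
    _, h, w, in_channels = x.get_shape().as_list()
    x = tf.image.resize_images(x, [h * scale, w * scale],
                               method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    # Pad before convolving so the output keeps the upsampled spatial size.
    p = kernel_size // 2
    x = tf.pad(x, [[0, 0], [p, p], [p, p], [0, 0]], mode='REFLECT')
    kernel = tf.Variable(tf.random_normal(
        [kernel_size, kernel_size, in_channels, out_channels], stddev=0.02))
    # Plain 2D convolution with VALID padding, since we padded manually.
    return tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='VALID')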

Question:

I have a set of around 60 fractal images and a set of 60 snack images, and I want to apply the style of the fractal to the snack.

Is this possible? Or must I specifically take images from an existing data set with a pre-trained image model?

Thanks


Answer:

It depends on whether or not the method involves training a model on the style data.

At least one method does not require that at all: it trains a network on a classification task and then infers the style of an image during the style transfer itself. So you can use a model that has been pre-trained on images you do not have, and then use it together with your own images to perform the style transfers.

There is some ready-to-use code to do that: example
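As a concrete illustration of why no training on your own data is needed, here is a sketch in Keras (the file names are placeholders, and VGG19 is just one common choice of pre-trained classification network):

from keras.applications import vgg19

# The network was already trained on ImageNet for classification;
# nothing is trained on the fractal or snack images themselves.
model = vgg19.VGG19(weights='imagenet', include_top=False)

# Your own images only enter at transfer time, one style/content pair per run:
content_path = 'snack_01.jpg'     # placeholder file name
style_path = 'fractal_01.jpg'     # placeholder file name
# ...feed both through `model`, build content and style losses from its layer
# activations, and optimize the generated image, as in the optimization-based
# method the answer refers to.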