Hot questions for using neural networks in iOS

Question:

The Metal Performance Shaders framework provides support for building your own convolutional neural nets. When creating, for instance, an MPSCNNConvolution, it requires a 4D weight tensor as an init parameter, which is represented as a 1D float pointer.

init(device: MTLDevice,
  convolutionDescriptor: MPSCNNConvolutionDescriptor,
  kernelWeights: UnsafePointer<Float>,
  biasTerms: UnsafePointer<Float>?,
  flags: MPSCNNConvolutionFlags)

The documentation has this to say about the 4D tensor:

The layout of the filter weight is arranged so that it can be reinterpreted as a 4D tensor (array) weight[outputChannels][kernelHeight][kernelWidth][inputChannels/groups]

Unfortunately, that information doesn't really tell me how to arrange a 4D array into a one-dimensional Float pointer.

I tried ordering the weights like the BNNS counterpart requires it, but without luck.

How do I properly represent the 4D tensor (array) as a 1D Float pointer (array)?

PS: I tried arranging it like a C array and getting the pointer to the flat array, but it didn't work.

UPDATE

@RhythmicFistman: This is how I stored it in a plain array, which I can convert to an UnsafePointer<Float> (but it doesn't work):

var output = Array<Float>(repeating: 0, count: weights.count)

for o in 0..<outputChannels {
    for ky in 0..<kernelHeight {
        for kx in 0..<kernelWidth {
            for i in 0..<inputChannels {
                let offset = ((o * kernelHeight + ky) * kernelWidth + kx) * inputChannels + i
                output[offset] = ...
            }
        }
    }
}

Answer:

OK, so I figured it out. Here are the two Python functions I use to reshape my convolution and fully connected matrices:

import numpy as np

# shape required for MPSCNN is [oC kH kW iC]
# TensorFlow order is [kH kW iC oC]
def convshape(a):
    a = np.swapaxes(a, 2, 3)
    a = np.swapaxes(a, 1, 2)
    a = np.swapaxes(a, 0, 1)
    return a

# fully connected only requires an x/y swap
def fullshape(a):
    a = np.swapaxes(a, 0, 1)
    return a
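
For reference, the same reordering can also be done directly in Swift while flattening into the 1D array that MPSCNNConvolution expects. This is only a minimal sketch, assuming the source weights are already a flat [Float] in TensorFlow's [kH][kW][iC][oC] order; the function and parameter names are illustrative.

// Reorder TensorFlow-style weights [kH][kW][iC][oC] into the flat
// [oC][kH][kW][iC] layout expected by MPSCNNConvolution.
func mpsWeights(fromTensorFlow tfWeights: [Float],
                kernelHeight: Int, kernelWidth: Int,
                inputChannels: Int, outputChannels: Int) -> [Float] {
    var output = [Float](repeating: 0, count: tfWeights.count)
    for ky in 0..<kernelHeight {
        for kx in 0..<kernelWidth {
            for i in 0..<inputChannels {
                for o in 0..<outputChannels {
                    let src = ((ky * kernelWidth + kx) * inputChannels + i) * outputChannels + o
                    let dst = ((o * kernelHeight + ky) * kernelWidth + kx) * inputChannels + i
                    output[dst] = tfWeights[src]
                }
            }
        }
    }
    return output
}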

Question:

I want to tune a neural network with dropout using h2o in R. Here I provide a reproducible example for the iris dataset. I'm not tuning eta and epsilon (i.e., the ADADELTA hyper-parameters), with the sole purpose of keeping computations fast.

require(h2o)
h2o.init()
data(iris)
iris = iris[sample(1:nrow(iris)), ]
irisTrain = as.h2o(iris[1:90, ])
irisValid = as.h2o(iris[91:120, ])
irisTest = as.h2o(iris[121:150, ])
hyper_params <- list(
    input_dropout_ratio = list(0, 0.15, 0.3),
    hidden_dropout_ratios = list(0, 0.15, 0.3, c(0,0), c(0.15,0.15),c(0.3,0.3)),
    hidden = list(64, c(32,32)))
grid = h2o.grid("deeplearning", x=colnames(iris)[1:4], y=colnames(iris)[5],
                training_frame = irisTrain, validation_frame = irisValid,
                hyper_params = hyper_params, adaptive_rate = TRUE,
                variable_importances = TRUE, epochs = 50, stopping_rounds=5,
                stopping_tolerance=0.01, activation=c("RectifierWithDropout"),
                seed=1, reproducible=TRUE)

The output is:

Details: ERRR on field: _hidden_dropout_ratios: Must have 1 hidden layer dropout ratios.

The problem is in hidden_dropout_ratios. Note that I'm including 0 for input_dropout_ratio and hidden_dropout_ratios since I also want to test the activation function without dropout. I'm aware that I could use activation="Rectifier", but I think that my configuration should lead to the same result. How do I tune hidden_dropout_ratios when tuning architectures with different numbers of layers?

Attempt 1: Unsuccessful and I'm not tuning hidden.

hyper_params <- list(
    input_dropout_ratio = c(0, 0.15, 0.3),
    hidden_dropout_ratios = list(c(0.3,0.3), c(0.5,0.5)),
    hidden = c(32,32))
ERRR on field: _hidden_dropout_ratios: Must have 1 hidden layer dropout ratios.

Attempt 2: Successful but I'm not tuning hidden.

hyper_params <- list(
    input_dropout_ratio = c(0, 0.15, 0.3),
    hidden_dropout_ratios = c(0.3,0.3),
    hidden = c(32,32))

Answer:

You have to fix the number of hidden layers within a grid if you are experimenting with hidden_dropout_ratios. At first I messed around with combining multiple grids; then, when researching for my H2O book, I saw someone mention in passing that grids get combined automatically if you give them the same name.

So, you still need to call h2o.grid() for each number of hidden layers, but they can all be in the same grid at the end. Here is your example modified for that:

require(h2o)
h2o.init()
data(iris)
iris = iris[sample(1:nrow(iris)), ]
irisTrain = as.h2o(iris[1:90, ])
irisValid = as.h2o(iris[91:120, ])
irisTest = as.h2o(iris[121:150, ])

hyper_params1 <- list(
    input_dropout_ratio = c(0, 0.15, 0.3),
    hidden_dropout_ratios = list(0, 0.15, 0.3),
    hidden = list(64)
    )

hyper_params2 <- list(
    input_dropout_ratio = c(0, 0.15, 0.3),
    hidden_dropout_ratios = list(c(0,0), c(0.15,0.15),c(0.3,0.3)),
    hidden = list(c(32,32))
    )

grid = h2o.grid("deeplearning", x=colnames(iris)[1:4], y=colnames(iris)[5],
    grid_id = "stackoverflow",
    training_frame = irisTrain, validation_frame = irisValid,
    hyper_params = hyper_params1, adaptive_rate = TRUE,
    variable_importances = TRUE, epochs = 50, stopping_rounds=5,
    stopping_tolerance=0.01, activation=c("RectifierWithDropout"),
    seed=1, reproducible=TRUE)

grid = h2o.grid("deeplearning", x=colnames(iris)[1:4], y=colnames(iris)[5],
    grid_id = "stackoverflow",
    training_frame = irisTrain, validation_frame = irisValid,
    hyper_params = hyper_params2, adaptive_rate = TRUE,
    variable_importances = TRUE, epochs = 50, stopping_rounds=5,
    stopping_tolerance=0.01, activation=c("RectifierWithDropout"),
    seed=1, reproducible=TRUE)

When I went to print the grid, I was reminded that there is a bug with grid output when using list hyper-parameters, such as hidden or hidden_dropout_ratios. Your code is a nice self-contained example, so I'll report that now. In the meantime, here is a short snippet to show the values of those hyper-parameters for each model:

models <- lapply(grid@model_ids, h2o.getModel)

sapply(models, function(m) c(
  paste(m@parameters$hidden, collapse = ","),
  paste(m@parameters$hidden_dropout_ratios, collapse = ",")
  ))

Which gives:

     [,1]    [,2] [,3]        [,4]   [,5]      [,6] 
[1,] "32,32" "64" "32,32"     "64"   "32,32"   "64" 
[2,] "0,0"   "0"  "0.15,0.15" "0.15" "0.3,0.3" "0.3"

I.e. no hidden dropout is better than a little, which is better than a lot. And two hidden layers is better than one.

By the way,

  • input_dropout_ratio: controls dropout between the input layer and the first hidden layer. It can be used independently of the activation function.
  • hidden_dropout_ratios: controls dropout between each hidden layer and the next layer (which is either the next hidden layer or the output layer). If specified, you must use one of the "WithDropout" activation functions.

Question:

I understand that my question is not directly related to programming itself and looks more like research. But perhaps someone can advise here.

I have an idea for an app: the user takes a photo, and the app analyzes it, cuts out everything except the required object (a piece of clothing, for example), and saves it as a separate image. Until recently this was a very difficult task, because the developer would have to create and train a pretty good neural network. But after Apple released the iPhone X with the TrueDepth camera, half of the problem can be solved. As I understand it, the developer can remove the background much more easily, because the iPhone knows where the background is located.

So only several questions left:

I. What is the format of photos taken by the iPhone X with the TrueDepth camera? Is it possible to create a neural network that can use the depth information from the picture?

II. I've read about CoreML and tried some examples, but it's still not clear to me how the following behaviour can be achieved with an external neural network imported into CoreML:

  1. The neural network gets an image as input data.

  2. The NN analyzes it and finds the required object in the image.

  3. The NN returns not only the determined type of object, but also the cropped object itself or an array of coordinates/pixels of the area that should be cropped.

  4. The application gets all the required information from the NN and performs the necessary actions to crop the image and save it to another file or whatever.

Any advice will be appreciated.


Answer:

OK, your question is actually directly related to programming. :)

Ad I. The format is HEIF, but you access the image data (if you develop an iPhone app) by means of iOS APIs, so you can easily get the bitmap as a CVPixelBuffer.
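
For example, if you capture with AVCapturePhotoOutput, you can pull both the bitmap and the TrueDepth depth map out of the delegate callback. Below is a minimal sketch, assuming the capture session and delegate wiring are already in place; whether you need the depth map at all depends on your approach.

import AVFoundation

// Sketch of a capture delegate that extracts the image bitmap and, if depth
// delivery was enabled on the AVCapturePhotoOutput, the TrueDepth depth map.
class PhotoCaptureDelegate: NSObject, AVCapturePhotoCaptureDelegate {

    func photoOutput(_ output: AVCapturePhotoOutput,
                     didFinishProcessingPhoto photo: AVCapturePhoto,
                     error: Error?) {
        guard error == nil else { return }

        // Bitmap of the captured photo (non-nil only for uncompressed capture
        // formats; for HEIF/JPEG, decode photo.fileDataRepresentation() instead).
        let imageBuffer: CVPixelBuffer? = photo.pixelBuffer

        // Depth map from the TrueDepth camera, also backed by a CVPixelBuffer.
        let depthBuffer: CVPixelBuffer? = photo.depthData?.depthDataMap

        // Hand imageBuffer (and depthBuffer, if your model uses depth) to CoreML.
        _ = (imageBuffer, depthBuffer)
    }
}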

Ad II. 1. The neural network gets an image as input data.

As mentioned above, you want to get your bitmap first, so create a CVPixelBuffer. Check out this post for an example. Then you use the CoreML API. You want to use the MLFeatureProvider protocol. An object that conforms to it is where you put your vector data as an MLFeatureValue, under a key name you pick (like "pixelData").

import CoreML

class YourImageFeatureProvider: MLFeatureProvider {

    let imageFeatureValue: MLFeatureValue
    var featureNames: Set<String> = []

    init(with imageFeatureValue: MLFeatureValue) {
        featureNames.insert("pixelData")
        self.imageFeatureValue = imageFeatureValue
    }

    func featureValue(for featureName: String) -> MLFeatureValue? {
        guard featureName == "pixelData" else {
            return nil
        }
        return imageFeatureValue
    }
}

Then you use it like this; the feature value is created with the MLFeatureValue(pixelBuffer:) initializer:

let imageFeatureValue = MLFeatureValue(pixelBuffer: yourPixelBuffer)
let featureProvider = YourImageFeatureProvider(imageFeatureValue: imageFeatureValue)

Remember to crop/scale the image before this operation so that your network is fed a vector of the proper size.
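
Here is a minimal sketch of that scale-and-convert step, assuming a plain UIImage source; the extension name is illustrative, and the target size should come from your model's input description rather than being hard-coded.

import UIKit
import CoreVideo

// Illustrative helper: draw the image, scaled to `size`, into a new CVPixelBuffer.
extension UIImage {
    func scaledPixelBuffer(size: CGSize) -> CVPixelBuffer? {
        let attrs = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
                     kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue] as CFDictionary
        var buffer: CVPixelBuffer?
        let status = CVPixelBufferCreate(kCFAllocatorDefault,
                                         Int(size.width), Int(size.height),
                                         kCVPixelFormatType_32ARGB, attrs, &buffer)
        guard status == kCVReturnSuccess, let pixelBuffer = buffer, let cgImage = cgImage else {
            return nil
        }

        CVPixelBufferLockBaseAddress(pixelBuffer, [])
        defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, []) }

        guard let context = CGContext(data: CVPixelBufferGetBaseAddress(pixelBuffer),
                                      width: Int(size.width),
                                      height: Int(size.height),
                                      bitsPerComponent: 8,
                                      bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
                                      space: CGColorSpaceCreateDeviceRGB(),
                                      bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue) else {
            return nil
        }

        // Scale the whole image into the target rectangle backed by the pixel buffer.
        context.draw(cgImage, in: CGRect(origin: .zero, size: size))
        return pixelBuffer
    }
}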

  2. The NN analyzes it and finds the required object in the image.

Use the prediction function on your CoreML model.

do {
    let outputFeatureProvider = try yourModel.prediction(from: featureProvider)
    // success! your output feature provider has your data
} catch {
    // your model failed to predict; check the error
}

  3. The NN returns not only the determined type of object, but also the cropped object itself or an array of coordinates/pixels of the area that should be cropped.

This depends on your model and whether you imported it correctly. Assuming you did, you access the output data by checking the returned MLFeatureProvider (remember that this is a protocol, so you would have to implement another one similar to what I made for you in step 1, something like YourOutputFeatureProvider), and there you have a bitmap and the rest of the data your NN spits out.

  4. The application gets all the required information from the NN and performs the necessary actions to crop the image and save it to another file or whatever.

Just reverse step 1, going from MLFeatureValue -> CVPixelBuffer -> UIImage. There are plenty of questions on SO about this, so I won't repeat the answers.
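
A minimal sketch of that reverse path, assuming the model has an image-typed output feature; the "outputImage" feature name is an assumption, so use whatever name your converted model actually exposes.

import UIKit
import CoreML
import CoreImage

// Pull an image-typed output feature out of the prediction result and turn
// its CVPixelBuffer back into a UIImage via CoreImage.
func image(from outputFeatureProvider: MLFeatureProvider) -> UIImage? {
    guard let pixelBuffer = outputFeatureProvider
            .featureValue(for: "outputImage")?.imageBufferValue else {
        return nil
    }
    let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
    let context = CIContext()
    guard let cgImage = context.createCGImage(ciImage, from: ciImage.extent) else {
        return nil
    }
    return UIImage(cgImage: cgImage)
}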

If you are a beginner, don't expect to have results overnight, but the path is here. For an experienced dev, I would estimate this at several hours of work (plus model training time and porting it to CoreML).

Apart from CoreML (maybe you'll find your model too sophisticated and you won't be able to port it to CoreML), check out Matthijs Hollemans' GitHub (very good resources on different ways of porting models to iOS). He is also around here and knows a lot about the subject.

Question:

I tried to follow https://www.appcoda.com/core-ml-model-with-python/ to build picture recognition. I use Core ML (Turi Create) + Python + Swift (iOS).

I tried to upload the same image that I used to train the ".mlmodel" file. That didn't help. I tried to load a 100x100 picture. Same error. What else can I try?

Output:

2018-04-17 20:54:19.076605+0200 [2516:1111075] [MC] System group container for systemgroup.com.apple.configurationprofiles path is /private/var/containers/Shared/SystemGroup/systemgroup.com.apple.configurationprofiles

2018-04-17 20:54:19.077580+0200 [2516:1111075] [MC] Reading from public effective user settings.

2018-04-17 20:54:54.795691+0200 [2516:1111075] [coreml] Error Domain=com.apple.CoreML Code=1 "Input image feature image does not match model description" UserInfo={NSLocalizedDescription=Input image feature image does not match model description, NSUnderlyingError=0x1c024cf90 {Error Domain=com.apple.CoreML Code=1 "Image is not valid width 227, instead is 224" UserInfo={NSLocalizedDescription=Image is not valid width 227, instead is 224}}}

2018-04-17 20:54:54.795728+0200 [2516:1111075] [coreml] Failure verifying inputs.

As requested in the comments:

func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [String: Any]) {
    if let image = info[UIImagePickerControllerOriginalImage] as? UIImage {
        previewImg.image = image

        if let buffer = image.buffer(with: CGSize(width: 224, height: 224)) {

            guard let prediction = try? mlModel.prediction(image: buffer) else {
                fatalError("Unexpected runtime error")
            }

            descriptionLbl.text = prediction.foodType
            print(prediction.foodTypeProbability)
        } else {
            print("failed buffer")
        }
    }

    dismiss(animated: true, completion: nil)
}

Answer:

The error message literally says what the cause of the error is:

2018-04-17 20:54:54.795691+0200 [2516:1111075] [coreml] Error Domain=com.apple.CoreML Code=1 "Input image feature image does not match model description" UserInfo={NSLocalizedDescription=Input image feature image does not match model description, NSUnderlyingError=0x1c024cf90 {Error Domain=com.apple.CoreML Code=1 "Image is not valid width 227, instead is 224" UserInfo={NSLocalizedDescription=Image is not valid width 227, instead is 224}}}

The model you're using (I suspect it's SqueezeNet) expects input images of size 227x227, not 224x224 or any other size.
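
In the code from the question, that means passing the size the model actually declares to the buffer helper, for example:

// Match the model's expected input size (227x227 for this model), not 224x224.
if let buffer = image.buffer(with: CGSize(width: 227, height: 227)) {
    // ...run mlModel.prediction(image: buffer) as before
}

Rather than hard-coding the number, you can check the expected size by selecting the .mlmodel file in Xcode and looking at the model's input description.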

Question:

So, the question is that I want to implement filters like the Prisma app. I found that neural artwork (style transfer) deep learning can be used to do it. But how do I implement it in Objective-C or Swift? Does anyone have any ideas? Thanks in advance!


Answer:

You can use a Convolutional Neural Network for that. As a framework, I suggest using TensorFlow. It works perfectly with CNNs, plus the code can be written in C++.

Here is a sample: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/ios_examples/

Question:

I was wondering if it is possible to combine images and some "bios" data to find patterns. For example, say I want to know whether an image is a cat or a dog, and I have:

  • Enough image data to train my model

  • Enough "bios" data, like:

      • size of the animal
      • size of the tail
      • weight
      • height

Thanks!


Answer:

Are you looking for a simple yes-or-no answer? In that case, yes. You are in complete control of building your models, which includes what data you make them process and what predictions you get.

If you actually wanted to ask how to do it, it will depend on the specific datasets and application, but one way would be to have two models: one specialized in determining the output label (cat or dog) from the image, so perhaps some kind of simple CNN, and the other processing the text data and finding patterns in that. Then, at the end, you could have either a non-AI evaluator that naively combines these two predictions into one (see the sketch below), or you could feed both of these models as inputs to a simple neural network that learns patterns from their outputs.
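
As a toy illustration of that last, naive option, here is a sketch that simply averages the per-class probabilities produced by the two models; the dictionaries and the 50/50 weighting are assumptions for illustration only.

// Naively combine class probabilities from an image model and a "bios" model
// by averaging them, then pick the label with the highest combined score.
func combinedLabel(imageProbs: [String: Double],
                   biosProbs: [String: Double]) -> String? {
    let labels = Set(imageProbs.keys).union(biosProbs.keys)
    let combined = labels.map { label in
        (label, 0.5 * (imageProbs[label] ?? 0) + 0.5 * (biosProbs[label] ?? 0))
    }
    return combined.max(by: { $0.1 < $1.1 })?.0
}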

That is just one way to possibly do it, though, and, as I said, the exact implementation will depend on a lot of other factors. How are both of the datasets labeled? Are the data connected to each other? Meaning, for each picture, do you have some textual data for that specific image? Or do you just have a separate dataset of pictures and a separate dataset of biological information?

There is also a consideration you'll probably want to make about the necessity of this approach. Current models can predict categories from images with super-human precision. Unless this is an exercise in creating a more complex model, this seems like overkill.

PS: I wouldn't use the term "bios" in this context; I believe it is not a very common usage, and here on SO it will mostly confuse people into thinking you mean the actual BIOS.