Hot questions on using neural networks in H2O

Question:

I'm exploring H2O via the R interface and I'm getting a weird weight matrix. My task is as simple as they get: given x and y, compute x + y. I have 214 rows with 3 columns. The first column (x) was drawn uniformly from (-1000, 1000) and the second (y) from (-100, 100). Since I just want to add the two inputs, I use a single hidden layer with a single neuron. This is my code:

library(h2o)
localH2O <- h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)
train <- h2o.importFile(path = "/home/martin/projects/R NN Addition/addition.csv")
model <- h2o.deeplearning(x = 1:2, y = 3, training_frame = train, hidden = c(1),
                          epochs = 200, export_weights_and_biases = TRUE, nfolds = 5)
print(h2o.weights(model, 1))
print(h2o.weights(model, 2))

and the result is

> print(h2o.weights(model,1))
          x          y
1 0.5586579 0.05518193

[1 row x 2 columns] 
> print(h2o.weights(model,2))
        C1
1 1.802469

For some reason the weight value for y is 0.055, about 10 times smaller than the weight for x, so in the end the network would seem to compute x + y/10. However, h2o.predict actually returns the correct values (even on a test set). I'm guessing there's a preprocessing step that's somehow scaling my data. Is there any way I can recover the actual weights used by the model? I would like to be able to visualize some pretty simple neural networks.


Answer:

Neural networks train best when all the input features have mean 0 and standard deviation 1; if the features have very different scales, training tends to perform poorly. Because of that, H2O does this standardization for you: before training it computes the mean and standard deviation of each feature and replaces the original values with (x - mean) / stddev. In your case the standard deviation of the second feature is about 10x smaller than that of the first, so dividing by it scales the y values up by a factor of 10 relative to x, and the weight into the hidden neuron has to cancel that out. That's why the weight for the second feature is 10x smaller.
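If you want to inspect weights on the original feature scale, something like the following rough sketch (untested; it ignores the bias terms and the hidden-layer activation, and uses the h2o.weights()/h2o.sd() functions from the R package) should get you close:

# Sketch: undo the input standardization on the first-layer weights.
# A weight w on the standardized feature (x - mean) / sd corresponds to
# w / sd on the raw feature (the means only shift the bias terms).
w_hidden <- as.data.frame(h2o.weights(model, 1))   # 1 x 2: weights for x and y
w_out    <- as.data.frame(h2o.weights(model, 2))   # 1 x 1: output weight

sds <- c(h2o.sd(train$x), h2o.sd(train$y))          # sds of the raw x and y columns

# Approximate end-to-end linear weights on the raw inputs (ignoring biases and
# the activation). For regression H2O may also standardize the response, in
# which case you would additionally multiply by its standard deviation.
unlist(w_hidden) / sds * w_out[1, 1]

Alternatively, you can turn the preprocessing off with standardize = FALSE in h2o.deeplearning, at the cost of making training harder.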

Question:

For a certain combination of parameters in the deeplearning function of h2o, I get different results each time I run it.

args <- list(list(hidden = c(200, 200, 200),
                  loss = "CrossEntropy",
                  hidden_dropout_ratios = c(0.1, 0.1, 0.1),
                  activation = "RectifierWithDropout",
                  epochs = EPOCHS))

run <- function(extra_params) {
  model <- do.call(h2o.deeplearning,
                   modifyList(list(x = columns, y = "Response",
                                   validation_frame = validation,
                                   distribution = "multinomial",
                                   l1 = 1e-5, balance_classes = TRUE,
                                   training_frame = training), extra_params))
}

model <- lapply(args, run)

What would I need to do in order to get consistent results for the model each time I run this?


Answer:

Deep learning with H2O is not reproducible when it runs on more than a single core: the results and performance metrics may vary slightly each time you train the model. The implementation uses a technique called "Hogwild!", which increases training speed at the cost of reproducibility on multiple cores.

So if you want reproducible results you will need to restrict H2O to run on a single core and make sure to use a seed in the h2o.deeplearning call.

Edit, based on a comment by Darren Cook: I forgot the reproducible = TRUE parameter, which needs to be set in combination with the seed to make the run truly reproducible. Note that this makes training a lot slower, and it is not advisable with a large dataset.
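For example, a minimal sketch of a reproducible setup, reusing the frames and parameters from your question (the seed value is arbitrary):

library(h2o)
h2o.init(nthreads = 1)   # restrict H2O to a single core

model <- h2o.deeplearning(x = columns, y = "Response",
                          training_frame = training,
                          validation_frame = validation,
                          distribution = "multinomial",
                          hidden = c(200, 200, 200),
                          activation = "RectifierWithDropout",
                          hidden_dropout_ratios = c(0.1, 0.1, 0.1),
                          l1 = 1e-5, balance_classes = TRUE,
                          epochs = EPOCHS,
                          reproducible = TRUE,   # force the deterministic (slower) training path
                          seed = 1234)           # fixed seed for repeatable results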

More information on "Hogwild!"

Question:

I am currently using H2O's AutoML for a data science project. However, nowhere in the documentation, on the internet, or in the code can I find how AutoML treats factor variables. Does it do one-hot encoding? Label encoding? Something more advanced? Does it consider how many levels there are? Does it depend on the algorithm?

Currently, AutoML performs really badly (marginally above the baseline), and I suspect it's because it doesn't handle categoricals, which make up about 90% of my predictors, correctly.


Answer:

AutoML automatically runs the supervised learning algorithms that are available in H2O-3, so how AutoML handles categoricals depends on the default categorical handling of the given model it is running. Documentation on the handling of categoricals can be found here; if you are interested in a particular algorithm, use the same documentation to find it and review the details of how it handles categorical values, or use the Python or R API documentation to look up the default values.
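As a rough illustration, the per-algorithm default can be overridden through the categorical_encoding parameter exposed by the individual H2O algorithms (whether h2o.automl() itself accepts it depends on your H2O version); the frame and column names below are placeholders:

# Sketch: force explicit one-hot encoding on a single algorithm (GBM here)
# instead of its "AUTO" default. predictors, response and train are placeholders.
library(h2o)
h2o.init()

gbm_onehot <- h2o.gbm(x = predictors, y = response,
                      training_frame = train,
                      categorical_encoding = "OneHotExplicit")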