Hot questions for using neural networks on Amazon Web Services
I am trying to train a network with Caffe, implementing FCN-8s. My images are 512x640 and the batch size is 1.
I am running this on an Amazon EC2 instance (g2.2xlarge) with 4GB of GPU memory, but as soon as I run the solver it throws an error:
```
Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
Aborted (core dumped)
```
Can someone help me proceed from here?
The error you get is indeed an out-of-memory error, but it is not host RAM that is exhausted, it is GPU memory (note that the error comes from CUDA). Usually, when Caffe runs out of memory, the first thing to do is reduce the batch size (at the cost of gradient accuracy), but since you are already at batch size 1... Are you sure the batch size is 1 for both the TRAIN and TEST phases?
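Batch size is set per data layer in the network prototxt, so a phase can easily be missed. A minimal sketch of what to check (layer names and the LMDB source paths below are placeholders, not taken from the question):

```
layer {
  name: "data"
  type: "Data"
  include { phase: TRAIN }
  data_param {
    source: "train_lmdb"   # placeholder path
    batch_size: 1          # already at the minimum for training
  }
}
layer {
  name: "data"
  type: "Data"
  include { phase: TEST }
  data_param {
    source: "val_lmdb"     # placeholder path
    batch_size: 1          # a larger TEST batch can exhaust GPU memory too
  }
}
```

If the TEST layer has a larger batch_size, the out-of-memory error can appear even though training itself fits.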
I am looking for an Ubuntu AMI for AWS that has Caffe installed and works properly with the GPU. There are some listed on Caffe's GitHub page, but they do not seem to work. Is there a recently tested AMI available now?
You may try this AMI offered by Stanford. Remember to set the region to
US West (N. California)
Hope this helps.
Recently, I found and used an AMI that contains basically everything you need to run Caffe. You might find it here
I am doing a classification task with 300k x 24 inputs and, correspondingly, 300k x 25 outputs. The process dies with the message "Killed". Following this question, I found that this indicates an out-of-memory condition. I am running the simulation on an Amazon c4.large instance. I hoped that removing variables such as labels and vals from the environment would solve the problem, but it didn't. Any thoughts on how to get around this? The code I am running is below:
```r
numLabels <- 25
numInput <- 24
newThreshold <- 10000

# 25th col of vals represents the data quality
# Will not be used in training
randperm <- sample(length(vals$V1))
train <- cbind(vals, labels)
train <- train[randperm, ]
rm(list = c('labels', 'vals'))

f <- as.formula(
  X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 +
  X11 + X12 + X13 + X14 + X15 + X16 + X17 + X18 + X19 + X20 +
  X21 + X22 + X23 + X24 + X25 ~
  V1 + V2 + V3 + V4 + V5 + V6 + V7 + V8 + V9 + V10 +
  V11 + V12 + V13 + V14 + V15 + V16 + V17 + V18 + V19 + V20 +
  V21 + V22 + V23 + V24,
  env = train
)

nn <- neuralnet(
  formula = f,
  threshold = newThreshold,
  data = train,
  hidden = c(100),
  linear.output = FALSE,
  err.fct = 'ce',
  act.fct = 'logistic',
  lifesign = 'full',
  lifesign.step = 100,
  stepmax = 10000,
  rep = 2
)

prediction <- compute(nn, train[, 1:numInput])$net.result > 0.5
print(mean(prediction == train[, numInput + 1:ncol(train)]))
```
Have you tried extending the memory limit?
Look at this blog post on the topic:
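For a sense of why memory runs out here, a rough back-of-envelope sketch (in Python, since the arithmetic is language-agnostic) of the footprint implied by the question's dimensions. This assumes 8-byte doubles and counts only the data frame, the per-pass activations, and the weights; R's copy-on-modify semantics and neuralnet's internal bookkeeping multiply these figures several times over, which is what overwhelms a c4.large:

```python
# Back-of-envelope memory estimate for the neuralnet() call in the question.
# Assumes every value is an 8-byte double; actual R overhead is larger.

BYTES_PER_DOUBLE = 8
ROWS = 300_000   # training examples
INPUTS = 24      # V1..V24
OUTPUTS = 25     # X1..X25
HIDDEN = 100     # hidden = c(100)

def mib(n_bytes):
    """Convert bytes to mebibytes."""
    return n_bytes / 2**20

# The training data frame alone (inputs + outputs, all numeric):
data_bytes = ROWS * (INPUTS + OUTPUTS) * BYTES_PER_DOUBLE

# Activations held for a full-batch pass: hidden-layer and
# output-layer values for every row.
activation_bytes = ROWS * (HIDDEN + OUTPUTS) * BYTES_PER_DOUBLE

# Weight matrices (including bias terms) are tiny by comparison.
weight_count = (INPUTS + 1) * HIDDEN + (HIDDEN + 1) * OUTPUTS

print(f"data:        {mib(data_bytes):8.1f} MiB")
print(f"activations: {mib(activation_bytes):8.1f} MiB per pass")
print(f"weights:     {weight_count} parameters")
```

The weights are negligible; it is the row count that hurts. Training on a subsample, or moving to an instance with more RAM, are the usual ways out.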