Hot questions for using neural networks in R caret

Question:

Code:

library(nnet)
library(caret)

#K-folds resampling method for fitting model
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
                     allowParallel = TRUE) #10 separate 10-fold cross-validations

nnetGrid <- expand.grid(decay = seq(0.0002, .0008, length = 4), 
                        size = seq(6, 10, by = 2), 
                        bag = FALSE)

set.seed(100)
nnetFitcv <- train(R ~ .,
                  data = trainSet,
                  method = "avNNet",
                  tuneGrid = nnetGrid,
                  trControl = ctrl,
                  preProc = c("center", "scale"),
                  linout = TRUE,
                  ## Reduce the amount of printed output
                  trace = FALSE,
                  ## Expand the number of iterations to find
                  ## parameter estimates..
                  maxit = 2000,
                  ## and the number of parameters used by the model
                  MaxNWts = 5 * (34 + 1) + 5 + 1)

Error:

Error in train.default(x, y, weights = w, ...) : 
  final tuning parameters could not be determined
In addition: Warning messages:
1: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :
  There were missing values in resampled performance measures.
2: In train.default(x, y, weights = w, ...) :
  missing values found in aggregated results

data:

dput(head(trainSet))
structure(list(fy = c(317.913756282, 365.006253069, 392.548100067, 
305.350697829, 404.999341917, 326.558279739), fu = c(538.962896683, 
484.423120589, 607.974981919, 566.461909098, 580.287855801, 454.178316794
), E = c(194617.707566, 181322.455065, 206661.286272, 182492.029532, 
189867.929239, 181991.379749), eu = c(0.153782620813, 0.208857408687, 
0.29933255604, 0.277013319499, 0.251278125174, 0.20012525805), 
    imp_local = c(1555.3450957, 1595.41614044, 763.56392418, 
    1716.78277731, 1045.72429616, 802.742305814), imp_global = c(594.038972858, 
    1359.48216529, 1018.89209367, 850.887850177, 1381.3557372, 
    1714.66351462), teta1c = c(0.033375064111, 0.021482368218, 
    0.020905367537, 0.006956337817, 0.034913536977, 0.03009770223
    ), k1c = c(4000921.55552, 4499908.41979, 9764999.26902, 9273400.46159, 
    6163057.88855, 12338543.5703), k2_2L = c(98633499.5682, 53562216.5496, 
    51597126.6866, 79496746.0098, 54060378.6334, 88854286.5457
    ), k2_3L = c(53752551.0262, 125020222.794, 124021434.482, 
    125817803.431, 75021821.6702, 35160224.288), k2_4L = c(56725106.5978, 
    126865701.893, 145764489.664, 64837586.8755, 49128911.0832, 
    70088564.0166), bmaxc = c(3481281.32908, 4393584.00639, 2614830.02391, 
    3128593.72039, 3179348.29527, 4274637.35956), dfactorc = c(2.5474729895, 
    2.94296926288, 2.79505551368, 2.47882735165, 2.46407943564, 
    1.41121223341), amaxc = c(73832.9746763, 99150.5068997, 77165.4338508, 
    128546.996471, 53819.0447533, 54870.9707106), teta1s = c(0.015467320192, 
    0.013675755546, 0.031668366149, 0.028898297322, 0.019211801086, 
    0.013349768955), k1s = c(5049506.54552, 11250622.6842, 13852560.5089, 
    18813117.5726, 18362782.7372, 14720875.0829), k2_ab1s = c(276542468.441, 
    275768806.723, 211613299.608, 264475187.749, 162043062.526, 
    252936228.465), k2_ab2s = c(108971516.033, 114017918.32, 
    248886114.151, 213529935.615, 236891513.077, 142986118.909
    ), k2_ab3s = c(33306211.9166, 28220338.4744, 40462423.2281, 
    23450400.4429, 46044346.1128, 23695405.2598), bmaxab1 = c(4763935.86742, 
    4297372.01966, 3752983.00638, 4861240.46459, 4269771.8481, 
    4162098.23435), bmaxab2 = c(1864128.647, 1789714.6047, 2838412.50704, 
    2122535.96812, 2512362.60884, 1176995.61871), ab1 = c(66.4926766666, 
    42.7771212442, 45.4212664748, 50.3764074404, 35.4792060556, 
    34.1116517971), ab2 = c(21.0285105309, 23.5869838719, 18.8524808986, 
    10.1121885612, 10.9695055644, 12.1154127169), dfactors = c(2.47803921947, 
    0.874644748155, 0.749837099991, 1.96711589185, 2.5407774352, 
    1.28554379333), teta1f = c(0.037308451805, 0.035718600749, 
    0.012495093438, 0.000815957999, 0.002155991091, 0.02579104469
    ), k1f = c(14790480.9871, 17223538.1853, 19930679.8931, 3524230.46974, 
    15721827.0137, 13599317.0371), k2f = c(55614283.976, 54695745.7762, 
    86690362.7036, 99857853.7312, 63119072.711, 37510791.5472
    ), bmaxf = c(2094770.19484, 3633133.51482, 1361188.05421, 
    2001027.51219, 2534273.6726, 3765850.14143), dfactorf = c(0.745459795314, 
    2.04869176933, 0.853221909609, 1.76652410119, 0.523675021418, 
    1.0808768613), k2b = c(1956.92858062, 1400.78738327, 1771.23607857, 
    1104.05501369, 1756.6767193, 1509.9294956), amaxb = c(38588.0915097, 
    35158.1672213, 25711.062782, 21103.1603387, 27230.6973685, 
    43720.3558889999), dfactorb = c(0.822346959126, 2.34421354848, 
    0.79990635332, 2.99070447299, 1.76373031599, 1.38640223249
    ), roti = c(16.1560390049, 12.7223971386, 6.43238062144, 
    15.882552267, 16.0836252663, 18.2734832893), rotmaxbp = c(0.235615453341, 
    0.343204895932, 0.370304533553, 0.488746319999, 0.176135112774, 
    0.46921999001), R = c(0.022186087, 0.023768855, 0.023911029, 
    0.023935705, 0.023655335, 0.022402726)), .Names = c("fy", 
"fu", "E", "eu", "imp_local", "imp_global", "teta1c", "k1c", 
"k2_2L", "k2_3L", "k2_4L", "bmaxc", "dfactorc", "amaxc", "teta1s", 
"k1s", "k2_ab1s", "k2_ab2s", "k2_ab3s", "bmaxab1", "bmaxab2", 
"ab1", "ab2", "dfactors", "teta1f", "k1f", "k2f", "bmaxf", "dfactorf", 
"k2b", "amaxb", "dfactorb", "roti", "rotmaxbp", "R"), row.names = c(7L, 
8L, 20L, 23L, 28L, 29L), class = "data.frame")

The data has no duplicate rows, zero values, or NaNs. Any help is appreciated.


Answer:

I suspect the problem is caused by MaxNWts, the maximum allowable number of weights. The value you gave is smaller than the number of weights needed by any network with more than 5 hidden units, so every candidate in your grid (sizes 6, 8 and 10) fails to fit. It should be at least:

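## (predictors + 1 bias) weights per hidden unit, then (hidden units + 1 bias)
## weights into the single output; ncol(trainSet) - 1 is the predictor count
MaxNWts = max(nnetGrid$size) * ((ncol(trainSet) - 1) + 1) +
          max(nnetGrid$size) + 1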

So, in your case, it should be at least MaxNWts = 10 * (34 + 1) + 10 + 1 = 361.
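
To see why every grid candidate fails, the weight count can be tabulated per hidden-layer size. This is only a sketch: the helper n_weights is a hypothetical name, and it assumes a single hidden layer, one output unit, no skip-layer connections, and 34 numeric predictors (factor predictors would add one input per dummy column).

```r
# Hypothetical helper: weights in a single-hidden-layer network
# (assumes no skip-layer connections).
n_weights <- function(n_predictors, size, n_outputs = 1) {
  # (inputs + bias) into each hidden unit, then (hidden units + bias) into each output
  size * (n_predictors + 1) + (size + 1) * n_outputs
}

sizes  <- seq(6, 10, by = 2)          # hidden-layer sizes from the tuning grid
needed <- n_weights(34, sizes)        # 217 289 361
given  <- 5 * (34 + 1) + 5 + 1        # 181, the value passed as MaxNWts

data.frame(size = sizes, needed = needed, fits = needed <= given)
```

None of the three sizes fits within 181 weights, which is consistent with every resample returning missing performance values.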

Question:

I'm working on tuning parameters for a neural network exercise on the Boston dataset. I have been getting a persistent error:

Error: The tuning parameter grid should have columns size, decay

The following is the setup of my caret tuning:

caret_control <- trainControl(method = "repeatedcv",
                       number = 10,
                       repeats = 3)

caret_grid <- expand.grid(batch_size=seq(60,120,20),
                      dropout=0.5,
                      size=100,
                      decay = 0,
                      lr=2e-6,
                      activation = "relu")

caret_t <- train(medv ~ ., data = chasRad, 
             method = "nnet", 
             metric="RMSE",
             trControl = caret_control, 
             tuneGrid = caret_grid,
             verbose = FALSE)

Here chasRad is a 12x506 matrix. Could anyone help fix the error, which seems to be triggered by the expanded grid?


Answer:

The error you're getting should be interpreted as:

"The tuning parameter grid should ONLY have columns size, decay".

You're passing in four additional parameters that nnet can't tune in caret. For a full list of parameters that are tunable, run modelLookup(model = 'nnet').

To tune only size and decay, replace your caret_grid with:

caret_grid <- expand.grid(size=seq(from = 1, to = 10, by = 1),
                      decay = seq(from = 0.1, to = 0.5, by = 0.1))

and your code will run.
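
A quick way to spot the offending columns before calling train is to compare the grid's column names with the expected ones. The sketch below hard-codes the two names from the error message so that it runs in base R. With caret loaded, modelLookup("nnet")$parameter should give the same names.

```r
# Tunable parameters for method = "nnet", taken from the error message
expected <- c("size", "decay")

# The grid from the question
bad_grid <- expand.grid(batch_size = seq(60, 120, 20),
                        dropout = 0.5,
                        size = 100,
                        decay = 0,
                        lr = 2e-6,
                        activation = "relu")

# Columns that this model cannot tune
extra <- setdiff(names(bad_grid), expected)
extra

# A grid restricted to the expected columns
good_grid <- expand.grid(size = seq(from = 1, to = 10, by = 1),
                         decay = seq(from = 0.1, to = 0.5, by = 0.1))
setequal(names(good_grid), expected)
```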

Question:

I have a general question regarding the scaling of predictors in a neural network. I'm using the avNNet algorithm in R / caret for a regression; I have both categorical and numerical predictors.

As far as I have understood, predictors have to be scaled prior to the modeling step:

For lack of better prior information, it is common to standardize each input to the same range or the same standard deviation. [...] In particular, scaling the inputs to [-1,1] will work better than [0,1] (http://www.faqs.org/faqs/ai-faq/neural-nets/part2/section-16.html)

If I scale my continuous predictors to the range [-1,1], what about my categorical predictors which are coded as [0 | 1]? Should I replace the zeros by -1?

Kind regards,

Requin


Answer:

No. The categories are of a different conceptual type (and data type) from the inputs or the weights. The categories are an enumeration (0, 1, 2, ...), and are typically distinct from one another, i.e. category 0 is no more similar to category 1 than it is to category 150.

The weights are on a continuum of floating-point values; this algorithm works best when those values are in the same range for each dimension (input feature) and evenly distributed about 0.

Scale the inputs as described; leave the categories just as you have them, at 0 | 1.
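
As a minimal sketch of that advice, the continuous columns can be rescaled to [-1, 1] while the 0 | 1 indicator columns are left alone. The data frame, its toy values, the column name is_welded and the helper rescale_pm1 are all made up for illustration.

```r
# Hypothetical helper: linearly map a numeric vector onto [-1, 1]
rescale_pm1 <- function(x) {
  2 * (x - min(x)) / (max(x) - min(x)) - 1
}

# Toy data: one continuous predictor and one dummy-coded category
df <- data.frame(fy = c(317.9, 365.0, 392.5, 305.4),
                 is_welded = c(0, 1, 1, 0))

# Rescale only the continuous columns; the indicator stays at 0 | 1
continuous <- c("fy")
df[continuous] <- lapply(df[continuous], rescale_pm1)

range(df$fy)
df$is_welded
```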