random forest tuning - tree depth and number of trees

I have a basic question about tuning a random forest classifier. Is there any relation between the number of trees and the tree depth? Is it necessary for the tree depth to be smaller than the number of trees?

I agree with Tim that there is no rule-of-thumb ratio between the number of trees and the tree depth. Generally you want as many trees as will improve your model. More trees also mean more computational cost, and after a certain number of trees the improvement is negligible. As the figure below shows, after some point there is no significant improvement in the error rate even as the number of trees increases.
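
As a rough illustration (my own sketch, not part of the original answer), this flattening can be reproduced on the built-in iris data with the randomForest package; the cumulative OOB error drops quickly and then levels off as trees are added:

# illustration only: OOB error vs. number of trees on the iris data
library(randomForest)
set.seed(42)
rfIris = randomForest(Species ~ ., data = iris, ntree = 1000)
# err.rate[, "OOB"] holds the cumulative OOB error after 1..ntree trees
plot(rfIris$err.rate[, "OOB"], type = "l", log = "x",
     xlab = "number of trees", ylab = "OOB error rate")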

The depth of a tree is how far you allow it to grow. A deeper tree captures more information, whereas a shallower tree gives less precise information, so the depth should be large enough to split each node down to your desired number of observations.

Below is an example of a short tree (3 leaf nodes) and a long tree (6 leaf nodes) for the Iris dataset: the short tree (3 leaf nodes) gives less precise information than the long tree (6 leaf nodes).

Short tree (3 leaf nodes):

Long tree (6 leaf nodes):
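
Since the tree figures are not reproduced here, a small sketch of the same idea (assuming the rpart package; the depth settings are illustrative, not the exact trees from the figures):

# illustration: a shallow tree vs. a deeper tree on the iris data
library(rpart)
shortTree = rpart(Species ~ ., data = iris,
                  control = rpart.control(maxdepth = 2))
longTree  = rpart(Species ~ ., data = iris,
                  control = rpart.control(maxdepth = 5, cp = 0.001))
# the deeper tree splits the data into more, purer leaves
print(shortTree)
print(longTree)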

For most practical concerns, I agree with Tim.

Yet other parameters do affect when the ensemble error converges as a function of added trees. I would guess that limiting the tree depth typically makes the ensemble converge a little earlier. I would rarely fiddle with tree depth, though: it lowers computing time but gives no other benefit. Lowering the bootstrap sample size gives both lower run time and lower tree correlation, and thus often better model performance at comparable run time. A trick that is not mentioned so often: when the RF model's explained variance is lower than 40% (seemingly noisy data), one can lower the sample size to roughly 10-50% and increase the number of trees to e.g. 5000 (usually unnecessarily many). The ensemble error will then converge later as a function of trees, but because of the lower tree correlation the model becomes more robust and reaches a lower OOB error plateau.

You can see below that sample size gives the best long-run convergence, whereas maxnodes starts from a lower error but converges less. For this noisy data, limiting maxnodes is still better than the default RF. For low-noise data, the decrease in variance from lowering maxnodes or sample size does not make up for the increase in bias due to lack of fit.

For many practical situations you would simply give up if you could only explain 10% of the variance, so the default RF is typically fine. If you are a quant who can bet on hundreds or thousands of positions, 5-10% explained variance is awesome.

The green curve is maxnodes, which roughly corresponds to tree depth, but not exactly.

library(randomForest)

X = data.frame(replicate(6,(runif(1000)-.5)*3))
ySignal = with(X, X1^2 + sin(X2) + X3 + X4)
yNoise = rnorm(1000,sd=sd(ySignal)*2)
y = ySignal + yNoise
plot(y, ySignal, main = paste("cor =", round(cor(y, ySignal), 3)))

#std RF
rf1 = randomForest(X,y,ntree=5000) 
print(rf1)
plot(rf1,log="x",main="black default, red samplesize, green tree depth")

#reduced sample size
rf2 = randomForest(X,y,sampsize=.1*length(y),ntree=5000) 
print(rf2)
points(1:5000,rf2$mse,col="red",type="l")

#limiting tree depth via maxnodes (not an exact depth limit)
rf3 = randomForest(X,y,maxnodes=24,ntree=5000)
print(rf3)
points(1:5000,rf3$mse,col="darkgreen",type="l")

It is true that more trees generally result in better accuracy. However, more trees also mean more computational cost, and after a certain number of trees the improvement is negligible. Oshiro et al. (2012) pointed out that, based on their tests with 29 data sets, there is no significant improvement after 128 trees (which is in line with the graph from Soren).

Regarding the tree depth, the standard random forest algorithm grows each decision tree to full depth without pruning. A single decision tree does need pruning to overcome over-fitting, but in a random forest this issue is mitigated by the random selection of variables and by the out-of-bag (OOB) estimation.

Reference: Oshiro, T.M., Perez, P.S. and Baranauskas, J.A., 2012. How many trees in a random forest? In MLDM, pp. 154-168.
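
As a quick check of the 128-trees observation (my own sketch with the randomForest package and the iris data, not from the paper), you can compare the cumulative OOB error after 128 trees with the error after many more trees; note that by default the classification forest also grows each tree until the leaves are pure (nodesize = 1), i.e. without pruning:

# illustration: OOB error barely changes beyond ~128 trees
library(randomForest)
set.seed(1)
rfCheck = randomForest(Species ~ ., data = iris, ntree = 1000)  # default nodesize = 1, no pruning
rfCheck$err.rate[128, "OOB"]   # OOB error after 128 trees
rfCheck$err.rate[1000, "OOB"]  # OOB error after all 1000 trees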

Comments
  • @B.ClayShannon Random forests are a machine learning method. His question totally belongs here.
  • I have never heard of a rule of thumb ratio between the number of trees and tree depth. Generally you want as many trees as will improve your model. The depth of the tree should be enough to split each node to your desired number of observations.
  • @TimBiegeleisen here's my thumb rule :)
  • There has been some work suggesting that the best depth is 5-8 splits. It is, of course, problem and data dependent. Think of the response as a surface over a multivariate input, with each leaf wanting to split on the regions with the highest magnitude of slope. If you have enough points to inform the math, then more splits will be made to represent the surface until you hit a "max depth" wall. If your data is sparse enough or noisy enough, the tree cannot cleanly detect slope and will not split as well. If there is a relationship, it also depends on mtry, the number of columns considered at each split.
  • Thank you so much for the explanation. I could understand to some extent what you mean; however, since I am still getting used to the whole concept of developing random forest models, I have a few more questions based on your answer. What exactly is tree correlation, and how do you measure it? Are the OOB estimate and the ensemble error the same thing? Since these could be very basic, you could let me know if there is an article I can read to understand the terms better. Thanks a lot!
  • Tree correlation means that two trees are correlated in terms of the predictor variables on which their splits are made. In bagging, OOB (out of bag) means that on average only about 2/3 of the dataset is used for building each tree, and the remaining ~1/3 is not used; predictions for those rows are made from the OOB 1/3 of the data (a small simulation after these comments illustrates the 1/3 figure).
  • Thanks, this looks great, nice tip! But for me it didn't work: setting sampsize to .1, .2, .3, etc. didn't result in lower MSE or higher R-squared, not even with 5000 trees. It was only negligibly lower for .5 (3.371 instead of the default 3.377 :-)).
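
A small simulation (mine, not from the thread) of the in-bag/out-of-bag split mentioned in the comments: a bootstrap sample of size n leaves out roughly 1/3 of the original rows on average.

# illustration: fraction of rows left out of one bootstrap sample
n = 10000
inBag = sample(n, replace = TRUE)
mean(!(1:n %in% inBag))  # about 0.37, close to exp(-1)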