Tuning Forest and Tree Modeling Accuracy


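Tune rxDForest parameters. Higher accuracy generally costs training speed.   (*: defaults shown for OSR, open source R's randomForest, and for RRE, Revolution R Enterprise's rxDForest)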

–      Increase nTree, e.g. to 20 or more   (OSR=500, RRE=10)*

–      Increase maxDepth, e.g. to 20 or more   (OSR=N/A, RRE=10)*

–      Decrease minSplit, e.g. to 2   (OSR=5, RRE=sqrt(N))*

–      Increase mTry, e.g. to 40 or more   (OSR/RRE=sqrt(p) for classification or p/3 for regression, where p is the number of predictors)*

–      Increase maxNumBins, e.g. to 1e5 or 1e6

–      Example: on the KDD dataset, the following settings gave an accuracy of 81.4%, rising to 82.3% when nTree was increased to 200:

nTree=20, mTry=40, minSplit=2, maxDepth=20, maxNumBins=1e6
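
The settings above could be passed to rxDForest roughly as follows. This is a minimal sketch, not the code used for the KDD result: `trainData`, the response column `label`, and the helper `buildForestArgs` are hypothetical names introduced for illustration, and `trainData` is assumed to be an in-memory data frame. mTry is capped at the number of predictors, since it cannot exceed them.

```r
# Hypothetical helper: collect the tuning values quoted above into an
# argument list for rxDForest. `response` and `predictors` are placeholders.
buildForestArgs <- function(predictors, response = "label") {
  list(
    formula    = reformulate(predictors, response = response),
    nTree      = 20,                          # number of trees
    mTry       = min(40, length(predictors)), # variables tried per split
    minSplit   = 2,                           # smallest node that may be split
    maxDepth   = 20,                          # maximum tree depth
    maxNumBins = 1e6                          # bins used for numeric variables
  )
}

# Fit only when RevoScaleR is installed and a data frame named `trainData`
# (an assumption, not part of the article) is available.
if (requireNamespace("RevoScaleR", quietly = TRUE) && exists("trainData")) {
  forestArgs <- buildForestArgs(setdiff(names(trainData), "label"))
  model <- do.call(RevoScaleR::rxDForest,
                   c(forestArgs, list(data = quote(trainData))))
}
```

Raising nTree to 200 in the argument list corresponds to the second accuracy figure quoted above; expect training time to grow accordingly.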

Alternatively, run the open source randomForest routine across the Hadoop cluster using rxExec
–      See randomShrubbery in Section 6.5 of our Distributed Computing Guide

–      Adjust the MapReduce (MR) memory limits if needed, since the data must fit in memory on each node.
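
The rxExec approach can be sketched as follows: each task grows a smaller forest with the open source randomForest function, and the partial forests are then merged. This is an illustration under stated assumptions, not a reproduction of the randomShrubbery example from the guide. `trainData`, `label`, `nWorkers`, and the helper `treesPerWorker` are hypothetical, and a compute context (local or Hadoop) is assumed to be set already.

```r
# Hypothetical helper: split a total tree budget evenly across workers,
# rounding up so the combined forest has at least `totalTrees` trees.
treesPerWorker <- function(totalTrees, workers) {
  rep(ceiling(totalTrees / workers), workers)
}

# Run only when both packages are installed and a data frame named
# `trainData` with a factor column `label` (assumptions) exists.
if (requireNamespace("RevoScaleR", quietly = TRUE) &&
    requireNamespace("randomForest", quietly = TRUE) &&
    exists("trainData")) {
  nWorkers <- 4  # hypothetical number of parallel tasks
  forests <- RevoScaleR::rxExec(
    randomForest::randomForest,
    label ~ ., data = trainData,               # full data is sent to each task
    ntree = treesPerWorker(200, nWorkers)[1],  # trees grown by each task
    timesToRun = nWorkers,
    packagesToLoad = "randomForest"
  )
  # Merge the per-task forests into a single randomForest object.
  combined <- do.call(randomForest::combine, unname(forests))
}
```

Because every task receives the whole data frame, this pattern only works when the data fits in memory on each node, which is why the memory limits noted above matter.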

Article ID: 3104233 - Last Review: 29 Oct 2015 - Revision: 1