Tuning Forest and Tree Modeling Accuracy

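  • Tune rxDForest parameters. Each change below trades speed for accuracy. Values marked * are the defaults in open-source R (OSR) randomForest and in Revolution R Enterprise (RRE) rxDForest.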

–      Increase nTree, e.g. to 20 or more   (OSR=500, RRE=10)*

–      Increase maxDepth, e.g. to 20 or more   (OSR=N/A, RRE=10)*

–      Decrease minSplit, e.g. to 2   (OSR=5, RRE=sqrt(N))*

–      Increase mTry, e.g. to 40 or more   (OSR/RRE=sqrt(p) for classification, p/3 for regression)*

–      Increase maxNumBins, e.g. to 1e5 or 1e6

–      On the KDD dataset, the following settings gave 81.4% accuracy. Increasing nTree to 200 raised accuracy further, to 82.3%:

nTree=20, mTry=40, minSplit=2, maxDepth=20, maxNumBins=1e6
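The following Python sketch shows how far the tuned values move from the defaults for a given data size. It evaluates the RRE defaults listed above for a dataset with N rows and p predictors, then pairs each default with the KDD setting. The helper names (listed_defaults, compare, TUNED) are hypothetical. The formulas come from the list above, not from the rxDForest source, so treat the numbers as approximate.

```python
import math

# Hypothetical helper: the RRE (rxDForest) defaults as listed above,
# evaluated for a dataset with n_obs rows and n_vars predictors.
def listed_defaults(n_obs, n_vars, classification=True):
    # mTry: sqrt(p) for classification, p/3 for regression, rounded down
    # and kept >= 1 (the open-source randomForest convention).
    m_try = math.sqrt(n_vars) if classification else n_vars / 3
    return {
        "nTree": 10,
        "maxDepth": 10,
        "minSplit": math.floor(math.sqrt(n_obs)),  # listed above as sqrt(N)
        "mTry": max(1, math.floor(m_try)),
    }

# Tuned settings from the KDD example above.
TUNED = {"nTree": 20, "mTry": 40, "minSplit": 2, "maxDepth": 20, "maxNumBins": 1e6}

def compare(n_obs, n_vars, classification=True):
    """Return (parameter, listed default, tuned value) tuples."""
    defaults = listed_defaults(n_obs, n_vars, classification)
    return [(name, value, TUNED[name]) for name, value in defaults.items()]

if __name__ == "__main__":
    # Illustrative dataset size only; not the KDD dataset's actual shape.
    for name, default, tuned in compare(1_000_000, 100):
        print(f"{name:>9}: default {default:>5} -> tuned {tuned}")
```

Take a hypothetical dataset of one million rows and 100 predictors. For it, compare(1_000_000, 100) pairs a default minSplit of 1000 with the tuned value of 2, and a default mTry of 10 with 40. This illustrates why these two changes can grow much larger trees and noticeably slow training.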
  • Alternatively, run the open-source randomForest routine across the Hadoop cluster using rxExec
–      See the randomShrubbery example in Section 6.5 of our Distributed Computing Guide

–      Adjust MapReduce (MR) memory limits if needed, since the data must fit in memory on each node.
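The rxExec approach amounts to three steps:

1. Split the total number of trees across nodes.
2. Grow each partial forest independently with its own random seed.
3. Combine the partial forests into one model. In R, the randomForest package's combine() function merges forests.

The Python sketch below illustrates only the planning arithmetic and a rough per-node memory check. It is not the code from the Distributed Computing Guide. plan_shrubbery and fits_in_memory are hypothetical helpers. The 8 bytes per value and the 2x overhead factor are assumptions, not measured figures.

```python
def plan_shrubbery(total_trees, n_nodes, base_seed=42):
    """Split total_trees across n_nodes as evenly as possible.

    Returns (node_index, n_trees, seed) tuples. Each node would grow its
    share of the forest with a distinct seed so the partial forests differ;
    the partial forests would then be combined into one model.
    """
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    base, extra = divmod(total_trees, n_nodes)
    plan = []
    for i in range(n_nodes):
        n = base + (1 if i < extra else 0)  # first `extra` nodes get one more tree
        if n > 0:
            plan.append((i, n, base_seed + i))
    return plan


def fits_in_memory(n_rows, n_cols, node_mem_bytes, bytes_per_value=8, overhead=2.0):
    """Rough check that a dense numeric copy of the data fits on one node.

    bytes_per_value (8, i.e. doubles) and the overhead multiplier for
    working copies are assumptions; measure real usage before relying on it.
    """
    needed = n_rows * n_cols * bytes_per_value * overhead
    return needed <= node_mem_bytes, needed


if __name__ == "__main__":
    for node, trees, seed in plan_shrubbery(200, 3):
        print(f"node {node}: {trees} trees, seed {seed}")
    ok, needed = fits_in_memory(5_000_000, 40, 4 * 1024**3)
    print(f"needs about {needed / 1e9:.1f} GB; fits in 4 GiB: {ok}")
```

For example, spreading the 200-tree KDD configuration over three nodes gives 67, 67, and 66 trees. If the memory check fails for a node, either raise the MR memory limit or reduce the data each node must hold.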
Note: This is a "FAST PUBLISH" article created directly from within the Microsoft support organization. The information contained herein is provided as-is in response to emerging issues. As a result of the speed in making it available, the materials may include typographical errors and may be revised at any time without notice. See Terms of Use for other considerations.
Properties

Article ID: 3104233 - Last Review: 10/29/2015 06:54:00 - Revision: 1.0

Revolution Analytics
