Tuning Forest and Boosted Tree Prediction Speed on Hadoop

  • By default, rxPredict launches one MapReduce (MR) job per tree to minimize memory usage.
  • For small data sets, call rxPredict inside rxExec or set scheduleOnce=TRUE (available in version 7.3) to reduce the scheduling overhead:
–      rxPredict(dforestObject, data = myData, outData = myOutData, scheduleOnce = TRUE, ...)
  • For larger data sets, set scheduleOnce=1 to run the prediction in parallel in a single MR job (available in version 7.3; internally, this uses rxDataStep to call predict.randomForest, so the randomForest package is required):
–      rxPredict(dforestObject, data = myData, outData = myOutData, scheduleOnce = 1, ...)
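
The rxExec route suggested for small data sets can be sketched as follows. This is a minimal illustration, not a tested recipe: it assumes RevoScaleR is loaded, a Hadoop compute context has already been set, and dforestObject, myData and myOutData are the same objects used in the rxPredict calls. The helper name predictInOneTask is hypothetical and not part of RevoScaleR.

```r
# Sketch only: assumes RevoScaleR is loaded and a Hadoop compute context
# is already active. predictInOneTask is a hypothetical helper name.
predictInOneTask <- function(model, inData, outData) {
  # rxExec forwards the named arguments to rxPredict; timesToRun = 1
  # requests a single run of the function.
  rxExec(rxPredict,
         modelObject = model,
         data = inData,
         outData = outData,
         timesToRun = 1)
}

# Hypothetical usage, with the objects from the calls above:
# predictInOneTask(dforestObject, myData, myOutData)
```

Wrapping the call this way schedules the whole prediction as one distributed task, which is where the saving in scheduling overhead comes from; whether it fits in memory on a single node depends on the size of the data.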
Properties

Article ID: 3104165 - Last updated: Nov 1, 2015 - Revision: 1
