Forest and Boosted Tree Prediction Speed on Hadoop

  • By default, rxPredict launches one MapReduce (MR) job per tree to minimize memory usage.

  • For small data sets, call rxPredict inside rxExec, or set scheduleOnce = TRUE (available in 7.3), to reduce the scheduling overhead.

–      rxPredict(dforestObject, data = myData, outData = myOutData, scheduleOnce = TRUE, ...)

  • For larger data sets, set scheduleOnce = 1 to run prediction in parallel using a single MR job (available in 7.3). Internally, this uses rxDataStep to call predict.randomForest, so it requires the randomForest package.

–      rxPredict(dforestObject, data = myData, outData = myOutData, scheduleOnce = 1, ...)
