Tuning Forest and Boosted Tree Prediction Speed on Hadoop - Microsoft Support

Applies To

Revolution Analytics

Forest and Boosted Tree Prediction Speed on Hadoop

By default, rxPredict launches one MapReduce (MR) job per tree to minimize memory usage.
For small data sets, call rxPredict inside rxExec or set scheduleOnce = TRUE (available in 7.3) to reduce the scheduling overhead:

– rxPredict(dforestObject, data = myData, outData = myOutData, scheduleOnce = TRUE, ...)

For larger data sets, set scheduleOnce = 1 to run prediction in parallel in a single MapReduce job. This option is available in 7.3, uses rxDataStep internally to call predict.randomForest, and requires the randomForest package:

– rxPredict(dforestObject, data = myData, outData = myOutData, scheduleOnce = 1, ...)
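
To see why one job per tree can be slow on small data, the rough cost model below may help. It is a sketch only: the tree count, the per-job scheduling overhead, and the per-tree scoring time are hypothetical figures rather than measurements, and the model assumes the jobs run one after another and ignores any parallelism inside a job.

```r
# Hypothetical figures, for illustration only; measure on your own cluster.
n_trees <- 100            # assumed number of trees in the forest
job_overhead_sec <- 30    # assumed cost to schedule and start one MapReduce job
score_sec_per_tree <- 2   # assumed time to score a small data set with one tree

# Default behaviour: one MapReduce job per tree,
# so the scheduling overhead is paid once for every tree.
per_tree_jobs_sec <- n_trees * (job_overhead_sec + score_sec_per_tree)

# Single scheduled job: the scheduling overhead is paid only once.
single_job_sec <- job_overhead_sec + n_trees * score_sec_per_tree

cat("One job per tree:", per_tree_jobs_sec, "seconds\n")
cat("Single job:", single_job_sec, "seconds\n")
```

Under these assumptions the scheduling overhead dominates the default path, which is the case the scheduleOnce setting is meant to address. With large data the scoring time grows while the overhead stays fixed, so the gap narrows.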
