Tuning Options for ScaleR text Imports - Microsoft Support


Applies To

Revolution Analytics

Windows/Linux Block Size

When choosing a block size, set rowsPerRead so that each block holds about 10 million elements (rows × columns), or fewer:
- With 20 columns, rowsPerRead=500e3 (10 million elements)
- With 1000 columns, rowsPerRead=1000 (1 million elements, well under the target)
Blocks of this size are small enough that you can process multiple blocks per read.
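The sizing rule above is plain arithmetic: divide the element budget by the number of columns. Here is a minimal sketch, written in Python only because the calculation is language-agnostic. The helper name rows_per_read and its default target of 10 million elements are illustrative and not part of ScaleR.

```python
def rows_per_read(n_cols, target_elements=10_000_000):
    """Return the largest row count whose block stays within target_elements."""
    if n_cols <= 0:
        raise ValueError("n_cols must be positive")
    # A block holds rows * columns elements; keep at least one row per block.
    return max(1, target_elements // n_cols)


print(rows_per_read(20))    # 500000
print(rows_per_read(1000))  # 10000
```

The second result is an upper bound for 1000 columns. The smaller value of 1000 quoted above stays well within it, which fits the advice to prefer smaller blocks.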
Set blocksPerRead to a value greater than 1:
- The exact value depends on how much RAM is available
- Holding multiple blocks in memory at once generally improves performance
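One rough way to pick a starting value is to estimate how many blocks fit in the memory you are willing to dedicate. The sketch below (Python, for the arithmetic only) assumes 8 bytes per element, as for double-precision numeric columns, and ignores any working copies made during analysis. The function name, the byte size, and the memory budget are assumptions for illustration, not ScaleR settings.

```python
def blocks_per_read(rows_per_block, n_cols, ram_budget_bytes, bytes_per_element=8):
    """Estimate how many blocks fit in ram_budget_bytes (a rough upper bound)."""
    block_bytes = rows_per_block * n_cols * bytes_per_element
    if block_bytes <= 0:
        raise ValueError("block dimensions and element size must be positive")
    # Always read at least one block, even if the budget is smaller than a block.
    return max(1, ram_budget_bytes // block_bytes)


# 500,000 rows x 20 columns x 8 bytes = 80,000,000 bytes per block.
print(blocks_per_read(500_000, 20, 1_000_000_000))  # 12
```

Treat the result as a ceiling and start lower, since the analysis itself also needs memory.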
Increasing blocksPerRead is easy, but re-blocking a file is expensive, so err on the side of smaller blocks.
If you use rxSplit() or rxDataStep() to create samples (for example, training and validation sets), use rxDataStep() afterwards to re-block the results according to the same principle.
