Applies To
Revolution Analytics

Windows/Linux Block Size

  • When choosing a block size, set rowsPerRead so that each block holds roughly 10 million elements (rows × columns), or fewer

    • With 20 columns, rowsPerRead=500e3 (about 10 million elements per block)

    • With 1000 columns, rowsPerRead=1000 (about 1 million elements per block)

  • Blocks of this size are usually small enough that you can process multiple blocks per read

  • Use blocksPerRead > 1

    • The exact value depends on how much RAM you have available

    • Generally, having multiple blocks in memory at the same time improves performance

  • Increasing blocksPerRead is easy, but re-blocking a file is expensive, so err on the side of smaller blocks

  • If you use rxSplit() or rxDataStep() to create samples (for example, training and validation sets), then use rxDataStep() to re-block the resulting files according to the block-size guidelines above
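
The sizing arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb, not part of RevoScaleR: the function names rows_per_read and blocks_per_read are hypothetical, the 8 bytes per element figure assumes every column is stored as a double, and the RAM budget is whatever you decide to set aside. Real memory use will differ with column types and processing overhead.

```python
def rows_per_read(n_columns, target_elements=10_000_000):
    """Rows per block so that rows * columns stays at or under the target.

    Hypothetical helper: it only mirrors the "~10M elements per block" rule.
    """
    if n_columns < 1:
        raise ValueError("n_columns must be at least 1")
    return max(1, target_elements // n_columns)


def blocks_per_read(rows, n_columns, ram_budget_bytes, bytes_per_element=8):
    """Whole number of blocks that fit in the RAM budget (never below 1).

    Assumption: each element takes bytes_per_element bytes (8 = double).
    """
    block_bytes = rows * n_columns * bytes_per_element
    return max(1, ram_budget_bytes // block_bytes)


# 20 columns at the 10M-element target
print(rows_per_read(20))                                # 500000

# 1000 columns with a smaller 1M-element target, as in the example above
print(rows_per_read(1000, target_elements=1_000_000))   # 1000

# Blocks of 500,000 rows x 20 columns that fit in a 1 GiB budget
print(blocks_per_read(500_000, 20, 1024**3))            # 13
```

The computed values would then be passed as rowsPerRead and blocksPerRead. Because raising blocksPerRead later is cheap and re-blocking is not, it is safer to round the rowsPerRead result down than up.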
