Windows/Linux Block Size

- When choosing a block size, set rowsPerRead so that each block holds roughly 10M elements (rows × columns) or fewer:
  - With 20 columns, rowsPerRead=500e3 (10M elements)
  - With 1000 columns, rowsPerRead=1000 (1M elements)

- This tends to give blocks small enough that several of them can be processed per read:
  - Use blocksPerRead > 1
  - The exact value depends on how much RAM is available
  - Having multiple blocks in memory at once generally improves performance
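To gauge how many blocks fit in the RAM you have, estimate the size of one block and divide your memory budget by it. The Python sketch below does this rough estimate. It assumes every value is an 8-byte double and it ignores per-object overhead. It also reserves half the budget as headroom, which is an arbitrary choice. `block_bytes` and `max_blocks_per_read` are hypothetical helpers and are not RevoScaleR functions.

```python
def block_bytes(rows_per_read: int, n_cols: int, bytes_per_value: int = 8) -> int:
    """Approximate size of one block, assuming every value is an 8-byte double."""
    return rows_per_read * n_cols * bytes_per_value


def max_blocks_per_read(rows_per_read: int, n_cols: int, ram_budget_bytes: int,
                        usable_fraction: float = 0.5) -> int:
    """Largest blocksPerRead whose blocks fit in a fraction of the RAM budget."""
    # Leave headroom for the model, intermediate results and the OS.
    usable = int(ram_budget_bytes * usable_fraction)
    return max(1, usable // block_bytes(rows_per_read, n_cols))


print(block_bytes(500_000, 20))                 # 80000000 (about 80 MB)
print(max_blocks_per_read(500_000, 20, 2**30))  # 6
```

With the 20-column example and a 1 GiB budget, this works out to six blocks per read. Real memory use also depends on column types and on what the analysis function allocates, so treat the result as an upper bound to tune from.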

- It is easy to increase blocksPerRead later, but expensive to re-block the data, so err on the side of smaller blocks
- If you use rxSplit() or rxDataStep() to create samples (e.g. training/validation sets), use rxDataStep() afterwards to re-block each sample according to the ~10M-element guideline above
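The following Python sketch shows why re-blocking helps after a split. It assumes the split keeps the input block boundaries. Under that assumption, each output block holds only the fraction of rows routed to that sample, and a 30% validation set ends up with blocks that are 30% full. In-memory lists stand in for on-disk blocks. `split_blocks` and `reblock` are hypothetical helpers. In RevoScaleR itself, the re-blocking step would be the rxDataStep() call mentioned above, with a suitable rowsPerRead.

```python
from typing import Callable, List, Sequence


def split_blocks(blocks: Sequence[Sequence[int]],
                 is_validation: Callable[[int], bool]):
    """Split each block in place, preserving block boundaries (the assumed behavior)."""
    train: List[List[int]] = []
    valid: List[List[int]] = []
    for block in blocks:
        train.append([r for r in block if not is_validation(r)])
        valid.append([r for r in block if is_validation(r)])
    return train, valid


def reblock(blocks: Sequence[Sequence[int]], rows_per_read: int) -> List[List[int]]:
    """Concatenate rows in order and cut them into blocks of rows_per_read rows."""
    rows = [r for block in blocks for r in block]
    return [rows[i:i + rows_per_read] for i in range(0, len(rows), rows_per_read)]


# 1000 rows stored as 10 blocks of 100 rows; about 30% go to validation.
blocks = [list(range(i, i + 100)) for i in range(0, 1000, 100)]
train, valid = split_blocks(blocks, lambda r: r % 10 < 3)
print([len(b) for b in valid])               # ten small blocks of 30 rows
print([len(b) for b in reblock(valid, 100)]) # [100, 100, 100]
```

Re-blocking turns many partly filled blocks into fewer full-size blocks, so each read again moves close to the intended number of elements.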