Applies To
Revolution Analytics

You can use the same RevoScaleR functions to process huge data sets stored on disk that you use to analyze in-memory data frames, because RevoScaleR functions use 'chunking' algorithms. A chunking algorithm follows this general process (a sketch of the loop appears after the list):

  1. Initialization: intermediate results needed for computation of final statistics are initialized

  2. Read data: read a chunk (set of observations of variables) of data

  3. Transform data: perform transformations and row selections for the chunk of data as needed; write out the data if you are only performing an import or a data step

  4. Process data: compute intermediate results for the chunk of data

  5. Update results: combine the results from the chunk of data with those of previous chunks

  6. Repeat steps (2) - (5) (perhaps in parallel) until all data has been processed

  7. Process results: when the results from all chunks have been combined, do the final computations and return the results
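
The following is a minimal sketch of that loop in plain R, not RevoScaleR's own implementation. It assumes a CSV file named measurements.csv with a numeric column named value (both names are placeholders) and computes a mean without ever holding the whole file in memory. RevoScaleR's compiled implementation can also process chunks in parallel; the shape of the computation is the same.

    # Minimal chunking sketch (assumed file/column names; sequential, for illustration only)
    chunked_mean <- function(path, chunk_size = 10000) {
      con <- file(path, open = "r")
      on.exit(close(con))

      # Consume the header line and keep the column names
      header <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])

      running_sum <- 0   # 1. Initialization: intermediate results for the final statistic
      running_n   <- 0

      repeat {
        # 2. Read data: one chunk of observations
        chunk <- tryCatch(
          read.csv(con, header = FALSE, nrows = chunk_size, col.names = header),
          error = function(e) NULL
        )
        if (is.null(chunk) || nrow(chunk) == 0) break

        # 3. Transform data: row selection (drop missing values) for this chunk
        chunk <- chunk[!is.na(chunk$value), , drop = FALSE]

        # 4. Process data: intermediate results for this chunk
        # 5. Update results: combine them with the results of previous chunks
        running_sum <- running_sum + sum(chunk$value)
        running_n   <- running_n + nrow(chunk)
      }   # 6. Repeat until all data has been processed

      # 7. Process results: final computation on the combined intermediate results
      running_sum / running_n
    }

    chunked_mean("measurements.csv")

Because these steps are wrapped inside every RevoScaleR analysis function, the call itself looks the same whether the data source is an in-memory data frame or an .xdf file on disk. As an illustration (the variable and file names are placeholders), a summary computed with rxSummary can be pointed at either source:

    library(RevoScaleR)
    rxSummary(~ value, data = someDataFrame)        # in-memory data frame
    rxSummary(~ value, data = "measurements.xdf")   # chunked processing of an .xdf file on disk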
