Q&A: How do the RevoScaleR chunking algorithms work?

You can use the same RevoScaleR functions to process huge data sets stored on disk as you use to analyze in-memory data frames. This is possible because RevoScaleR functions use "chunking" algorithms, which work through the data one chunk at a time instead of loading all of it into memory at once. In outline, a chunking algorithm follows this process:
  1. Initialization: initialize the intermediate results needed to compute the final statistics.
  2. Read data: read one chunk of data (a set of observations of the variables).
  3. Transform data: apply any transformations and row selections to the chunk as needed; if the operation is only an import or a data step, write the chunk out.
  4. Process data: compute intermediate results for the chunk.
  5. Update results: combine the results from this chunk with those from previous chunks.
  6. Repeat steps 2 through 5 (possibly in parallel) until all of the data has been processed.
  7. Process results: once the results from all chunks are complete, perform the final computations and return the results.
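
The steps above can be sketched in a few lines of code. The following Python example is only an illustration of the general pattern, not RevoScaleR's actual implementation: the names read_chunks and chunked_summary, the keep and transform parameters, and the choice of mean and variance as the statistics are all assumptions made for this sketch. It keeps a running count, mean, and sum of squared deviations, and merges each chunk's results into them using the standard pairwise update formula.

```python
from itertools import islice


def read_chunks(rows, chunk_size):
    """Step 2: yield successive lists of at most chunk_size observations."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def chunked_summary(rows, chunk_size=3, keep=lambda x: True, transform=lambda x: x):
    """Compute count, mean, and sample variance one chunk at a time.

    Hypothetical helper for illustration only; not a RevoScaleR function.
    """
    # Step 1: initialize the intermediate results.
    n_total, mean_total, m2_total = 0, 0.0, 0.0

    # Step 6: the loop repeats steps 2 through 5 until the data is exhausted.
    for chunk in read_chunks(rows, chunk_size):
        # Step 3: row selection on the raw value, then transformation.
        values = [transform(x) for x in chunk if keep(x)]
        if not values:
            continue

        # Step 4: intermediate results for this chunk alone.
        n_chunk = len(values)
        mean_chunk = sum(values) / n_chunk
        m2_chunk = sum((v - mean_chunk) ** 2 for v in values)

        # Step 5: combine with the results of previous chunks.
        n_new = n_total + n_chunk
        delta = mean_chunk - mean_total
        mean_total += delta * n_chunk / n_new
        m2_total += m2_chunk + delta ** 2 * n_total * n_chunk / n_new
        n_total = n_new

    # Step 7: final computations once every chunk has been processed.
    variance = m2_total / (n_total - 1) if n_total > 1 else float("nan")
    return {"n": n_total, "mean": mean_total, "variance": variance}
```

Because each chunk contributes only a small summary (a count, a mean, and a sum of squared deviations), memory use depends on the chunk size rather than on the size of the data set. The same property is what makes it possible to process chunks in parallel and merge their summaries afterward.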
Properties

Article ID: 3104271 - Last Review: 29 Oct 2015 - Revision: 1
