QA: How do the RevoScaleR chunking algorithms work?

You can use the same RevoScaleR functions to process huge data sets stored on disk that you use to analyze in-memory data frames. This is possible because RevoScaleR functions use "chunking" algorithms, which process the data one chunk at a time rather than loading it all into memory. In general, a chunking algorithm follows this process:
  1. Initialization: initialize the intermediate results needed to compute the final statistics.
  2. Read data: read a chunk of data (a set of observations of the variables).
  3. Transform data: perform transformations and row selections on the chunk as needed; if only performing an import or data step, write out the data.
  4. Process data: compute intermediate results for the chunk.
  5. Update results: combine the chunk's results with those of the previous chunks.
  6. Repeat steps (2) - (5) (perhaps in parallel) until all the data has been processed.
  7. Process results: when the results from all chunks have been combined, perform the final computations and return the results.
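The steps above can be sketched in a few lines of code. The following is a minimal Python illustration of the chunking pattern, not RevoScaleR's actual implementation: it computes the mean and sample variance of a variable in a single pass over chunks by accumulating running sums (the intermediate results), so no chunk needs to be held in memory after it is processed. The data source and the missing-value filter are illustrative assumptions.

```python
def chunked_mean_variance(chunks):
    # Step 1. Initialization: intermediate results (counts and sums)
    n = 0
    total = 0.0
    total_sq = 0.0
    for chunk in chunks:                            # Step 2. Read a chunk
        rows = [x for x in chunk if x is not None]  # Step 3. Transform / row selection
        # Step 4. Process data: intermediate results for this chunk
        chunk_n = len(rows)
        chunk_sum = sum(rows)
        chunk_sum_sq = sum(x * x for x in rows)
        # Step 5. Update results: combine with previous chunks
        n += chunk_n
        total += chunk_sum
        total_sq += chunk_sum_sq
    # Steps 2-5 repeat via the loop (step 6); independent chunks could
    # be processed in parallel because the combine step is associative.
    # Step 7. Process results: final computations from the accumulators
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance

# Three chunks standing in for blocks read from disk; None is a missing value.
data = [[1.0, 2.0, None], [3.0, 4.0], [5.0]]
print(chunked_mean_variance(data))  # → (3.0, 2.5)
```

Because only the running sums are carried between chunks, the memory footprint stays constant regardless of the total number of rows, which is what lets the same functions work on data sets far larger than RAM.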

Article ID: 3104271 - Last Review: 10/29/2015 08:48:00 - Revision: 1.0

Revolution Analytics