Hadoop Sort / Merge / By-Group Processing


Even if you pre-sort the data in Hadoop and then import it into RRE (Revolution R Enterprise), there is no guarantee that the splits will contain whole by-groups or that they will be processed in the correct order. The options therefore narrow to CSV input combined with one of the following:

1) Hive or Pig for sort, merge, and by-group processing.

2) rmr2 or plyrmr for by-group processing in R.
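
The split problem described above can be illustrated with a small, self-contained Python sketch. This is a simulation only: it does not use Hadoop, RRE, Hive, Pig, rmr2, or plyrmr, and the sample records, the split size, and all function names are hypothetical. It assumes a by-group sum as the analysis, and shows that cutting pre-sorted data into fixed-size splits can leave a group straddling two splits, whereas routing records by key first (the role the shuffle plays in a MapReduce-style tool) yields whole groups.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical pre-sorted records of the form (group_key, value).
records = [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("b", 5),
           ("c", 6), ("c", 7), ("c", 8)]


def fixed_size_splits(rows, size):
    """Cut rows into consecutive chunks, ignoring group boundaries."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def sums_per_split(splits):
    """Naive by-group sum computed independently inside each split."""
    partials = []
    for split in splits:
        for key, rows in groupby(split, key=lambda r: r[0]):
            partials.append((key, sum(value for _, value in rows)))
    return partials


def sums_by_key(splits):
    """Collect every record under its key first, then reduce each group."""
    groups = defaultdict(list)
    for split in splits:
        for key, value in split:
            groups[key].append(value)
    return {key: sum(values) for key, values in sorted(groups.items())}


splits = fixed_size_splits(records, 3)

# Group "c" straddles the second and third split, so it appears twice.
print(sums_per_split(splits))  # [('a', 6), ('b', 9), ('c', 6), ('c', 15)]

# Grouping by key before reducing gives one complete result per group.
print(sums_by_key(splits))     # {'a': 6, 'b': 9, 'c': 21}
```

In this sketch the per-split result reports group "c" as two partial sums, which is the kind of incomplete by-group the article warns about. The key-based version stands in for what a by-group tool does before reducing.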

Note: This is a "FAST PUBLISH" article created directly from within the Microsoft support organization. The information contained herein is provided as-is in response to emerging issues. As a result of the speed in making it available, the materials may include typographical errors and may be revised at any time without notice. See Terms of Use for other considerations.

Article ID: 3104162 - Last Review: 11/01/2015 04:06:00 - Revision: 1.0

Revolution Analytics
