Hadoop Sort / Merge / By-Group Processing
Even if you pre-sort the data in Hadoop and then import it into RRE (Revolution R Enterprise), there is no guarantee that each split will contain whole by-groups, or that the splits will be processed in the correct order. Hence the options narrow to CSV input combined with one of the following:
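A minimal Python sketch of why pre-sorting alone is not enough. It assumes, for illustration only, that the input is cut into fixed-size row chunks with no regard for group boundaries, and that each chunk is aggregated independently. The data, the split size, and the helper names `split_rows` and `per_split_sums` are hypothetical. The sketch does not model how RRE itself reads splits.

```python
from collections import defaultdict


def split_rows(rows, split_size):
    """Cut rows into consecutive fixed-size chunks, ignoring by-group boundaries."""
    return [rows[i:i + split_size] for i in range(0, len(rows), split_size)]


def per_split_sums(splits):
    """Sum values per key inside each split independently (naive chunk-wise by-group step)."""
    results = []
    for chunk in splits:
        sums = defaultdict(int)
        for key, value in chunk:
            sums[key] += value
        results.extend(sorted(sums.items()))
    return results


# Hypothetical data, already sorted by the by-group key.
rows = [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("b", 5), ("c", 6)]
print(per_split_sums(split_rows(rows, split_size=4)))
# → [('a', 3), ('b', 7), ('b', 5), ('c', 6)]
```

Group "b" straddles the two splits, so it comes out as two partial results (7 and 5) instead of one total of 12. This happens even though the input was sorted.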
1) Hive or Pig for sorting, merging, and by-group processing.
2) rmr2 or plyrmr for by-group processing in R.
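Both options above avoid the split problem by grouping inside Hadoop rather than relying on input order. The sketch below simulates that idea. It is not rmr2, plyrmr, Hive, or Pig code. It is a plain Python model of the MapReduce map/shuffle/reduce contract: every record with a given key is routed to the same reducer, so each reducer sees whole by-groups no matter how the input was split. The function names and the `zlib.crc32`-based partitioner are illustrative assumptions, standing in for Hadoop's hash partitioner.

```python
import zlib
from collections import defaultdict


def map_split(split):
    # Map step: emit one (key, value) pair per row; runs independently per input split.
    for key, value in split:
        yield key, value


def partition(key, n_reducers):
    # Deterministic key -> reducer assignment (illustrative stand-in for a hash partitioner).
    return zlib.crc32(key.encode("utf-8")) % n_reducers


def shuffle(splits, n_reducers):
    # Gather all mapper output and group values by key within each reducer's bucket.
    buckets = [defaultdict(list) for _ in range(n_reducers)]
    for split in splits:
        for key, value in map_split(split):
            buckets[partition(key, n_reducers)][key].append(value)
    return buckets


def reduce_all(buckets):
    # Reduce step: each key lives in exactly one bucket, so one sum per key is final.
    totals = {}
    for bucket in buckets:
        for key, values in bucket.items():
            totals[key] = sum(values)
    return dict(sorted(totals.items()))


# Hypothetical rows, deliberately split so that group "b" straddles two splits.
splits = [[("a", 1), ("a", 2), ("b", 3), ("b", 4)], [("b", 5), ("c", 6)]]
print(reduce_all(shuffle(splits, n_reducers=2)))
# → {'a': 3, 'b': 12, 'c': 6}
```

Grouping by key during the shuffle, rather than depending on sort order across splits, is what makes the by-group result independent of where the input was cut.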
Article ID: 3104162 - Last Review: 11/01/2015 04:06:00 - Revision: 1.0