Hadoop Sort / Merge / By-Group Processing

Workarounds

Even if you pre-sort the data in Hadoop and then import it into RRE (Revolution R Enterprise), there is no guarantee that the splits will contain whole by-groups or that they will be processed in the correct order. The options therefore narrow to CSV input combined with:

1) Hive or Pig for sort, merge, and by-group processing.
2) rmr2 or plyrmr for by-group processing in R.
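
The split problem described above can be illustrated with a small, self-contained Python sketch. It does not use RRE, Hadoop, Hive, Pig, or rmr2; the sample rows, the split size, and all function names are hypothetical and chosen only for illustration. It shows that cutting pre-sorted data into fixed-size splits can leave one by-group spread over several splits, and that grouping by key first (roughly what a MapReduce shuffle or a GROUP BY does) gives each by-group to a single reduce step.

```python
from collections import defaultdict

# Hypothetical pre-sorted input: (by-group key, value) pairs.
ROWS = [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("b", 5), ("c", 6)]


def fixed_size_splits(records, size):
    """Cut records into consecutive chunks of `size`, ignoring key boundaries."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def keys_spanning_splits(splits):
    """Return the keys whose rows land in more than one split."""
    seen = defaultdict(set)
    for index, chunk in enumerate(splits):
        for key, _ in chunk:
            seen[key].add(index)
    return sorted(key for key, indexes in seen.items() if len(indexes) > 1)


def group_sums(records):
    """Group all records by key first, then reduce each complete by-group."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}


if __name__ == "__main__":
    splits = fixed_size_splits(ROWS, 2)
    print("keys broken across splits:", keys_spanning_splits(splits))
    print("sums per complete by-group:", group_sums(ROWS))
```

In this sketch the by-groups for "a" and "b" each straddle two splits even though the input was sorted, so any per-split computation would see only part of those groups. Grouping by key before reducing is the step that Hive, Pig, rmr2, or plyrmr would be relied on to perform in the workarounds listed above.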
Applies To
Revolution Analytics