Hadoop Sort / Merge / By-Group Processing

Workarounds

Even if you pre-sort the data in Hadoop and then import it into RRE (Revolution R Enterprise), there is no guarantee that each split will contain whole by-groups or that the splits will be processed in the correct order. The options therefore narrow to CSV input combined with one of the following:

1) Hive or Pig for sort, merge, and by-group processing.

2) rmr2 or plyrmr for by-group processing in R.

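To illustrate why pre-sorting alone is not enough, and why a keyed grouping step fixes it, here is a minimal plain-Python model of the MapReduce pattern. It is not rmr2, plyrmr, Hive, or Pig code: `fixed_splits`, `map_phase`, `shuffle`, and `reduce_phase` are hypothetical names, and real Hadoop input splits are sized in bytes rather than by record count. Sorted input cut into fixed-size splits can leave one by-group (here `b`) spread across two splits; grouping every value by key before the reduce step puts each group back together.

```python
from collections import defaultdict
from itertools import islice


def fixed_splits(records, split_size):
    """Cut records into chunks of split_size, ignoring group boundaries.

    Hypothetical stand-in for input splits; real splits are sized in bytes.
    """
    it = iter(records)
    while True:
        chunk = list(islice(it, split_size))
        if not chunk:
            return
        yield chunk


def map_phase(split):
    # Emit (key, value) pairs; the key is the by-group variable.
    for key, value in split:
        yield key, value


def shuffle(mapped_splits):
    # Collect every value for a key, whichever split it came from.
    groups = defaultdict(list)
    for pairs in mapped_splits:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    # One call per key: each sum covers only the values gathered for that key.
    return {key: sum(values) for key, values in sorted(groups.items())}


records = sorted([("a", 1), ("a", 2), ("b", 5), ("b", 1), ("b", 4), ("c", 3)])
splits = list(fixed_splits(records, 4))

# Group "b" straddles both splits.
print([sorted({k for k, _ in s}) for s in splits])  # [['a', 'b'], ['b', 'c']]

# Wrong: summarizing each split on its own yields two partial totals for "b".
print([reduce_phase(shuffle([map_phase(s)])) for s in splits])
# [{'a': 3, 'b': 5}, {'b': 5, 'c': 3}]

# Right: shuffle all splits by key first, then reduce whole groups.
print(reduce_phase(shuffle(map_phase(s) for s in splits)))
# {'a': 3, 'b': 10, 'c': 3}
```

The per-split path reports `b` twice with partial totals, while the shuffle-then-reduce path gives one correct total per group. Regrouping by key in this way is, in essence, what lets the Hive/Pig and rmr2/plyrmr options produce correct by-group results regardless of how the input was split.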
Properties

Article ID: 3104162 - Last Review: 1 Nov 2015 - Revision: 1