Hadoop Sort / Merge / By-Group Processing

Workarounds

Even if you pre-sort the data in Hadoop and then import it into RRE, there is no guarantee that each split will contain whole by-groups or that the splits will be processed in the correct order. The options therefore narrow to CSV input combined with one of the following:

1) Hive or Pig for sorting, merging, and by-group processing.

2) rmr2 or plyrmr for by-group processing in R.
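
The reason the Hadoop-side tools avoid the problem that a pre-sort does not can be sketched in Python. This is a minimal, hypothetical illustration, not RRE's import logic and not the rmr2 or plyrmr API: `chunk`, `groups_cut_by_splits`, and `map_reduce` are made-up helpers, and fixed-size row chunks stand in for Hadoop input splits. Cutting sorted rows into splits can divide a by-group across splits, whereas the shuffle-by-key step that Hive and Pig grouping queries and rmr2 reduce functions typically run on top of collects every value for a key before the reducer sees it, regardless of split boundaries or processing order.

```python
from collections import defaultdict

# Hypothetical sample data: (by-group key, value) rows, already sorted by key.
RECORDS = [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("b", 5), ("c", 6)]


def chunk(records, size):
    """Cut rows into fixed-size splits (a stand-in for Hadoop input splits)."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def groups_cut_by_splits(records, size):
    """Return the keys whose rows end up in more than one split."""
    split_ids = defaultdict(set)
    for idx, split in enumerate(chunk(records, size)):
        for key, _ in split:
            split_ids[key].add(idx)
    return sorted(key for key, ids in split_ids.items() if len(ids) > 1)


def map_reduce(splits, reducer):
    """Shuffle (key, value) pairs from all splits by key, then reduce each group once."""
    shuffled = defaultdict(list)
    for split in splits:  # split order does not affect the result
        for key, value in split:  # identity map step
            shuffled[key].append(value)
    return {key: reducer(values) for key, values in sorted(shuffled.items())}


if __name__ == "__main__":
    print(groups_cut_by_splits(RECORDS, 2))  # -> ['a', 'b']
    print(map_reduce(reversed(chunk(RECORDS, 2)), sum))  # -> {'a': 6, 'b': 9, 'c': 6}
```

With two-row splits, groups `a` and `b` each straddle two splits, so a by-group step run split by split would see only partial groups. The shuffle-based version still returns the full per-key totals even when the splits are fed in reverse order.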

Properties

Article ID: 3104162 - Last updated: November 1, 2015 - Revision: 1