In Part 1, we saw the word count example. Let's build more on top of it.
A very common use case for the word count example is finding the top 100 words. Using MapReduce, you would use a secondary sort to get this. Let's try to achieve the same functionality using Crunch.
Find the top 100 most frequently occurring words in an input.
NOTE: We shall build this on the WordCount code from Part 1.
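    // Keep only the 100 words with the highest counts.
    PTable<String, Long> top100 = counts.top(100);
    // Instruct the pipeline to write the resulting counts to a text file.
    // Assumes the output path is args[1], as in the standard WordCount example.
    pipeline.writeTextFile(top100, args[1]);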
We only need to add the first statement, which gets the top 100 words for us. To find the least frequently occurring words, PTable provides a similar bottom() API.
Run the example again and see the output.
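To make it clearer what top() and bottom() select, here is a minimal plain-Java sketch of the same idea on an in-memory map. It is only an illustration of the semantics and not how Crunch implements these calls. The class name TopWords, its methods and the sample counts are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Plain-Java illustration only: NOT Crunch code and NOT Crunch's implementation.
class TopWords {

    // The n entries with the largest counts, highest count first.
    static List<Map.Entry<String, Long>> top(Map<String, Long> counts, int n) {
        return select(counts, n, Map.Entry.<String, Long>comparingByValue());
    }

    // The n entries with the smallest counts, lowest count first.
    static List<Map.Entry<String, Long>> bottom(Map<String, Long> counts, int n) {
        return select(counts, n, Map.Entry.<String, Long>comparingByValue().reversed());
    }

    // Keeps a heap of at most n entries; the weakest entry under 'order' is evicted first.
    private static List<Map.Entry<String, Long>> select(
            Map<String, Long> counts, int n, Comparator<Map.Entry<String, Long>> order) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<>(order);
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            heap.offer(entry);
            if (heap.size() > n) {
                heap.poll(); // drop the current weakest candidate
            }
        }
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(order.reversed()); // strongest entry first
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical word counts, standing in for the WordCount output.
        Map<String, Long> counts = new HashMap<>();
        counts.put("the", 5L);
        counts.put("crunch", 3L);
        counts.put("pipeline", 2L);
        counts.put("word", 1L);

        System.out.println(top(counts, 2));    // [the=5, crunch=3]
        System.out.println(bottom(counts, 2)); // [word=1, pipeline=2]
    }
}
```

The sketch never holds more than n candidates at a time, which is why this kind of selection is cheaper than fully sorting all the counts. When several words share the same count, the order among them is not defined here.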