Hadoop Recipe – Implementing Custom Partitioner

This recipe is about implementing a custom Partitioner.

A Partitioner in the MapReduce world partitions the key space. The partitioner is used to derive the partition to which a key-value pair belongs. It is responsible for bringing records with the same key to the same partition so that they can be processed together by a reducer.

To implement a custom Partitioner, we need to extend the Partitioner class.
Let's look at the code for the Partitioner class:

public abstract class Partitioner<KEY, VALUE> {
  
  /** 
   * Get the partition number for a given key (hence record) given the total 
   * number of partitions i.e. number of reduce-tasks for the job.
   *   
   * <p>Typically a hash function on all or a subset of the key.</p>
   *
   * @param key the key to be partitioned.
   * @param value the entry value.
   * @param numPartitions the total number of partitions.
   * @return the partition number for the <code>key</code>.
   */
  public abstract int getPartition(KEY key, VALUE value, int numPartitions);  
}


The default partitioner is HashPartitioner, which derives the partition from the hash of the key. Masking the hash with Integer.MAX_VALUE clears the sign bit, so a negative hashCode() can never yield a negative partition number:

public class HashPartitioner<K, V> extends Partitioner<K, V> {

  /** Use {@link Object#hashCode()} to partition. */
  public int getPartition(K key, V value,
                          int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}

OK, now let's implement a custom partitioner. Assume that we have a Text key, and we want to use its first character as the deciding factor for determining the partition.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class FirstCharTextPartitioner extends Partitioner<Text, Text> {

  @Override
  public int getPartition(Text key, Text value, int numReduceTasks) {
    // Partition on the first character of the key, so keys sharing
    // a first character land in the same partition.
    return key.toString().charAt(0) % numReduceTasks;
  }
}

The code takes the first character of the key as the deciding factor for determining the partition. So all keys that begin with 'A' will be processed by the same reducer. The next step is to set this class on the Job so that the framework uses our custom partitioning logic, as shown below.
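A minimal driver sketch follows, assuming the Hadoop 2-style Job.getInstance API (older releases used new Job(conf, name)); MyMapper and MyReducer are hypothetical placeholders for the job's actual map and reduce classes.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FirstCharDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "first-char-partitioning");
    job.setJarByClass(FirstCharDriver.class);
    job.setMapperClass(MyMapper.class);      // hypothetical mapper emitting <Text, Text>
    job.setReducerClass(MyReducer.class);    // hypothetical reducer
    job.setPartitionerClass(FirstCharTextPartitioner.class); // plug in the custom partitioner
    job.setNumReduceTasks(4);                // getPartition results are spread over this many reducers
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}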

Depending on our needs, we can choose our own way of partitioning the key space.

Before implementing a custom Partitioner, it's best to check the following Partitioners provided by Hadoop (a usage sketch for one of them follows the list):

  1. BinaryPartitioner
  2. HashPartitioner
  3. KeyFieldBasedPartitioner
  4. TotalOrderPartitioner
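
For instance, when the key is made of separator-delimited fields, KeyFieldBasedPartitioner can often replace a hand-written partitioner. A configuration sketch, assuming Hadoop 2-era property names (older releases used mapred.text.key.partitioner.options) and the default tab separator:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.partition.KeyFieldBasedPartitioner;

public class KeyFieldDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Partition on the first field of the key only (sort-style -k1,1 spec).
    conf.set("mapreduce.partition.keypartitioner.options", "-k1,1");
    Job job = Job.getInstance(conf, "key-field-partitioning");
    job.setPartitionerClass(KeyFieldBasedPartitioner.class);
    // ... mapper, reducer, input/output setup as usual ...
  }
}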


5 thoughts on “Hadoop Recipe – Implementing Custom Partitioner”

  1. Hi, I don't know how it is in Hadoop 0.23, but in 0.20.2 you have to add static to your MyPartitioner class to make it work. I hope that helps someone.

  2. “return (key.toString().charAt(0)) % numReduceTasks;”

    May I know if the above statement for the custom partitioner should include hashCode()? I guess it should be:

    (key.toString().charAt(0).hashcode()) % numReduceTasks;

    So, now based on the 1st character, the partitioner decides which (K,V) has to go to which reducer.

    Please correct me in case I am wrong.

  3. How do we use a custom partitioner when we need to separate the keys based on the value?

    i.e. if I have a file that has the following content

    amit +1
    sam -1
    chris 0.3
    Crains -0.3
    Dan +1
    Peter -1

    I need to put all the negative values to one reducer, and all the positive values to another.
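
    One way to do this: getPartition also receives the value, so the partitioner can branch on its sign. A sketch, assuming the numeric score arrives as a Text value and the job runs at least two reducers:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class SignTextPartitioner extends Partitioner<Text, Text> {

      @Override
      public int getPartition(Text key, Text value, int numReduceTasks) {
        // Send negative scores to partition 0 and everything else to partition 1.
        // Assumes the value parses as a number; the modulo guards against a single-reducer job.
        double score = Double.parseDouble(value.toString());
        return (score < 0 ? 0 : 1) % numReduceTasks;
      }
    }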
