[Apache Pig] Dealing with java.lang.OutOfMemoryError: Java heap space

Recently, while working on an aggregation script, I ran into the following exception:

java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

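There was no custom UDF; the script used only built-in functions. The trace goes through LocalJobRunner, so the map task was running in local mode inside the Pig JVM itself. This exception can be dealt with by raising the Java heap size for Pig (it defaults to 1000m).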

We can either set the PIG_HEAPSIZE environment variable to the desired size, like

$ export PIG_HEAPSIZE=2096

or set it just for one run by prefixing the command that starts the Pig script

$ PIG_HEAPSIZE=2096 pig -x local run-aggregations.pig

NOTE: The size has to be in MB only, given as a plain number (2096 means 2096 MB).
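Since the value must be a bare number of megabytes, it can be worth checking it before launching Pig. The following is only a minimal sketch: heap_flag is a hypothetical helper, not part of Pig, and it assumes the launcher turns the number into a JVM flag of the form -Xmx<N>m.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pig): validate a heap size given in MB
# and print the JVM flag it is assumed to correspond to.
heap_flag() {
    case "$1" in
        ''|*[!0-9]*)
            # Empty, or contains something other than digits (e.g. "2g").
            echo "heap size must be a plain number of MB, got: '$1'" >&2
            return 1
            ;;
    esac
    echo "-Xmx${1}m"
}

heap_flag 2096    # prints -Xmx2096m
```

A wrapper script could call heap_flag "$PIG_HEAPSIZE" first and start pig only when it succeeds, so a value like 2g is rejected up front rather than being passed along.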
