Apache Whirr launches the Data Node and Task Tracker processes with default settings, but these can be customised in the recipe as needed. For example, to allocate a larger heap (the value is in MB) to the Data Node and Task Tracker, add the following line to the recipe file: hadoop-env.HADOOP_HEAPSIZE=2048
Author: ashish
[Apache Pig] Extending CSVExcelLoader to append file name of the split
CSVExcelLoader has no option to append the file name of the split it is processing, which comes in handy in certain situations. Here is a quick way to add that support.
[Apache Pig] Dealing with Unexpected character ‘$’ error
Recently, while working on an ETL script, I encountered the following exception when using the REPLACE() function to remove $ from a column:
2014-04-08 21:24:41,454 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. <file test.pig, line 16, column 51> Unexpected character '$'
Failed to parse: <file test.pig, line 16, column 51> Unexpected character '$'
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:243)
at […]
[Apache Pig] Dealing with java.lang.OutOfMemoryError: Java heap space
Recently, while working on an aggregation script, I hit the following exception:
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
There was no custom UDF; the script used only built-in functions. This exception can be dealt with by raising the Java heap size (defaults to 1000m). We can either set […]
[ElasticSearch] Setting Kibana 3 on OSX in 2 minutes
Kibana lets you interact with ElasticSearch easily. Let's see how we can set up Kibana in 2 minutes on OSX:
1. Download Kibana.
2. Extract the zip or tar.
3. Go to the extracted folder and execute the following command: python -m SimpleHTTPServer 8000
8000 is the port; it can be any port of your choice. Now point your browser to http://localhost:8000/ […]
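The python -m SimpleHTTPServer command is the Python 2 form; on Python 3 the equivalent module is http.server (python3 -m http.server 8000). As a minimal sketch, assuming Python 3.7 or later, the same static file serving can also be done from a script. The helper name make_static_server is hypothetical, not part of Kibana or Python:

```python
import functools
import http.server


def make_static_server(directory=".", port=8000):
    """Build an HTTP server that serves the files in `directory` on localhost.

    `make_static_server` is an illustrative helper name, not a Kibana API.
    """
    # Bind the request handler to the chosen folder (the `directory`
    # keyword requires Python 3.7+).
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Listen on the loopback interface only, so the files are not
    # exposed to other machines on the network.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (blocks until interrupted with Ctrl+C):
#   make_static_server("path/to/extracted/folder", 8000).serve_forever()
```

Binding to 127.0.0.1 is a deliberate choice here; the command-line form listens on all interfaces by default.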
[Flume Cookbook] Available Sinks
Flume ships with many sink implementations. Here is the list: HDFS Sink, Logger Sink, Avro Sink, Thrift Sink, IRC Sink, File Roll Sink, Null Sink, HBase Sink, Morphline Solr Sink. Let's look at them briefly.
HDFS Sink: Writes Events to HDFS.
Logger Sink: Writes Events using Logger at INFO level.
Avro Sink: Writes Events […]