[Apache Pig] Extending CSVExcelLoader to append file name of the split

CSVExcelLoader doesn't have an option to append the file name of the split it is processing, which comes in handy in certain situations. Here is a quick way to add that support:

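import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;
import org.apache.pig.piggybank.storage.CSVLoader;

public class CSVExcelLoaderWithFileName extends CSVLoader {

  // Path of the file backing the split currently being read.
  private Path path;

  @Override
  public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
    super.prepareToRead(reader, split);
    // Assumes the wrapped split is a FileSplit, as with file-based input formats.
    path = ((FileSplit) split.getWrappedSplit()).getPath();
  }

  @Override
  public Tuple getNext() throws IOException {
    Tuple superTuple = super.getNext();
    if (superTuple != null) {
      // Append the file name as the last field of every record.
      superTuple.append(path.getName());
    }
    return superTuple;
  }
}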

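The code is simple. When the load function is prepared to read a split, we capture the path of the file behind that split. Each call to getNext() then appends the file name as the last field of the tuple returned by the parent loader.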
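To see the capture-then-append pattern in isolation, here is a minimal sketch that runs on a plain JDK without Pig or Hadoop. SimpleCsvLoader and SimpleCsvLoaderWithFileName are hypothetical stand-ins written for this illustration, not Pig classes, and the comma splitting is deliberately naive (no quoting support). The sample path and rows are made up.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Small driver: feeds one pretend split to the loader and prints each row.
class FileNameLoaderDemo {
    public static void main(String[] args) {
        SimpleCsvLoaderWithFileName loader = new SimpleCsvLoaderWithFileName();
        // Hypothetical path and contents standing in for a real input split.
        loader.prepareToRead(Paths.get("/data/in/part-0001.csv"), Arrays.asList("a,1", "b,2"));
        List<String> row;
        while ((row = loader.getNext()) != null) {
            System.out.println(row);
        }
    }
}

// Stand-in for the parent loader (hypothetical, NOT a Pig class).
class SimpleCsvLoader {
    private Iterator<String> lines;

    // Plays the role of prepareToRead: called once before rows are read from a split.
    public void prepareToRead(Path splitPath, List<String> splitLines) {
        this.lines = splitLines.iterator();
    }

    // Plays the role of getNext: returns the next row, or null when the split is exhausted.
    public List<String> getNext() {
        if (lines == null || !lines.hasNext()) {
            return null;
        }
        // Naive comma split; the -1 limit keeps trailing empty fields.
        return new ArrayList<>(Arrays.asList(lines.next().split(",", -1)));
    }
}

// Mirrors the subclass in the post: remember the path, then append its name to every row.
class SimpleCsvLoaderWithFileName extends SimpleCsvLoader {
    private Path path;

    @Override
    public void prepareToRead(Path splitPath, List<String> splitLines) {
        super.prepareToRead(splitPath, splitLines);
        this.path = splitPath;
    }

    @Override
    public List<String> getNext() {
        List<String> row = super.getNext();
        if (row != null) {
            row.add(path.getFileName().toString());
        }
        return row;
    }
}
```

Running the demo prints [a, 1, part-0001.csv] and then [b, 2, part-0001.csv]. The null check matters for the same reason as in the real loader: the parent signals the end of the split by returning null, and there is nothing to append to at that point.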

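Caution: Watch out for the pig.splitCombination property. When split combination is enabled, several small files can be combined into a single split, so the captured path may not match the file every record came from. More info at http://pig.apache.org/docs/r0.12.1/perf.html#combine-files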
