[Learning Spark with Examples] File Copy

This post is the first in a series of Apache Spark examples that I will use to learn more about Spark. It is a simple file copy example: we read a source file into an RDD and save the RDD without applying any transformation.

The code can be found here.

For the complete project, refer to https://github.com/paliwalashish/learning-spark

Let's look at the code:

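package org.learningspark.simple;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class FileCopy {

  public static void main(String[] args) {
    SparkConf sparkConf = new SparkConf().setAppName("File Copy");
    JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);

    // Read the source file; each line becomes one element of the RDD
    JavaRDD<String> input = sparkContext.textFile(args[0]);

    // Save the RDD, unchanged, to the specified location
    input.saveAsTextFile(args[1]);

    // Release the resources held by the context
    sparkContext.stop();
  }

}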

  • First, we create an instance of SparkConf and set the application name
  • We create a JavaSparkContext, passing it the SparkConf
  • We read the file into a JavaRDD. For simplicity, we have picked a plain text file
  • We save the RDD to the desired location

Let's compile the program:

$ mvn clean package

Once the build succeeds, run the program as follows (from the directory that contains pom.xml):

$ ~/cots/spark-1.2.0-bin-hadoop2.4/bin/spark-submit --class org.learningspark.simple.FileCopy --master local[1] target/learningspark-1.0-SNAPSHOT.jar /Users/ashishpaliwal/open-spource/flume/trunk/CHANGELOG /Users/ashishpaliwal/open-spource/CHANGELOG

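This runs the program on the local machine. The last two parameters are the arguments passed to the program: the input file and the output location. Note that saveAsTextFile treats the output location as a directory and writes part files (part-00000 and so on) into it rather than producing a single file, and the job fails if that directory already exists.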
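
To check that the copy worked, one option is to read the part files back and compare them with the source. The following is a minimal sketch in plain Java (no Spark needed). The class CopyCheck and its methods are hypothetical helpers, not part of the learning-spark project, and the sketch assumes UTF-8 text and that the part files, taken in file-name order, reproduce the original line order.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of the learning-spark project.
final class CopyCheck {

  // Collects the lines of every part-* file in outputDir, in file-name order.
  // Marker files such as _SUCCESS and hidden checksum files do not match the glob.
  static List<String> readParts(Path outputDir) {
    List<Path> parts = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
      for (Path part : stream) {
        parts.add(part);
      }
      Collections.sort(parts);
      List<String> lines = new ArrayList<>();
      for (Path part : parts) {
        lines.addAll(Files.readAllLines(part, StandardCharsets.UTF_8));
      }
      return lines;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Returns true when the source file and the concatenated part files
  // contain the same lines in the same order.
  static boolean sameLines(Path source, Path outputDir) {
    try {
      List<String> expected = Files.readAllLines(source, StandardCharsets.UTF_8);
      return expected.equals(readParts(outputDir));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Usage: java CopyCheck <source file> <output directory>
  public static void main(String[] args) {
    boolean same = sameLines(Paths.get(args[0]), Paths.get(args[1]));
    System.out.println(same ? "Copy matches the source" : "Copy differs from the source");
  }
}
```

Pass it the same two arguments that were given to spark-submit: the input file and the output location.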
