[Avro] Getting Started

This is a modified version of the Avro "Getting Started" guide, simplified for a quick start.

In this post we generate Java classes from the schema and use a Maven project to build them.

The steps to get started with Avro are as follows:

  • Preliminaries: setting up dependencies
  • Defining the Avro schema
  • Compiling the schema
  • Using the Avro-generated class to populate data
  • Serializing and deserializing

Some preliminaries

1. Add the Avro dependency to pom.xml

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.7.4</version>
</dependency>

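2. Add the Avro schema compiler (avro-maven-plugin) to the build plugins in pom.xml, and set the Java source/target level with maven-compiler-plugin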

<build>
  <plugins>
    <!-- Generates Java classes from the .avsc files in sourceDirectory -->
    <plugin>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-maven-plugin</artifactId>
      <version>1.7.4</version>
      <executions>
        <execution>
          <phase>generate-sources</phase>
          <goals>
            <goal>schema</goal>
          </goals>
          <configuration>
            <sourceDirectory>${project.basedir}/src/main/resources/</sourceDirectory>
            <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
          </configuration>
        </execution>
      </executions>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <configuration>
        <source>1.6</source>
        <target>1.6</target>
      </configuration>
    </plugin>
  </plugins>
</build>

Schema Definition

Now let's define the schema. Save the following snippet as users.avsc in src/main/resources/, the sourceDirectory configured for the avro-maven-plugin:

{
    "namespace": "com.ashishpaliwal.codekatta.avro",
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "favorite_number", "type": ["int", "null"]},
        {"name": "favorite_color", "type": ["string", "null"]}
    ]
}

Compiling the Schema and using the class

Running "mvn clean compile" generates the User class from the schema.
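The location of the generated file follows from the configuration: the schema's namespace becomes the Java package and the record name becomes the class name. Below is a minimal sketch of that mapping, assuming the outputDirectory configured in pom.xml (src/main/java/); generatedSourcePath is a hypothetical helper for illustration, not part of the Avro plugin.

```java
// Minimal sketch (hypothetical helper): predicts where the generated User.java
// should land, assuming outputDirectory = src/main/java/ as in the pom.xml.
final class GeneratedPath {

    // The schema "namespace" becomes the Java package (one directory per segment)
    // and the record "name" becomes the class and file name.
    static String generatedSourcePath(String outputDir, String namespace, String recordName) {
        String dir = outputDir.endsWith("/") ? outputDir : outputDir + "/";
        return dir + namespace.replace('.', '/') + "/" + recordName + ".java";
    }

    public static void main(String[] args) {
        // Values taken from the pom.xml and users.avsc in this post
        System.out.println(generatedSourcePath(
                "src/main/java/", "com.ashishpaliwal.codekatta.avro", "User"));
    }
}
```

For this post it prints src/main/java/com/ashishpaliwal/codekatta/avro/User.java.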

The generated class can be used in two ways: as a simple POJO with setters, or through its builder. Both are shown below.

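import com.ashishpaliwal.codekatta.avro.User;

// POJO style: the no-arg constructor does not apply schema defaults; unset fields stay null
User user = new User();
user.setName("Dummy User");
user.setFavoriteColor("Black");

// Builder style: chain the setters, then call build()
User user1 = User.newBuilder().setName("user2").setFavoriteColor("Blue")
    .setFavoriteNumber(21).build();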
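The setter names used with User come from the field names in users.avsc: Avro's Java code generator drops underscores and upper-cases the letter that follows, so favorite_color becomes setFavoriteColor. The sketch below mimics that rule in plain Java. accessorName is a hypothetical helper, and it deliberately ignores the generator's extra handling of reserved words and other edge cases.

```java
// Simplified sketch of the accessor-naming rule (hypothetical helper, not Avro API):
// prefix + field name with underscores removed and the next letter upper-cased.
// Reserved-word mangling and other generator edge cases are NOT handled here.
final class AvroAccessorNames {

    static String accessorName(String prefix, String fieldName) {
        StringBuilder sb = new StringBuilder(prefix);
        boolean upperNext = true; // the first letter after the prefix is capitalized
        for (char c : fieldName.toCharArray()) {
            if (c == '_') {
                upperNext = true; // drop the underscore, capitalize what follows
            } else if (upperNext) {
                sb.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Field names from users.avsc
        for (String field : new String[] {"name", "favorite_number", "favorite_color"}) {
            System.out.println(field + " -> " + accessorName("set", field)
                    + " / " + accessorName("get", field));
        }
    }
}
```

The output pairs each field with its setter and getter, for example favorite_number -> setFavoriteNumber / getFavoriteNumber.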

Serializing

Now let's use Avro to serialize the User objects to a file:

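import java.io.File;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;

File usersOnDisk = new File("userOnDisk.avro");
// SpecificDatumWriter serializes instances of the generated User class
DatumWriter<User> datumWriter = new SpecificDatumWriter<User>(User.class);
DataFileWriter<User> fileWriter = new DataFileWriter<User>(datumWriter);
// create() writes the file header, which embeds the User schema
fileWriter.create(user.getSchema(), usersOnDisk);
fileWriter.append(user);
fileWriter.append(user1);
fileWriter.close(); // flushes buffered records and closes the file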

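The code is straightforward. First, create the File that will store the binary data. Next, create a DatumWriter; since we know the type of the objects we are writing, we use a SpecificDatumWriter for the generated User class. Then pass the DatumWriter to a DataFileWriter and call create() with the schema and the file, which writes a header containing the schema. Finally, append the User instances and close the writer when done. Note that create(), append() and close() can throw IOException, so the enclosing method must handle or declare it. The two objects are now serialized in Avro's container file format.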
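To see what a serialized record looks like at the byte level, here is a hand-rolled sketch of Avro's binary encoding for a single User record, using only the Java standard library (it does not call the Avro library). It follows the encoding rules of the Avro specification: ints and longs as zig-zag variable-length integers, strings as a length followed by UTF-8 bytes, a union as the index of the chosen branch followed by the value, and a record as its fields in schema order. The class and method names are hypothetical. A file written by DataFileWriter also contains a header (magic bytes, the schema, a sync marker) and groups records into blocks, so it holds more than these bytes alone.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Stdlib-only sketch (hypothetical names, not the Avro library): encodes ONE
// User record following the binary encoding rules of the Avro specification.
// Assumption: this is the bare record; DataFileWriter also writes a file header
// (magic bytes, schema, sync marker) and groups records into blocks.
final class UserBinaryEncoding {

    // int/long: zig-zag (0,-1,1,-2 -> 0,1,2,3), then 7 bits per byte,
    // low-order group first, high bit set while more bytes follow.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // string: length as a long, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // record: fields in schema order. Unions write the branch index first:
    // favorite_number is ["int","null"] and favorite_color is ["string","null"].
    static byte[] encodeUser(String name, Integer favoriteNumber, String favoriteColor) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        if (favoriteNumber != null) {
            writeLong(out, 0);                         // branch 0: "int"
            writeLong(out, favoriteNumber.intValue()); // int uses the same varint form
        } else {
            writeLong(out, 1);                         // branch 1: "null", no value bytes
        }
        if (favoriteColor != null) {
            writeLong(out, 0);                         // branch 0: "string"
            writeString(out, favoriteColor);
        } else {
            writeLong(out, 1);                         // branch 1: "null"
        }
        return out.toByteArray();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Same values as user1 in the post
        System.out.println(hex(encodeUser("user2", 21, "Blue")));
    }
}
```

For user1 the sketch prints 0a 75 73 65 72 32 00 2a 00 08 42 6c 75 65: 0a is the length 5 of "user2", 00 2a selects the int branch and encodes 21, and 00 08 selects the string branch and gives the length 4 of "Blue". No field names are stored, so a reader needs the schema to interpret the bytes; this is why DataFileWriter stores the schema in the file header.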

Reading the Avro file

Now let's read the data back from the file:

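import org.apache.avro.file.DataFileReader;
import org.apache.avro.io.DatumReader;
import org.apache.avro.specific.SpecificDatumReader;

// SpecificDatumReader deserializes records into the generated User class
DatumReader<User> datumReader = new SpecificDatumReader<User>(User.class);
DataFileReader<User> fileReader = new DataFileReader<User>(usersOnDisk, datumReader);
User user2 = null;
while (fileReader.hasNext()) {
  user2 = fileReader.next();
  System.out.println(user2);
}
fileReader.close(); // release the underlying file handle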

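To read the file back, we create a DatumReader and pass it, together with the Avro file to read, to a DataFileReader. Iterating over the reader deserializes each record back into a User object. Because the file header stores the schema the data was written with, DataFileReader knows how to decode the records. Close the reader once you are done.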
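The decoding step can also be sketched by hand: read the fields of one bare User record in schema order, which is roughly what the DatumReader does for each record once DataFileReader has parsed the file header and blocks. This is a stdlib-only illustration following the Avro specification's binary encoding, not Avro's implementation; UserRecordDecoder and DecodedUser are hypothetical names, and the input is assumed to be a single record without the container-file framing.

```java
import java.nio.charset.StandardCharsets;

// Stdlib-only sketch (hypothetical names): decodes ONE bare User record from
// Avro's binary encoding, reading fields in the order of users.avsc:
// name: string, favorite_number: ["int","null"], favorite_color: ["string","null"].
// Assumption: no container-file header, blocks or sync markers in the input.
final class UserRecordDecoder {

    static final class DecodedUser {
        final String name;
        final Integer favoriteNumber; // null when the "null" union branch was written
        final String favoriteColor;   // null when the "null" union branch was written

        DecodedUser(String name, Integer favoriteNumber, String favoriteColor) {
            this.name = name;
            this.favoriteNumber = favoriteNumber;
            this.favoriteColor = favoriteColor;
        }

        @Override
        public String toString() {
            return "{name=" + name + ", favorite_number=" + favoriteNumber
                    + ", favorite_color=" + favoriteColor + "}";
        }
    }

    private final byte[] buf;
    private int pos = 0;

    UserRecordDecoder(byte[] buf) {
        this.buf = buf;
    }

    // Variable-length integer: 7 bits per byte, low-order group first,
    // high bit set while more bytes follow; then undo the zig-zag mapping.
    long readLong() {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    // String: a long length followed by that many UTF-8 bytes.
    String readString() {
        int len = (int) readLong();
        String s = new String(buf, pos, len, StandardCharsets.UTF_8);
        pos += len;
        return s;
    }

    // Union: a long branch index (0 = first type in the schema's list), then the value.
    DecodedUser readUser() {
        String name = readString();
        Integer favoriteNumber = readLong() == 0 ? Integer.valueOf((int) readLong()) : null;
        String favoriteColor = readLong() == 0 ? readString() : null;
        return new DecodedUser(name, favoriteNumber, favoriteColor);
    }

    public static void main(String[] args) {
        // user1's values (name "user2", favorite_number 21, favorite_color "Blue"),
        // encoded by hand per the Avro specification
        byte[] bytes = {0x0a, 0x75, 0x73, 0x65, 0x72, 0x32, 0x00, 0x2a, 0x00, 0x08, 0x42, 0x6c, 0x75, 0x65};
        System.out.println(new UserRecordDecoder(bytes).readUser());
    }
}
```

Running main prints {name=user2, favorite_number=21, favorite_color=Blue}. Note that the decoder only works because it knows the field order and types in advance, which is exactly the information the schema in the file header provides.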

That's pretty much it. Avro can also be used without generating classes, by parsing the schema at runtime and working with generic records. Please refer to this link for further details.
