[Flume Cookbook] Playing with Apache Flume

This is a multipart blog series on Apache Flume, where I share what I learn while playing with it.

In this post we shall look at a quick single-node setup. It is a modified version of the single-node setup from the Flume documentation.

Pre-requisite

Download Flume 1.3.3 binary

Setting up a single Node installation

To set up Flume, we need the following things:

  • Source - which shall get data into Flume
  • Channel - to connect between source and sink on the same agent
  • Sink - to write data to a destination, or to the source of another agent in a Flume topology. In this single-node installation it simply logs the events.

Configuration

Extract the binary into a folder. We shall refer to this folder as FLUME_HOME.
Create a file single-node-demo.properties in FLUME_HOME/conf.
Add the following content to it.

# Example for single node config

# Base config
a1.sources=src1
a1.sinks=sink1
a1.channels=ch1

# Configure the source
a1.sources.src1.type=netcat
a1.sources.src1.bind=localhost
a1.sources.src1.port=51000

# Sink Configuration
a1.sinks.sink1.type=logger

# Channel configuration
a1.channels.ch1.type=memory
a1.channels.ch1.capacity=1000
a1.channels.ch1.transactionCapacity=100

# Link stuff together
a1.sources.src1.channels=ch1
a1.sinks.sink1.channel=ch1

Let's look at the configuration.

Base Configuration

The base configuration names the Sources, Sinks and Channels that the agent (a1 here) will run. The type of each one is set in its own section.
In the configuration above we define a Source, a Channel and a Sink. We shall see their properties in a moment.

Source Configuration

We shall use netcat as the source to push data into Flume. The config file above defines a netcat source that binds to localhost and listens on port 51000 for incoming messages.

Sink Configuration

We shall use a logger sink to write out the data. The logger sink is based on Log4j: it logs each event at INFO level and picks up its properties from the log4j.properties file on the classpath.

Channel Configuration

We shall use a Memory channel to connect Source and Sink. The configuration defines a Memory channel, an in-memory queue that holds at most 1000 events (capacity) and moves at most 100 events per transaction (transactionCapacity).
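To get a feel for what the two numbers mean, here is a small Python sketch of a bounded queue with a per-batch limit. It is only an analogy and not how Flume implements its memory channel: the real channel is transactional (commit and rollback), which this sketch ignores. The names BoundedChannel, ChannelFull, put_batch and take_batch are made up for illustration.

```python
from collections import deque


class ChannelFull(Exception):
    """Raised when a batch does not fit into the remaining capacity."""


class BoundedChannel:
    """Toy stand-in for a memory channel (hypothetical, not Flume code)."""

    def __init__(self, capacity=1000, transaction_capacity=100):
        self.capacity = capacity
        self.transaction_capacity = transaction_capacity
        self._events = deque()

    def put_batch(self, events):
        """Source side: add a batch, or fail without adding anything."""
        events = list(events)
        if len(events) > self.transaction_capacity:
            raise ValueError("batch is larger than transaction_capacity")
        if len(self._events) + len(events) > self.capacity:
            raise ChannelFull("not enough room left in the channel")
        self._events.extend(events)

    def take_batch(self):
        """Sink side: remove up to transaction_capacity events, oldest first."""
        batch = []
        while self._events and len(batch) < self.transaction_capacity:
            batch.append(self._events.popleft())
        return batch
```

With the defaults above, a source could hand over up to 100 events at a time, and would start getting errors once 1000 events are waiting for the sink.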

Linking Source to Sink

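Now let's link Source, Channel and Sink together. The configuration above ties the source to channel ch1 and then attaches the sink to the same channel ch1 (see the last two lines of the configuration file above). Note that a source takes a list of channels (channels), while a sink reads from exactly one (channel).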
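A typo in one of these names is an easy mistake to make, so here is a rough Python sketch that checks the wiring before you start the agent. It is not part of Flume; parse_properties and check_wiring are hypothetical helpers, and the sketch assumes plain key=value lines with space-separated component lists.

```python
CONFIG = """
# Base config
a1.sources=src1
a1.sinks=sink1
a1.channels=ch1

a1.sources.src1.type=netcat
a1.sinks.sink1.type=logger
a1.channels.ch1.type=memory

# Link stuff together
a1.sources.src1.channels=ch1
a1.sinks.sink1.channel=ch1
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent="a1"):
    """Return a list of problems found in the source/sink/channel links."""
    problems = []
    channels = set(props.get(f"{agent}.channels", "").split())
    for src in props.get(f"{agent}.sources", "").split():
        linked = props.get(f"{agent}.sources.{src}.channels", "").split()
        if not linked:
            problems.append(f"source {src} is not linked to any channel")
        for ch in linked:
            if ch not in channels:
                problems.append(f"source {src} uses undeclared channel {ch}")
    for sink in props.get(f"{agent}.sinks", "").split():
        ch = props.get(f"{agent}.sinks.{sink}.channel", "")
        if not ch:
            problems.append(f"sink {sink} is not linked to a channel")
        elif ch not in channels:
            problems.append(f"sink {sink} uses undeclared channel {ch}")
    return problems


if __name__ == "__main__":
    # An empty list means every link points at a declared channel.
    print(check_wiring(parse_properties(CONFIG)))
```

For a real file you would read FLUME_HOME/conf/single-node-demo.properties and pass its text to parse_properties instead of the embedded CONFIG string.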

The configuration part is done, now let's start the Agent.

Starting the Agent

$ bin/flume-ng agent --conf-file conf/single-node-demo.properties --name a1 --conf ./conf/ -Dflume.root.logger=INFO,console

This shall start a Flume Agent named "a1". The name given on the command line needs to match the agent name used in the config file, in this case a1. The last logger property is needed because the default log4j.properties file shipped with Flume uses a system property to set the Flume log level. Alternatively, you can specify your own configuration file.

Sending messages to Flume

We have configured netcat as the source, so it will be listening on the configured port for messages. We can telnet to that port and send messages to Flume. From a terminal, open a telnet session on port 51000:

$ telnet localhost 51000

Now you can start typing a message. Once you press Enter, you can locate the same message in the Flume logs.
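If you would rather script this than type into telnet, something along these lines should work. It is a minimal Python sketch, assuming the agent from this post is running on localhost:51000 and that the netcat source answers each line with a one-line acknowledgement (as far as I know it replies OK by default; if acknowledgements are turned off, the read below will wait until the timeout). send_lines is a hypothetical helper name.

```python
import socket


def send_lines(lines, host="localhost", port=51000, timeout=5.0):
    """Send each message as one newline-terminated line.

    Returns the reply line received after each message.
    """
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        reader = conn.makefile("r", encoding="utf-8", newline="\n")
        for line in lines:
            # The netcat source treats every newline-terminated line as one event.
            conn.sendall((line + "\n").encode("utf-8"))
            replies.append(reader.readline().strip())
    return replies


# Example (needs the running agent):
#   send_lines(["hello flume", "second message"])
```

Each message sent this way should show up in the Flume logs just like the ones typed into telnet.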

This is it for our first post.

You can get the Config file from git
