This is a multipart blog series on Apache Flume, where I share what I learn while playing with it.
In this post we shall look at a quick single-node setup. It is a modified version of the single-node setup from the Flume documentation.
Download Flume 1.3.3 binary
Setting up a single-node installation
To set up Flume, we need the following components:
- Source - receives data and gets it into Flume
- Channel - connects the source to the sink on the same agent, buffering events in between
- Sink - writes data to its destination, or to the source of another agent in a multi-agent Flume topology. In a single-node installation it writes locally (in this post, to the Flume log).
Extract the binary archive into a folder. We shall refer to this folder as FLUME_HOME.
Create a file named single-node-demo.properties in FLUME_HOME/conf and add the following content to it:
# Example single-node config
# Base config
a1.sources=src1
a1.sinks=sink1
a1.channels=ch1
# Configure the source
a1.sources.src1.type=netcat
a1.sources.src1.bind=localhost
a1.sources.src1.port=51000
# Sink configuration
a1.sinks.sink1.type=logger
# Channel configuration
a1.channels.ch1.type=memory
a1.channels.ch1.capacity=1000
a1.channels.ch1.transactionCapacity=100
# Link stuff together
a1.sources.src1.channels=ch1
a1.sinks.sink1.channel=ch1
Let's look at the configuration.
At a minimum, the configuration must declare which Source, Channel and Sink to use, and the type of each.
In the configuration above we define one Source (src1), one Channel (ch1) and one Sink (sink1) for the agent a1. We shall look at their properties in a moment.
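Every key in the file follows the pattern agent.kind.name.property, where kind is sources, sinks or channels, and a two-part key such as a1.sources lists the component names. The sketch below is a minimal, hypothetical helper (parse_flume_properties is not part of Flume) that groups the flat keys into that hierarchy. It assumes simple key=value lines and ignores the rest of the java.util.Properties syntax (escapes, line continuations, ':' separators).

```python
from collections import defaultdict


def parse_flume_properties(text):
    """Group Flume-style keys into {agent: {kind: {name: {property: value}}}}.

    Hypothetical helper for illustration only; handles plain key=value lines
    and '#' comments, not the full java.util.Properties format.
    """
    agents = defaultdict(lambda: defaultdict(dict))
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        parts = key.strip().split(".")
        value = value.strip()
        if len(parts) == 2:
            # e.g. a1.sources=src1 declares component names of one kind
            agent, kind = parts
            for name in value.split():
                agents[agent][kind].setdefault(name, {})
        elif len(parts) >= 4:
            # e.g. a1.sources.src1.port=51000 sets a property on one component
            agent, kind, name = parts[:3]
            prop = ".".join(parts[3:])
            agents[agent][kind].setdefault(name, {})[prop] = value
    # Convert the nested defaultdicts into plain dicts for printing/comparison.
    return {agent: dict(kinds) for agent, kinds in agents.items()}
```

Feeding it the contents of single-node-demo.properties should yield a single agent a1 whose sources, sinks and channels entries each hold one component with the properties shown in the file.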
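We shall use netcat as the source to push data into Flume. The configuration above defines a netcat source that binds to localhost on port 51000 and listens for incoming messages.
To keep things simple, the configuration above uses a logger sink rather than a file sink. The logger sink writes events through Log4j and picks up its settings from the log4j.properties file on the classpath, so whether events end up on the console or in a log file on disk depends on that Log4j configuration.
We shall use a memory channel to connect the Source and the Sink. The configuration defines a memory channel, i.e. an in-memory queue that holds at most 1000 events (capacity). The transactionCapacity of 100 caps how many events the channel accepts from a source or hands to a sink in a single transaction.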
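To illustrate how capacity and transactionCapacity interact, here is a toy model of a bounded in-memory channel. It is a simplified sketch, not Flume's MemoryChannel implementation: ToyMemoryChannel and its methods are hypothetical, and real Flume signals a failed put with a ChannelException rather than the ValueError used here.

```python
from collections import deque


class ToyMemoryChannel:
    """Toy bounded channel (hypothetical, not Flume's MemoryChannel code).

    capacity: maximum number of events held in the queue.
    transaction_capacity: maximum events one put or take transaction may move.
    """

    def __init__(self, capacity=1000, transaction_capacity=100):
        self.capacity = capacity
        self.transaction_capacity = transaction_capacity
        self.queue = deque()

    def put_batch(self, events):
        # A source commits a batch in one transaction: all events or none.
        if len(events) > self.transaction_capacity:
            raise ValueError("batch larger than transactionCapacity")
        if len(self.queue) + len(events) > self.capacity:
            raise ValueError("channel full: not enough free capacity")
        self.queue.extend(events)

    def take_batch(self, max_events):
        # A sink takes at most transaction_capacity events per transaction.
        n = min(max_events, self.transaction_capacity, len(self.queue))
        return [self.queue.popleft() for _ in range(n)]
```

With capacity=1000 the queue absorbs bursts while the sink catches up; once it is full, further puts fail until the sink drains events, which is why a slow sink eventually pushes back on the source.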
Linking Source to Sink
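Now let's link the Source, Channel and Sink together. The last two lines of the configuration bind source src1 to channel ch1 and attach sink sink1 to the same channel. Note that the source property is channels (plural, since a source can feed several channels), while the sink property is channel (singular, since a sink drains exactly one channel).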
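A mistyped channel name in these linking lines is an easy mistake to make. The sketch below is a hypothetical pre-flight check (check_wiring is not a Flume API) over a flat dict of property keys to values. It only verifies that every declared source lists at least one declared channel and every declared sink names exactly one declared channel.

```python
def check_wiring(props, agent):
    """Return a list of wiring problems for one agent; an empty list means OK.

    Hypothetical helper. props maps flat Flume keys to values,
    e.g. {"a1.sources": "src1", "a1.sinks.sink1.channel": "ch1", ...}.
    """
    def names(kind):
        # Component names are space-separated, e.g. a1.channels=ch1 ch2
        return props.get(f"{agent}.{kind}", "").split()

    channels = set(names("channels"))
    problems = []
    for src in names("sources"):
        # Sources use the plural key and may list several channels.
        targets = props.get(f"{agent}.sources.{src}.channels", "").split()
        if not targets:
            problems.append(f"source {src} is not bound to any channel")
        for ch in targets:
            if ch not in channels:
                problems.append(f"source {src} uses undeclared channel {ch}")
    for sink in names("sinks"):
        # Sinks use the singular key and name exactly one channel.
        ch = props.get(f"{agent}.sinks.{sink}.channel", "").strip()
        if not ch:
            problems.append(f"sink {sink} is not bound to a channel")
        elif ch not in channels:
            problems.append(f"sink {sink} uses undeclared channel {ch}")
    return problems
```

For the configuration in this post the check returns an empty list; changing the sink line to a1.sinks.sink1.channel=ch2 would report that sink1 uses an undeclared channel.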
The configuration part is done, now let's start the Agent.
Starting the Agent
$ bin/flume-ng agent --conf-file conf/single-node-demo.properties --name a1 --conf ./conf/ -Dflume.root.logger=INFO,console
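This starts a Flume agent named "a1". Run the command from FLUME_HOME, since the paths in it are relative. The name passed with --name must match the agent name used as the key prefix in the config file, in this case a1. The last option, -Dflume.root.logger=INFO,console, is needed because the default log4j.properties file shipped with Flume sets the root logger (level and appender) from the flume.root.logger property; here it sends INFO-level logs to the console. You can also supply your own Log4j configuration file.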
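Because a --name that does not match the config prefix is easy to get wrong, the sketch below builds the same flume-ng command as above but first checks that the agent name actually appears in the config text. Both agent_names and flume_agent_command are hypothetical helpers; they use only the flags already shown in this post.

```python
def agent_names(config_text):
    """Return the agent-name prefixes (text before the first '.') in a config."""
    names = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key = line.split("=", 1)[0].strip()
            names.add(key.split(".", 1)[0])
    return names


def flume_agent_command(agent, config_text, conf_file="conf/single-node-demo.properties"):
    """Build the flume-ng command from this post, refusing an undefined --name."""
    defined = agent_names(config_text)
    if agent not in defined:
        raise ValueError(f"agent {agent!r} not found in config; defined agents: {sorted(defined)}")
    # Paths are relative, exactly as in the command above: run with cwd=FLUME_HOME.
    return [
        "bin/flume-ng", "agent",
        "--conf-file", conf_file,
        "--name", agent,
        "--conf", "./conf/",
        "-Dflume.root.logger=INFO,console",
    ]
```

One way to launch it, assuming flume_home holds your FLUME_HOME path, is subprocess.run(flume_agent_command("a1", text), cwd=flume_home), where text is the contents of the properties file.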
Sending messages to Flume
We have configured netcat as the source, so the agent listens on the configured port for messages. We can use telnet to send messages to Flume: from a terminal, open a telnet session on port 51000.
$ telnet localhost 51000
Now start typing a message. When you press Enter, the netcat source replies with OK, and the same message shows up in the Flume logs.
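Instead of typing into telnet, you can send messages from a script. The sketch below (send_lines is a hypothetical helper) opens a TCP connection to the netcat source, sends each message as one newline-terminated line, since the source turns each line into one event, and reads one reply line per message. It assumes the source's default behaviour of acknowledging every event with OK; if acknowledgements were turned off, readline would block until the timeout.

```python
import socket


def send_lines(lines, host="localhost", port=51000, timeout=5.0):
    """Send each message as one line and collect one reply line per message."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            sock.makefile("r", encoding="utf-8", newline="\n") as reader:
        for line in lines:
            # One newline-terminated line becomes one Flume event.
            sock.sendall((line + "\n").encode("utf-8"))
            # Read the acknowledgement for this line (expected: "OK").
            replies.append(reader.readline().rstrip("\r\n"))
    return replies
```

With the agent from this post running, send_lines(["hello", "flume"]) should return one OK per message, and both messages should appear in the agent's console log.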
This is it for our first post.
You can get the config file from Git.