Wednesday, October 23, 2013

Flume-ng, Hello World Quickstarter Guide!

A bare-minimum Flume implementation needs the following components:
1. Client : Java Program : Generates event(s) based on the fluctuating source

2. Source : Java Program : Must extend 'AbstractSource'. It interfaces with the client: the source acts as a server that listens for the events the client generates. Depending on the behaviour expected, the source can implement or extend one of the Flume sources in the 'org.apache.flume.source.*' packages
 OutOfTheBox sources: Avro, Exec, NetCat, Sequence Generator, Syslog, Scribe

3. Channel : Java Program : Connects the Source and the Sink, acting as an event conduit between them. Several Channel implementations can be used out-of-the-box; they can be found in 'org.apache.flume.channel'. The most common one is the 'memory' channel
 OutOfTheBox channels: Memory, JDBC, File

4. Sink : Java Program : Must extend 'AbstractSink'. It collects the events coming out of the channel and writes them to a destination such as the file system. Depending on the behaviour expected, the sink can implement or extend one of the Flume sinks in the 'org.apache.flume.sink.*' packages
 OutOfTheBox sinks: Avro, Logger, IRC, File, HDFS, HBase
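
As a rough mental model of how the components above hand events to one another, here is a toy shell pipeline. It is only an analogy and uses no Flume code: the function names source_events, channel, sink and run_pipeline are made up for illustration, and a real channel also buffers events and supports transactions, which a plain pipe does not model.

```shell
# Toy analogy only: none of these functions are part of Flume.

# "Source": emits N events, one per line.
source_events() {
    seq 1 "$1"
}

# "Channel": passes events from the source side to the sink side unchanged.
channel() {
    cat
}

# "Sink": consumes each event and writes it to its destination (stdout here).
sink() {
    while IFS= read -r event; do
        echo "sink wrote: $event"
    done
}

# Wire the three stages together, source -> channel -> sink.
run_pipeline() {
    source_events "$1" | channel | sink
}
```

Calling `run_pipeline 3` sends three events through the three stages in order.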

## The following is the simplest example that demonstrates how Flume works ##
##### flume-agent.conf #####
#Agent Definition
myagent.sources = mysource
myagent.channels = mychannel
myagent.sinks = mysink

#Channel Definition
myagent.channels.mychannel.type = memory
myagent.channels.mychannel.capacity = 1000
myagent.channels.mychannel.transactionCapacity = 100

#Source Definition
myagent.sources.mysource.type = exec
myagent.sources.mysource.command = tail -F /user/shashi/Somefile.txt
myagent.sources.mysource.channels = mychannel

#Sink Definition
myagent.sinks.mysink.type = hdfs
myagent.sinks.mysink.hdfs.path = /shashi
myagent.sinks.mysink.hdfs.fileType = DataStream
myagent.sinks.mysink.channel = mychannel
Flume command to get the agent started (the --name value must match the agent prefix used in the conf file, 'myagent' here):
##### flume command #####
#Command to start flume agent
flume-ng agent --conf-file flume-agent.conf --name myagent
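
Because the agent name passed on the command line has to match the property prefix in the conf file, a small pre-flight check can catch a mismatch before starting the agent. The sketch below is an assumption of mine, not part of Flume: agent_declared is a hypothetical helper that only greps the file for the three top-level declarations and does not validate anything else.

```shell
# Hypothetical pre-flight check (not a Flume command): succeeds only if the
# conf file declares sources, channels and sinks for the given agent name.
agent_declared() {
    conf=$1
    agent=$2
    for part in sources channels sinks; do
        # Look for a line such as "myagent.sources = ..." in the conf file.
        grep -Eq "^[[:space:]]*${agent}\.${part}[[:space:]]*=" "$conf" || return 1
    done
}
```

It could be used as `agent_declared flume-agent.conf myagent && flume-ng agent --conf-file flume-agent.conf --name myagent`, so the agent starts only when the name is actually declared.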

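Shell script to append lines to the file being tailed by Flume. Each appended line is picked up by the exec source ('tail -F') as one event, passed through the memory channel, and written to HDFS by the sink: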
max=100
for i in $(seq 1 "$max")
do
    echo "$i" >> /user/shashi/Somefile.txt
done
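
The link between this script and the conf file is the file path: the script writes to the same file that the 'command' property of the source tails. To make that link explicit, the sketch below reads the path out of the conf file instead of hard-coding it. Both tailed_file and append_events are hypothetical helper names, and the parsing assumes the command has exactly the form 'tail -F <path>' as in the example above.

```shell
# Hypothetical helper: print the path tailed by the exec source in a Flume
# conf file. Assumes a line of the form "<prefix>.command = tail -F <path>".
tailed_file() {
    sed -n 's/^[^#]*\.command[[:space:]]*=[[:space:]]*tail[[:space:]]\{1,\}-F[[:space:]]\{1,\}//p' "$1" | head -n 1
}

# Append the numbers 1..max to the target file, one per line.
# Each appended line becomes one event for the tailing source.
append_events() {
    target=$1
    max=$2
    i=1
    while [ "$i" -le "$max" ]; do
        echo "$i" >> "$target"
        i=$((i + 1))
    done
}
```

With the example configuration, `append_events "$(tailed_file flume-agent.conf)" 100` would write the same 100 lines as the loop above.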

Want to know more about it? Head over to the Flume User Guide: http://archive.cloudera.com/cdh4/cdh/4/flume-ng/FlumeUserGuide.html#configuration

Code link for custom flume components: http://goo.gl/u0M5Yj

1 comment:

  1. For a non script person, it would be better if u throw some insight on how this shell script is related to the.conf file,

    i.e. for a completely new beginner, he might want to know whats happening here and how is it working here

    btw a good block :)
