A bare-minimum Flume implementation needs the following components:
1. Client : Java Program : Generates events from a changing data source.
2. Source : Java Program : A custom source must extend 'AbstractSource'. It interfaces with the client: the source acts as a server that listens for the events the client generates. Depending on the behaviour required, a custom source can implement or extend one of the Flume sources in the 'org.apache.flume.source.*' packages.
Out-of-the-box sources: Avro, Exec, NetCat, Sequence Generator, Syslog, Scribe
3. Channel : Java Program : Connects the source and the sink, acting as an event conduit between them. Several channel implementations can be used out of the box; they are found in 'org.apache.flume.channel'. The most common one is the 'memory' channel.
Out-of-the-box channels: Memory, JDBC, File
4. Sink : Java Program : A custom sink must extend 'AbstractSink'. It collects the events coming out of the channel and writes them to a destination such as a file system. Depending on the behaviour required, a custom sink can implement or extend one of the Flume sinks in the 'org.apache.flume.sink.*' packages.
Out-of-the-box sinks: Avro, Logger, IRC, File, HBase
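To make the roles above concrete, here is a minimal sketch of how a source, a channel and a sink are declared and wired together by name in an agent configuration. It uses only out-of-the-box components (NetCat source, memory channel, logger sink). The agent name `demo`, the component names `nc`, `mem` and `log`, the port number and the temporary file are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical names throughout: agent "demo", components "nc", "mem", "log".
conf=$(mktemp)

cat > "$conf" <<'EOF'
# Declare one source, one channel and one sink for the agent.
demo.sources = nc
demo.channels = mem
demo.sinks = log

# Source: listens on a TCP port; each line received becomes an event.
demo.sources.nc.type = netcat
demo.sources.nc.bind = localhost
demo.sources.nc.port = 44444

# Channel: buffers events in memory between source and sink.
demo.channels.mem.type = memory

# Sink: writes each event to the agent's log.
demo.sinks.log.type = logger

# Wiring: a source names its channel(s) with "channels",
# a sink names the single channel it drains with "channel".
demo.sources.nc.channels = mem
demo.sinks.log.channel = mem
EOF

# Sanity check: both ends must name the same channel.
src_channel=$(sed -n 's/^demo\.sources\.nc\.channels = //p' "$conf")
sink_channel=$(sed -n 's/^demo\.sinks\.log\.channel = //p' "$conf")
echo "source writes to: $src_channel, sink reads from: $sink_channel"
```

The property key differs between the two ends: `channels` (plural) on the source and `channel` (singular) on the sink. A name mismatch between them is an easy mistake to make, which is why the sketch checks it.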
## A simple example to demonstrate how Flume works ##
    ##### flume-agent.conf #####
    # Agent definition
    myagent.sources = mysource
    myagent.channels = mychannel
    myagent.sinks = mysink

    # Channel definition
    myagent.channels.mychannel.type = memory
    myagent.channels.mychannel.capacity = 1000
    myagent.channels.mychannel.transactionCapacity = 100

    # Source definition
    myagent.sources.mysource.type = exec
    myagent.sources.mysource.command = tail -F /user/shashi/Somefile.txt
    myagent.sources.mysource.channels = mychannel

    # Sink definition
    myagent.sinks.mysink.type = hdfs
    myagent.sinks.mysink.hdfs.path = /shashi
    myagent.sinks.mysink.hdfs.fileType = DataStream
    myagent.sinks.mysink.channel = mychannel

Flume command to get the agent started:

    ##### flume command #####
    # Command to start the Flume agent
    flume-ng agent --conf-file flume-agent.conf --name myagent

Shell script to append lines to the file being tailed by Flume:

    max=100
    for i in `seq 1 $max`
    do
        echo "$i" >> /user/shashi/Somefile.txt
    done
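The link between the script and the configuration is the file path. The exec source runs `tail -F` on /user/shashi/Somefile.txt, so every line the script appends is picked up by `tail` and handed to the channel as one event. The sketch below reproduces that hand-off without Flume, as a rough illustration only: a temporary file stands in for Somefile.txt, and a plain output file stands in for the channel and sink. The temporary paths and variable names are assumptions.

```shell
#!/bin/sh
# Stand-ins for the real paths (hypothetical temporary files).
workdir=$(mktemp -d)
tailed="$workdir/Somefile.txt"
received="$workdir/received.txt"
: > "$tailed"

# What the exec source does: follow the file and emit every line.
# Here the lines land in a plain file instead of a Flume channel.
tail -n +1 -F "$tailed" > "$received" 2>/dev/null &
tail_pid=$!

# What the shell script does: append 100 lines to the tailed file.
max=100
for i in $(seq 1 "$max"); do
    echo "$i" >> "$tailed"
done

# Give tail a moment to catch up, then stop it.
sleep 2
kill "$tail_pid"
wait "$tail_pid" 2>/dev/null || true

count=$(wc -l < "$received" | tr -d ' ')
echo "lines appended: $max, lines picked up by tail: $count"
```

In the real setup nothing needs to be stopped: the agent keeps `tail -F` running, and each new line in the file becomes one event delivered through the memory channel to the HDFS sink.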
Want to know more about it? Head over to the Flume user guide: http://archive.cloudera.com/cdh4/cdh/4/flume-ng/FlumeUserGuide.html#configuration
Code link for custom Flume components: http://goo.gl/u0M5Yj
For a non-script person, it would be better if you threw some insight on how this shell script is related to the .conf file, i.e. a complete beginner might want to know what's happening here and how it works.
btw a good blog :)