Flume – Configuring flume agents

Configuring flume agents between two different nodes:

Here we are using two different nodes, on both of which Hadoop and Flume are already installed and running.

The data we are using is:

sridhar@ubuntu:~$ cat ddd.txt






First, let us configure the Flume configuration file on Node-1.

sridhar@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin/conf$ cat flume.conf.source

#Source agent configuration

agent.sources = source1

agent.channels = memoryChannel

agent.sinks = AvroSink

#source configuration

agent.sources.source1.type = exec

agent.sources.source1.command = cat /home/sridhar/ddd.txt

agent.sources.source1.channels = memoryChannel

#Channel configuration

agent.channels.memoryChannel.type = memory

agent.channels.memoryChannel.capacity = 10000

agent.channels.memoryChannel.transactionCapacity = 50

#Sink configuration

agent.sinks.AvroSink.type = avro

agent.sinks.AvroSink.hostname =

agent.sinks.AvroSink.port = 10005

agent.sinks.AvroSink.channel = memoryChannel

Here the hostname should be set to the IP address of Node-2, i.e. the sink's IP address.
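As an aside, the Avro sink also accepts a few optional tuning properties; the following is a hedged sketch with illustrative values, not part of the original setup:

#Optional Avro sink tuning (illustrative values, not from the original setup)
agent.sinks.AvroSink.batch-size = 100
agent.sinks.AvroSink.connect-timeout = 20000
agent.sinks.AvroSink.request-timeout = 20000

Here batch-size is the number of events sent per RPC batch, and the two timeouts (in milliseconds) govern the first connection attempt and subsequent requests to the remote Avro source.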

Now let us configure the Flume configuration file on Node-2.

hdadmin@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin/conf$ cat flume.conf.sink

#Sink agent configuration

agent2.sources = source2

agent2.channels = memoryChannel-1

agent2.sinks = HadoopOut

#source configuration

agent2.sources.source2.type = avro

agent2.sources.source2.bind =

#Node-1's sink and Node-2's source must both be of the Avro type, and the bind address here must match the hostname set in Node-1's sink, since the two agents connect over this host and port.

agent2.sources.source2.port = 10005

agent2.sources.source2.channels = memoryChannel-1

#Channel configuration

agent2.channels.memoryChannel-1.type = memory

agent2.channels.memoryChannel-1.capacity = 10000

agent2.channels.memoryChannel-1.transactionCapacity = 50

#Sink configuration

agent2.sinks.HadoopOut.channel = memoryChannel-1

agent2.sinks.HadoopOut.type = hdfs

agent2.sinks.HadoopOut.hdfs.path = hdfs://localhost:9000/apache1_flume_data

agent2.sinks.HadoopOut.hdfs.fileType = DataStream

agent2.sinks.HadoopOut.hdfs.rollSize = 0

agent2.sinks.HadoopOut.hdfs.rollInterval = 60

agent2.sinks.HadoopOut.hdfs.rollCount = 0

agent2.sinks.HadoopOut.hdfs.batchSize = 50
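The HDFS sink also lets us control how output files are named and written; a hedged sketch with illustrative values (the prefix is an assumption, not from the original setup):

#Optional HDFS sink file-naming properties (illustrative values)
agent2.sinks.HadoopOut.hdfs.filePrefix = flume-events
agent2.sinks.HadoopOut.hdfs.writeFormat = Text
agent2.sinks.HadoopOut.hdfs.inUseSuffix = .tmp

Files carry the inUseSuffix while Flume is still writing to them and are renamed when the file is rolled (here, every 60 seconds per rollInterval).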

Note: run the agent on Node-2 (the sink side) first, and then the agent on Node-1 (the source side), so that the Avro source is already listening when the Avro sink tries to connect.

hdadmin@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin$ bin/flume-ng agent --conf ./conf/ -f conf/flume.conf.sink -n agent2
sridhar@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin$ bin/flume-ng agent --conf ./conf/ -f conf/flume.conf.source -n agent
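If the source-side agent does not deliver any events, the Avro source on Node-2 can also be smoke-tested on its own with Flume's built-in avro-client, which sends the lines of a file as events to an Avro source. A sketch, assuming Node-2's IP is substituted for the placeholder:

sridhar@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin$ bin/flume-ng avro-client -H <node2-ip> -p 10005 -F /home/sridhar/ddd.txt

If the events then show up under the HDFS path, the Node-2 agent is fine and the problem lies on the Node-1 side.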

We can see the output in HDFS as,

hdadmin@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin$ hdfs dfs -ls /apache1_flume_data

Found 1 items

-rw-r--r--   1 hdadmin supergroup        57 2017-12-27 03:09 /apache1_flume_data/FlumeData.1514372953380.tmp

hdadmin@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin$ hdfs dfs -cat /apache1_flume_data/FlumeData.1514372953380.tmp






In this way we can get data from another node using the Avro source and sink, which are designed for agent-to-agent communication.