Flume – Configuring Flume agents

Configuring Flume agents between two different nodes:

Here we use two different nodes, on each of which Hadoop and Flume are already installed and running.
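As a quick sanity check (assuming both installations live in the home directories shown below), you can confirm that Flume responds and HDFS is reachable before starting:

sridhar@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin$ bin/flume-ng version

hdadmin@ubuntu:~$ hdfs dfs -ls /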

The data we are using is:

sridhar@ubuntu:~$ cat ddd.txt

3,'sai',78

4,'ddd',87

5,'hh',973

4,'ddd',87

5,'hh',973

First, let us configure the Flume configuration file on Node-1.

sridhar@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin/conf$ cat flume.conf.source

#Source agent configuration

agent.sources = source1

agent.channels = memoryChannel

agent.sinks = AvroSink



#source configuration



agent.sources.source1.type = exec

agent.sources.source1.command = cat /home/sridhar/ddd.txt

agent.sources.source1.channels = memoryChannel



#Channel configuration



agent.channels.memoryChannel.type = memory

agent.channels.memoryChannel.capacity = 10000

agent.channels.memoryChannel.transactionCapacity = 50



#Sink configuration



agent.sinks.AvroSink.type = avro

agent.sinks.AvroSink.hostname = 192.168.30.130

agent.sinks.AvroSink.port = 10005

agent.sinks.AvroSink.channel = memoryChannel

Here 192.168.30.130 is the IP address of Node-2, i.e., the sink side.
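Before starting the agents, it can help to confirm that Node-1 can reach Node-2 on this port. A minimal check, assuming netcat is installed on Node-1 (the port only accepts connections once the sink-side agent is running):

sridhar@ubuntu:~$ nc -zv 192.168.30.130 10005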

Now let us configure the Flume configuration file on Node-2.

hdadmin@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin/conf$ cat flume.conf.sink

#Sink agent configuration

agent2.sources = source2

agent2.channels = memoryChannel-1

agent2.sinks = HadoopOut



#source configuration



agent2.sources.source2.type = avro

agent2.sources.source2.bind = 192.168.30.130



#Node-1's sink and Node-2's source must both be of type avro and use the same IP address and port, since the two agents connect to each other.



agent2.sources.source2.port = 10005

agent2.sources.source2.channels = memoryChannel-1



#Channel configuration



agent2.channels.memoryChannel-1.type = memory

agent2.channels.memoryChannel-1.capacity = 10000

agent2.channels.memoryChannel-1.transactionCapacity = 50



#Sink configuration



agent2.sinks.HadoopOut.channel = memoryChannel-1

agent2.sinks.HadoopOut.type = hdfs

agent2.sinks.HadoopOut.hdfs.path = hdfs://localhost:9000/apache1_flume_data

agent2.sinks.HadoopOut.hdfs.fileType = DataStream

agent2.sinks.HadoopOut.hdfs.rollSize = 0

agent2.sinks.HadoopOut.hdfs.rollInterval = 60

agent2.sinks.HadoopOut.hdfs.rollCount = 0

agent2.sinks.HadoopOut.hdfs.batchSize = 50
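With rollSize = 0 and rollCount = 0, size-based and event-count-based rolling are disabled, so the sink rolls to a new HDFS file purely on the 60-second rollInterval. If you prefer size-based rolling instead, a sketch with hypothetical values (128 MB per file) would look like:

agent2.sinks.HadoopOut.hdfs.rollSize = 134217728

agent2.sinks.HadoopOut.hdfs.rollInterval = 0

agent2.sinks.HadoopOut.hdfs.rollCount = 0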

Note: Start the agent on Node-2 (sink side) first, then the agent on Node-1 (source side). The Avro sink on Node-1 can only connect once the Avro source on Node-2 is listening.

hdadmin@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin$ bin/flume-ng agent --conf ./conf/ -f conf/flume.conf.sink -n agent2
sridhar@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin$ bin/flume-ng agent --conf ./conf/ -f conf/flume.conf.source -n agent
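While testing, it is often useful to add Flume's console logger so you can watch events flow, for example on the sink side:

hdadmin@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin$ bin/flume-ng agent --conf ./conf/ -f conf/flume.conf.sink -n agent2 -Dflume.root.logger=INFO,console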

We can see the output in HDFS:

hdadmin@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin$ hdfs dfs -ls /apache1_flume_data

Found 1 items

-rw-r--r--   1 hdadmin supergroup        57 2017-12-27 03:09 /apache1_flume_data/FlumeData.1514372953380.tmp
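The .tmp suffix marks a file the sink is still writing to; once the 60-second rollInterval elapses, Flume closes the file and renames it without the suffix. Listing the directory again after the roll shows the finalized name:

hdadmin@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin$ hdfs dfs -ls /apache1_flume_data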



hdadmin@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin$ hdfs dfs -cat /apache1_flume_data/FlumeData.1514372953380.tmp

3,'sai',78

4,'ddd',87

5,'hh',973

4,'ddd',87

5,'hh',973

In this way we can pull data from another node using the Avro source and sink, which Flume provides for agent-to-agent communication.
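Putting it all together, the end-to-end pipeline is:

Node-1: exec source -> memoryChannel -> Avro sink (192.168.30.130:10005)

Node-2: Avro source (port 10005) -> memoryChannel-1 -> HDFS sink (hdfs://localhost:9000/apache1_flume_data)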