Configuring Flume agents between two different nodes:
Here we use two different nodes on which Hadoop and Flume are already installed and running.
The data we are using is:

sridhar@ubuntu:~$ cat ddd.txt
3,'sai',78
4,'ddd',87
5,'hh',973
4,'ddd',87
5,'hh',973
First, let us write the Flume configuration file on Node-1.

sridhar@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin/conf$ cat flume.conf.source
# Source agent configuration
agent.sources = source1
agent.channels = memoryChannel
agent.sinks = AvroSink

# Source configuration
agent.sources.source1.type = exec
agent.sources.source1.command = cat /home/sridhar/ddd.txt
agent.sources.source1.channels = memoryChannel

# Channel configuration
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 10000
agent.channels.memoryChannel.transactionCapacity = 50

# Sink configuration
agent.sinks.AvroSink.type = avro
agent.sinks.AvroSink.hostname = 192.168.30.130
agent.sinks.AvroSink.port = 10005
agent.sinks.AvroSink.channel = memoryChannel
Here “192.168.30.130” is the IP address of Node-2, i.e. the machine the Avro sink sends events to.
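To see how the channel settings above interact, here is a minimal conceptual sketch in Python (an illustration of the semantics, not Flume's actual implementation): the memory channel buffers at most `capacity` events, and each put/take transaction moves at most `transactionCapacity` events between the source and the sink.

```python
from collections import deque

class MemoryChannel:
    """Toy model of Flume's memory channel (capacity / transactionCapacity)."""

    def __init__(self, capacity=10000, transaction_capacity=50):
        self.capacity = capacity
        self.transaction_capacity = transaction_capacity
        self.queue = deque()

    def put_batch(self, events):
        # A source commits events in transaction-sized batches.
        if len(events) > self.transaction_capacity:
            raise ValueError("batch exceeds transactionCapacity")
        if len(self.queue) + len(events) > self.capacity:
            raise RuntimeError("channel capacity reached")
        self.queue.extend(events)

    def take_batch(self):
        # A sink drains up to transactionCapacity events per transaction.
        batch = []
        while self.queue and len(batch) < self.transaction_capacity:
            batch.append(self.queue.popleft())
        return batch

# The exec source effectively turns each output line of
# `cat /home/sridhar/ddd.txt` into one event body.
channel = MemoryChannel(capacity=10000, transaction_capacity=50)
channel.put_batch(["3,'sai',78", "4,'ddd',87", "5,'hh',973"])
print(channel.take_batch())  # these are what the Avro sink would forward
```

If the source produces events faster than the sink drains them, the queue fills up to `capacity` and further puts fail, which is why the channel sizes on both nodes matter.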
Now let us write the Flume configuration file on Node-2.

hdadmin@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin/conf$ cat flume.conf.sink
# Sink agent configuration
agent2.sources = source2
agent2.channels = memoryChannel-1
agent2.sinks = HadoopOut

# Source configuration
# Node-1's sink and Node-2's source must both be of type avro and use the
# same IP address and port, since they form the two ends of the connection.
agent2.sources.source2.type = avro
agent2.sources.source2.bind = 192.168.30.130
agent2.sources.source2.port = 10005
agent2.sources.source2.channels = memoryChannel-1

# Channel configuration
agent2.channels.memoryChannel-1.type = memory
agent2.channels.memoryChannel-1.capacity = 10000
agent2.channels.memoryChannel-1.transactionCapacity = 50

# Sink configuration
agent2.sinks.HadoopOut.channel = memoryChannel-1
agent2.sinks.HadoopOut.type = hdfs
agent2.sinks.HadoopOut.hdfs.path = hdfs://localhost:9000/apache1_flume_data
agent2.sinks.HadoopOut.hdfs.fileType = DataStream
agent2.sinks.HadoopOut.hdfs.rollSize = 0
agent2.sinks.HadoopOut.hdfs.rollInterval = 60
agent2.sinks.HadoopOut.hdfs.rollCount = 0
agent2.sinks.HadoopOut.hdfs.batchSize = 50
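The three roll settings in the HDFS sink control when the current output file is closed and a new one started. Setting rollSize = 0 and rollCount = 0 disables size- and count-based rolling, so with rollInterval = 60 the file rolls purely on a 60-second timer. A minimal sketch of that decision logic (an illustration of the semantics, not Flume's actual code):

```python
def should_roll(bytes_written, event_count, seconds_open,
                roll_size=0, roll_count=0, roll_interval=60):
    """Return True when the HDFS sink would close the current file.

    A value of 0 disables that particular trigger, mirroring the
    rollSize/rollCount/rollInterval properties in the config above.
    """
    if roll_size and bytes_written >= roll_size:
        return True
    if roll_count and event_count >= roll_count:
        return True
    if roll_interval and seconds_open >= roll_interval:
        return True
    return False

print(should_roll(10_000_000, 500, 30))  # False: size and count are disabled
print(should_roll(57, 5, 60))            # True: the 60-second timer fired
```

With this configuration, even a tiny trickle of events produces one file per minute; tune rollInterval (or enable rollSize) to avoid many small files in HDFS.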
Note: start the Node-2 (sink-side) agent first, then the Node-1 (source-side) agent, so that the Avro source is already listening when the Avro sink connects to it.
hdadmin@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin$ bin/flume-ng agent --conf ./conf/ -f conf/flume.conf.sink -n agent2
sridhar@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin$ bin/flume-ng agent --conf ./conf/ -f conf/flume.conf.source -n agent
We can then verify the output in HDFS:
hdadmin@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin$ hdfs dfs -ls /apache1_flume_data
Found 1 items
-rw-r--r--   1 hdadmin supergroup         57 2017-12-27 03:09 /apache1_flume_data/FlumeData.1514372953380.tmp

hdadmin@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin$ hdfs dfs -cat /apache1_flume_data/FlumeData.1514372953380.tmp
3,'sai',78
4,'ddd',87
5,'hh',973
4,'ddd',87
5,'hh',973
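Once the events have landed in HDFS, a downstream job can split each comma-separated record back into fields. A small hedged sketch (the `parse_record` helper and the sample lines are illustrative, mirroring the ddd.txt data; nothing here is Flume-specific):

```python
def parse_record(line):
    """Split a record like "3,'sai',78" into (id, name, value)."""
    id_str, name, value = line.split(",")
    return int(id_str), name.strip("'"), int(value)

# Sample lines matching what the HDFS sink wrote above.
sample = ["3,'sai',78", "4,'ddd',87", "5,'hh',973"]
for line in sample:
    print(parse_record(line))
```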
In this way, we can move data from one node to another using the Avro source and sink, which Flume provides for agent-to-agent communication.