Flume – Fan out

Fan out:

Fan out is the process of delivering events from one source to multiple sinks through multiple channels. Flume supports two fan-out modes: replicating and multiplexing. In a replicating flow, each event is sent to all the configured channels. In a multiplexing flow, each event is sent to only a subset of the channels, chosen by the value of an event header.

To configure fan out, we list several channels for the source and set a channel “selector” on it, which can be replicating or multiplexing. If no selector is specified, it defaults to replicating.

In the example below, events are delivered to both an HDFS sink and a logger sink through two channels, using the default replicating selector.

hdadmin@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin/conf$ cat flume_fanout.conf

agent1.sources = source1

agent1.sinks = sink1a sink1b

agent1.channels = channel1a channel1b

agent1.sources.source1.channels = channel1a channel1b

agent1.sinks.sink1a.channel = channel1a

agent1.sinks.sink1b.channel = channel1b

agent1.sources.source1.type = spooldir

agent1.sources.source1.spoolDir = /home/hdadmin/flume1/spooldir

agent1.sinks.sink1a.type = hdfs

agent1.sinks.sink1a.hdfs.path = hdfs://localhost:9000/flume_fanout

agent1.sinks.sink1a.hdfs.filePrefix = events

agent1.sinks.sink1a.hdfs.fileSuffix = .log

agent1.sinks.sink1a.hdfs.fileType = DataStream

agent1.sinks.sink1b.type = logger

agent1.channels.channel1a.type = file

agent1.channels.channel1b.type = memory

hdadmin@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin$ bin/flume-ng agent --conf ./conf/ -f conf/flume_fanout.conf -n agent1

hdadmin@ubuntu:~/flume1/spooldir$ hdfs dfs -cat /flume_fanout/events.1514377786420.log

(1,2,3) (4,5,6)

(2,3,4) (6,3,7)

(5,3,7) (6,2,8)

(4,2,5) (2,4,6)

Here Flume uses two separate transactions to deliver each batch of events from the spooling directory source: one for the channel feeding the HDFS sink and one for the channel feeding the logger sink. Both transactions carry the same batch of events. If either transaction fails (for example, because a channel is full), the events are not removed from the source and the delivery is retried later. If we do not care whether some events reach a particular sink, we can mark its channel as optional with the selector's “optional” property on the source:

agent1.sources.source1.selector.optional = channel1b
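The delivery rule described above can be sketched as a small Python model. This is only a toy illustration, not Flume's implementation: the Channel class, the ChannelFull exception, the deliver function and the capacities are all hypothetical names and values chosen for the example. It assumes one all-or-nothing "transaction" per channel, with required channels tried before optional ones.

```python
class ChannelFull(Exception):
    """Raised when a channel cannot accept a whole batch (hypothetical)."""


class Channel:
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.events = []

    def put_batch(self, batch):
        # One "transaction" per channel: the whole batch is accepted or none of it.
        if len(self.events) + len(batch) > self.capacity:
            raise ChannelFull(self.name)
        self.events.extend(batch)


def deliver(batch, required, optional=()):
    """Return True if the source may discard the batch, False if it must retry."""
    for channel in required:
        try:
            channel.put_batch(batch)
        except ChannelFull:
            return False  # the source keeps the events and retries later
    for channel in optional:
        try:
            channel.put_batch(batch)
        except ChannelFull:
            pass  # a failure on an optional channel is ignored
    return True


batch = ["(1,2,3) (4,5,6)", "(2,3,4) (6,3,7)"]

# Both channels required: the undersized channel1b makes the delivery fail.
strict_a, strict_b = Channel("channel1a", 10), Channel("channel1b", 1)
print(deliver(batch, required=[strict_a, strict_b]))  # prints False

# channel1b optional: the delivery succeeds even though channel1b gets nothing.
lenient_a, lenient_b = Channel("channel1a", 10), Channel("channel1b", 1)
print(deliver(batch, required=[lenient_a], optional=[lenient_b]))  # prints True
```

In this model, as in the configuration line above, marking channel1b as optional means a full logger channel no longer holds back delivery to the HDFS channel.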

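A multiplexing selector has further properties. In the example below, events are routed by the value of their Country header: INDIA events go to mem-channel-1, AFRICA events go to file-channel-2, and AUSTRALIA events go to both. Events with any other value, or without the header, go to the default channel mem-channel-1. The optional.INDIA line also makes file-channel-2 an optional channel for INDIA events; mem-channel-1 stays required because it appears in the mapping as well.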

# channel selector configuration

Agent1.sources.avro-AppSrv-source1.selector.type = multiplexing

Agent1.sources.avro-AppSrv-source1.selector.header = Country

Agent1.sources.avro-AppSrv-source1.selector.mapping.INDIA = mem-channel-1

Agent1.sources.avro-AppSrv-source1.selector.mapping.AFRICA = file-channel-2

Agent1.sources.avro-AppSrv-source1.selector.mapping.AUSTRALIA = mem-channel-1 file-channel-2

Agent1.sources.avro-AppSrv-source1.selector.optional.INDIA = mem-channel-1 file-channel-2

Agent1.sources.avro-AppSrv-source1.selector.default = mem-channel-1
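To make the routing in the multiplexing configuration above concrete, here is a minimal Python sketch of how such a selector could pick channels for one event. It is a simplified model, not Flume code: the route function and the dictionary names are hypothetical, and it assumes that a channel listed as both required and optional for the same header value is treated as required.

```python
HEADER = "Country"

# selector.mapping.<value> entries: required channels per header value
MAPPING = {
    "INDIA": ["mem-channel-1"],
    "AFRICA": ["file-channel-2"],
    "AUSTRALIA": ["mem-channel-1", "file-channel-2"],
}

# selector.optional.<value> entries: optional channels per header value
OPTIONAL = {"INDIA": ["mem-channel-1", "file-channel-2"]}

# selector.default: used when the header is missing or has no mapping
DEFAULT = ["mem-channel-1"]


def route(headers):
    """Return (required, optional) channel names for one event's headers."""
    value = headers.get(HEADER)
    required = MAPPING.get(value) or DEFAULT
    # A channel that is already required is not also treated as optional.
    optional = [c for c in OPTIONAL.get(value, []) if c not in required]
    return required, optional


print(route({"Country": "INDIA"}))      # (['mem-channel-1'], ['file-channel-2'])
print(route({"Country": "AFRICA"}))     # (['file-channel-2'], [])
print(route({"Country": "AUSTRALIA"}))  # (['mem-channel-1', 'file-channel-2'], [])
print(route({"Country": "CHINA"}))      # (['mem-channel-1'], [])
print(route({}))                        # (['mem-channel-1'], [])
```

The last two calls show the role of the default channel: an unmapped value such as CHINA, or an event with no Country header at all, still ends up in mem-channel-1.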