Fan out:
Fan out is the process of delivering events from one source to multiple sinks through multiple channels. Flume has two modes for fan out: replicating and multiplexing. In the replicating flow, each event is sent to all the configured channels. In the multiplexing flow, each event is sent to only a subset of the channels, chosen by the value of an event header.
To configure fan out, we add a channel “selector” to the source; its type can be replicating or multiplexing. By default, the selector is replicating.
In the example below, events are delivered to both an HDFS sink and a logger sink through two channels.
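Because replicating is the default, the selector line can be left out entirely. A minimal sketch of stating it explicitly, assuming an agent named agent1 with a source named source1 (the names used in this section's example):

```properties
# Optional: replicating is already the default selector type.
# Stating it explicitly only documents the intent.
agent1.sources.source1.selector.type = replicating
```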
hdadmin@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin/conf$ cat flume_fanout.conf
agent1.sources = source1
agent1.sinks = sink1a sink1b
agent1.channels = channel1a channel1b

agent1.sources.source1.channels = channel1a channel1b
agent1.sinks.sink1a.channel = channel1a
agent1.sinks.sink1b.channel = channel1b

agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /home/hdadmin/flume1/spooldir

agent1.sinks.sink1a.type = hdfs
agent1.sinks.sink1a.hdfs.path = hdfs://localhost:9000/flume_fanout
agent1.sinks.sink1a.hdfs.filePrefix = events
agent1.sinks.sink1a.hdfs.fileSuffix = .log
agent1.sinks.sink1a.hdfs.fileType = DataStream

agent1.sinks.sink1b.type = logger

agent1.channels.channel1a.type = file
agent1.channels.channel1b.type = memory

hdadmin@ubuntu:~/apache-flume-1.5.0-cdh5.3.2-bin$ bin/flume-ng agent --conf ./conf/ -f conf/flume_fanout.conf -n agent1

hdadmin@ubuntu:~/flume1/spooldir$ hdfs dfs -cat /flume_fanout/events.1514377786420.log
(1,2,3)
(4,5,6)
(2,3,4)
(6,3,7)
(5,3,7)
(6,2,8)
(4,2,5)
(2,4,6)
Here Flume uses two separate transactions to deliver each batch of events from the spooling directory source to the channels: one transaction for the channel feeding the HDFS sink, and another for the channel feeding the logger sink. Both transactions carry the same batch of events. If either transaction fails, for example because a channel is full, the events are not removed from the source and are retried later.
If we don’t care whether some events reach a particular sink, we can mark its channel as optional with the selector’s “optional” property on the source; a failed write to an optional channel is simply ignored. For example,
agent1.sources.source1.selector.optional = channel1b
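A channel counts as full once it holds as many events as its configured capacity. As a rough sketch, the size of the memory channel behind the logger sink could be set as below; the numbers are illustrative assumptions and are not taken from the setup shown above.

```properties
# Illustrative values only (assumed, not from the original configuration).
# Maximum number of events the memory channel can hold.
agent1.channels.channel1b.capacity = 1000
# Maximum number of events moved in a single transaction.
agent1.channels.channel1b.transactionCapacity = 100
```

With channel1b marked as optional, a batch that does not fit into this channel would be dropped for the logger sink only, while delivery to channel1a stays transactional.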
For a multiplexing selector, the properties can be set as follows:
# channel selector configuration
Agent1.sources.avro-AppSrv-source1.selector.type = multiplexing
Agent1.sources.avro-AppSrv-source1.selector.header = Country
Agent1.sources.avro-AppSrv-source1.selector.mapping.INDIA = mem-channel-1
Agent1.sources.avro-AppSrv-source1.selector.mapping.AFRICA = file-channel-2
Agent1.sources.avro-AppSrv-source1.selector.mapping.AUSTRALIA = mem-channel-1 file-channel-2
Agent1.sources.avro-AppSrv-source1.selector.optional.INDIA = mem-channel-1 file-channel-2
Agent1.sources.avro-AppSrv-source1.selector.default = mem-channel-1
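The selector fragment above refers to a source and two channels that still have to be declared somewhere in the agent's configuration. A minimal sketch of those surrounding declarations might look like the following; the bind address, the port, and the two logger sinks (logger-sink-1, logger-sink-2) are hypothetical and chosen only to make the fragment complete.

```properties
# Components referenced by the multiplexing selector.
Agent1.sources = avro-AppSrv-source1
Agent1.channels = mem-channel-1 file-channel-2
# Hypothetical sink names, one per channel.
Agent1.sinks = logger-sink-1 logger-sink-2

# Avro source; bind address and port are assumed values.
Agent1.sources.avro-AppSrv-source1.type = avro
Agent1.sources.avro-AppSrv-source1.bind = 0.0.0.0
Agent1.sources.avro-AppSrv-source1.port = 4141
Agent1.sources.avro-AppSrv-source1.channels = mem-channel-1 file-channel-2

Agent1.channels.mem-channel-1.type = memory
Agent1.channels.file-channel-2.type = file

# Each sink drains exactly one channel.
Agent1.sinks.logger-sink-1.type = logger
Agent1.sinks.logger-sink-1.channel = mem-channel-1
Agent1.sinks.logger-sink-2.type = logger
Agent1.sinks.logger-sink-2.channel = file-channel-2
```

With this in place, an event whose Country header is AFRICA goes only to file-channel-2, one with AUSTRALIA goes to both channels, and an event with no matching mapping falls back to the default, mem-channel-1.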