/    /  Flume – Partitioning and Interceptors

Partitioning and Interceptors:

We have studied partitions in hive, similar to that we have partitions in Flume. By the help of partitions, we can access data fastly. In Flume we can partition data by time, a process can be run to transform the completed partition. We can store data in partition by setting hdfs.path parameter to include

subdirectories that use time format escape sequences:

agent1.sinks.sink1.hdfs.path = /tmp/flume/year=%Y/month=%m/day=%d

The partition that a Flume event is written to is determined by the timestamp header on the event. Events don’t have this header by default, but it can be added using a Flume interceptor.

Interceptors can modify or drop events based on any criteria chosen by us. They are just like classes which implement org.apache.flume.interceptor.Interceptor interface. Before the events are placed in the channel Interceptors are attached to sources and run on events.

In source configuration file we supply interceptors and they are invoked in the same order in which they are specified. We also have chaining of interceptors, in this events of one interceptor are passed to next interceptor as a chain process. If interceptor wants to drop any events, then it does not return that event in the list that it returns.

Example configuration of interceptor are,

a1.sources = r1

a1.sinks = k1

a1.channels = c1

a1.sources.r1.interceptors = i1 i2

a1.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.HostInterceptor$Builder

a1.sources.r1.interceptors.i1.preserveExisting = false

a1.sources.r1.interceptors.i1.hostHeader = hostname

a1.sources.r1.interceptors.i2.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

a1.sinks.k1.filePrefix = FlumeData.%{CollectorHost}.%Y-%m-%d

a1.sinks.k1.channel = c1

(OR)

a1.sources.r1.interceptors = t1

a1.sources.r1.interceptors.t1.type = regex_extractor

a1.sources.r1.interceptors.t1.regex = ^(\\d)

Some of the examples of interceptors are TimestampInterceptor, HostInterceptor, StaticInterceptor, RegexInterceptor, etc.