Flume – source

Flume sources:

Avro source:

External events are sent from an Avro client to the Avro source, which listens for them on a configured port. The required properties for an Avro source are channels, type (must be avro), bind (the hostname or IP address to listen on) and port. An example property file for an Avro source:

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
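The property file above declares channel c1 but never defines it, and it has no sink, so an agent cannot start from it alone. Here is a minimal sketch of a complete agent. It assumes an in-memory channel and a logger sink, neither of which is part of the original example.

```properties
# Agent a1: Avro source -> memory channel -> logger sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Avro source listening on all interfaces, port 4141
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
a1.sources.r1.channels = c1

# Assumption: memory channel (fast, but buffered events are lost if the agent dies)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Assumption: logger sink, which just logs each event; useful for testing
a1.sinks.k1.type = logger
# A sink reads from exactly one channel, so the key is "channel", not "channels"
a1.sinks.k1.channel = c1
```

Suppose this is saved as avro.conf (a hypothetical file name). The agent can then be started with bin/flume-ng agent --conf conf --conf-file avro.conf --name a1. After that, Flume's bundled Avro client can send the lines of a file as events with bin/flume-ng avro-client -H localhost -p 4141 -F followed by the file path.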

Thrift source:

The Thrift source works like the Avro source, but it listens on its configured port for events sent by Thrift clients.
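By analogy with the Avro example, a Thrift source takes the same required properties (channels, type, bind and port), with type set to thrift. In the sketch below, the port number 4142 is an arbitrary example value.

```properties
a1.sources = r1
a1.channels = c1
# Thrift source: same required properties as the Avro source
a1.sources.r1.type = thrift
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
# Arbitrary example port; it must not clash with other listeners on the host
a1.sources.r1.port = 4142
```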

Exec source:

The Exec source runs a given Unix command on start-up and expects that process to continuously produce data on standard output. Typical commands are cat, tail and date. If the process exits (as date does right after printing), the source also exits and produces no more data.

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/secure
a1.sources.r1.channels = c1
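Because the source stops when its command exits, a short-lived command such as date is only useful if the source re-runs it. The Exec source has restart and restartThrottle properties for this purpose. In the sketch below, the command and the 10-second throttle are example values.

```properties
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.channels = c1
# Example command: prints the date once and exits immediately
a1.sources.r1.command = date
# Re-run the command whenever it exits (the default is false)
a1.sources.r1.restart = true
# Wait this many milliseconds before each restart
a1.sources.r1.restartThrottle = 10000
```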

Spooling directory source:

The Spooling Directory source lets us ingest data by placing files into a spooling directory on disk. The source watches the directory for new files and parses events out of them as they appear. Once a file has been fully read into the channel, it is renamed to mark it as completed.

The major difference from the Exec source is that the Spooling Directory source is reliable: it will not miss any data, even if Flume is killed or restarted.

Some of the conditions under which it fails are:

1. If a file is written to after being placed into the spooling directory, Flume will print an error to its log file and stop processing.

2. If a file name is reused at a later time, Flume will print an error to its log file and stop processing.

To avoid the above issues, it may be useful to add a unique identifier (such as a timestamp) to log file names when they are moved into the spooling directory.

a1.channels = ch-1
a1.sources = src-1
a1.sources.src-1.type = spooldir
a1.sources.src-1.channels = ch-1
a1.sources.src-1.spoolDir = /var/log/apache/flumeSpool
a1.sources.src-1.fileHeader = true
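The rename of completed files is controlled by the fileSuffix property, and deletePolicy decides whether completed files are kept or deleted. Below is a sketch extending the example above. fileSuffix and deletePolicy are shown at their defaults. The ignorePattern value is an assumed convention: files are first written under a temporary .tmp name and renamed when finished, which helps avoid the "written to after being placed" failure.

```properties
a1.channels = ch-1
a1.sources = src-1
a1.sources.src-1.type = spooldir
a1.sources.src-1.channels = ch-1
a1.sources.src-1.spoolDir = /var/log/apache/flumeSpool
# Put the absolute path of the originating file into each event's "file" header
a1.sources.src-1.fileHeader = true
# Suffix appended once a file has been fully read (default .COMPLETED)
a1.sources.src-1.fileSuffix = .COMPLETED
# "never" keeps completed files; "immediate" deletes them once they are read
a1.sources.src-1.deletePolicy = never
# Example: skip files still being written as *.tmp
# (backslash is doubled because property files treat \ as an escape character)
a1.sources.src-1.ignorePattern = ^.*\\.tmp$
```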

Netcat TCP source:

It listens on a given port and turns each line of text it receives into an event. There is also a similar Netcat UDP source.

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 6666
a1.sources.r1.channels = c1
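The Netcat UDP source uses the same bind and port properties; only the type changes. This sketch assumes a Flume version that includes the UDP variant, whose type name is netcatudp.

```properties
a1.sources = r1
a1.channels = c1
# Netcat UDP source: same properties as the TCP version, different type
a1.sources.r1.type = netcatudp
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 6666
a1.sources.r1.channels = c1
```

For a quick manual test of the TCP version, lines typed into telnet localhost 6666 should each arrive as one event. For the UDP version, echo hello | nc -u localhost 6666 should work, assuming a netcat build that supports -u.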

Other source types include Syslog, Sequence Generator, HTTP, Legacy, etc.
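As an example of one of these, here is a sketch of a Syslog TCP source. Unlike the Avro and Netcat sources, it takes the listening address from a host property rather than bind. The port 5140 is an example value.

```properties
a1.sources = r1
a1.channels = c1
# Syslog over TCP; each syslog message becomes an event
a1.sources.r1.type = syslogtcp
# Note: "host", not "bind", for this source type
a1.sources.r1.host = localhost
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1
```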