External events are sent from an Avro client to the Avro source, which listens for them on a configured port. The required properties for an Avro source are channels, type (must be avro), bind (the hostname or IP address to listen on) and port. An example properties file for an Avro source:
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
Similar to the Avro source, there is a Thrift source, which listens on a Thrift port.
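A Thrift source is configured in much the same way as the Avro one. The following is a minimal sketch, assuming the same agent, source and channel names (a1, r1, c1) as in the Avro example; the port is an arbitrary placeholder.

```properties
# Thrift source: same required properties as Avro, with type set to thrift
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = thrift
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
# Placeholder port; use whichever port the Thrift clients send to
a1.sources.r1.port = 4141
```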
Exec source:
It runs a given Unix command on start-up and expects that process to continuously produce data on standard output. Typical commands are cat, tail, date, etc. If the process exits (as date does, for example), the source also exits and produces no more data.
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/secure
a1.sources.r1.channels = c1
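Because the source stops producing data once the command exits, it can help to have Flume restart the command. The sketch below uses the Exec source's optional restart, restartThrottle and logStdErr properties; the values are illustrative rather than recommendations.

```properties
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/secure
a1.sources.r1.channels = c1
# Restart the command if it dies (off by default)
a1.sources.r1.restart = true
# Wait this many milliseconds before attempting a restart
a1.sources.r1.restartThrottle = 10000
# Also log the command's standard error
a1.sources.r1.logStdErr = true
```

Restarting the command does not make the source reliable: events produced while Flume itself is down can still be lost.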
Spooling directory source:
It lets us ingest data by placing files into a spooling directory on disk. The source watches the directory for new files and parses events out of them as they appear. Once a file has been fully read into the channel, it is renamed to indicate that it is complete.
The major difference from the Exec source is that the spooling directory source is reliable: it will not miss data even if Flume is killed or restarted.
It does fail under the following conditions:
1. If a file is written to after being placed into the spooling directory, Flume will print an error to its log file and stop processing.
2. If a file name is reused at a later time, Flume will print an error to its log file and stop processing.
To avoid the above issues, it may be useful to add a unique identifier (such as a timestamp) to log file names when they are moved into the spooling directory.
a1.channels = ch-1
a1.sources = src-1
a1.sources.src-1.type = spooldir
a1.sources.src-1.channels = ch-1
a1.sources.src-1.spoolDir = /var/log/apache/flumeSpool
a1.sources.src-1.fileHeader = true
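How a fully read file is marked can also be adjusted. The sketch below uses the spooling directory source's optional fileSuffix and deletePolicy properties, with the same agent and source names as the example above. The values shown should match the defaults, so treat them as a starting point for experimentation rather than required settings.

```properties
# Suffix appended to a file once it has been completely ingested
a1.sources.src-1.fileSuffix = .COMPLETED
# Keep completed files on disk; "immediate" deletes them instead
a1.sources.src-1.deletePolicy = never
```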
Netcat TCP source:
It listens on a given port and turns each line of text it receives into an event. There is also a similar Netcat UDP source.
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 6666
a1.sources.r1.channels = c1
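The UDP variant is configured the same way except for the type. This sketch assumes a Flume release that ships the Netcat UDP source, which is only present in newer 1.x versions, and reuses the names and port from the TCP example.

```properties
a1.sources = r1
a1.channels = c1
# Only the type differs from the TCP example
a1.sources.r1.type = netcatudp
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 6666
a1.sources.r1.channels = c1
```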
Other types of sources are Syslog, Sequence generator, HTTP, Legacy, etc.
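As a rough illustration of two of these, the sketch below declares a Sequence generator source and a Syslog TCP source on one agent. The second source name (r2), the host and the port are placeholders chosen for the example.

```properties
a1.sources = r1 r2
a1.channels = c1
# Sequence generator: emits events from an incrementing counter, useful for testing
a1.sources.r1.type = seq
a1.sources.r1.channels = c1
# Syslog TCP: listens for syslog messages on the given host and port
a1.sources.r2.type = syslogtcp
a1.sources.r2.host = localhost
a1.sources.r2.port = 5140
a1.sources.r2.channels = c1
```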