Understand the different types of Sources
1. TailSource
2. Exec Source
3. Spooling Directory
4. Syslog
a. Syslog UDP Source
b. Syslog TCP Source
c. Multiport Syslog TCP Source
1. Introduction to TailSource and why it is discontinued
TailSource is no longer part of Flume. With TailSource you could tail any file on the system, and it created a Flume event for each line.
With channels and sinks, events are added to and removed from the channel as part of a transaction. Tailing a file, however, cannot be made part of a transaction.
If something fails, for instance the channel, there is no way to roll back what has already been tailed and put the data back.
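The transaction argument above can be illustrated with a small Python sketch. Everything here is hypothetical: the Channel class and the deliver function are toys written for this illustration, not Flume's API, and the iterator only stands in for a tailed file whose lines are read once and whose position is not tracked.

```python
class Channel:
    """Toy in-memory channel with put/commit/rollback (hypothetical, not Flume's API)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = []     # events that have been committed
        self._pending = []   # events staged in the open transaction

    def put(self, event):
        self._pending.append(event)

    def commit(self):
        # Fail the whole transaction if the staged events do not fit.
        if len(self.events) + len(self._pending) > self.capacity:
            raise RuntimeError("channel full")
        self.events.extend(self._pending)
        self._pending = []

    def rollback(self):
        # Discard everything staged in the open transaction.
        self._pending = []


def deliver(batch, channel):
    """Try to hand a batch to the channel; return True only if it committed."""
    try:
        for event in batch:
            channel.put(event)
        channel.commit()
        return True
    except RuntimeError:
        channel.rollback()
        return False


# The iterator plays the role of the tailed file: each line can be read once.
lines = iter(["line1", "line2", "line3"])
channel = Channel(capacity=2)

batch = [next(lines) for _ in range(3)]
ok = deliver(batch, channel)

# The channel rolled back cleanly, but the tail position has already moved on.
remaining = list(lines)
print(ok, channel.events, remaining)
```

In this sketch the channel side can undo its work, while the read from the tailed stream cannot be undone, which is the asymmetry the paragraph above describes.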
Let's take an example. Suppose you are tailing the file
/user/hadoopexam/access.log
Suppose log4j is configured to rotate (rename) the file when it reaches 1 MB in size, so that it is renamed as below.
/user/hadoopexam/access.log1
Assume Flume was reading access.log when it was renamed to access.log1. Because Flume still holds the open file handle, it can keep reading the renamed file. Meanwhile, assume the new access.log also fills up and is renamed as below.
/user/hadoopexam/access.log2
Once Flume is done with access.log1, it starts reading the current access.log. It is unaware that access.log2 was created in the meantime, so the data in that file is never read by Apache Flume.
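The rotation scenario above can be replayed with a small in-memory simulation in Python. This is only a sketch under stated assumptions: Directory and NameBasedTailer are hypothetical classes invented for this illustration, a Python list plays the role of an open file handle, and the line values a1, a2, b1 and c1 are made up. It does not reproduce how TailSource was actually implemented.

```python
class Directory:
    """Toy file system: maps a file name to a list of lines (no real I/O)."""

    def __init__(self):
        self.files = {}

    def append(self, name, line):
        self.files.setdefault(name, []).append(line)

    def rename(self, old, new):
        # The same list object moves to the new name, like a renamed file.
        self.files[new] = self.files.pop(old)


class NameBasedTailer:
    """Hypothetical tailer that follows one file name and keeps its open handle."""

    def __init__(self, directory, name):
        self.directory = directory
        self.name = name
        self.handle = None   # the list object stands in for an open file handle
        self.offset = 0
        self.seen = []

    def poll(self):
        if self.handle is None:
            self.handle = self.directory.files.get(self.name)
            self.offset = 0
            if self.handle is None:
                return
        # Drain whatever is left in the file we already have open.
        self.seen.extend(self.handle[self.offset:])
        self.offset = len(self.handle)
        # If the name now points at a different file, switch to it.
        current = self.directory.files.get(self.name)
        if current is not self.handle:
            self.handle = current
            self.offset = 0


fs = Directory()
tailer = NameBasedTailer(fs, "access.log")

fs.append("access.log", "a1")
tailer.poll()                             # reads a1
fs.append("access.log", "a2")
fs.rename("access.log", "access.log1")    # first rotation, handle stays valid
fs.append("access.log", "b1")             # a new access.log is created
fs.rename("access.log", "access.log2")    # second rotation before the tailer catches up
fs.append("access.log", "c1")             # yet another new access.log

tailer.poll()                             # finishes access.log1, then switches by name
tailer.poll()                             # reads the current access.log

written = ["a1", "a2", "b1", "c1"]
missed = [line for line in written if line not in tailer.seen]
print(tailer.seen, missed)
```

The line written to the file that became access.log2 ends up in missed, because the tailer only ever reopens the name access.log.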
So with TailSource there is a chance that data is lost, and this is the second reason why TailSource was discontinued after the Flume 0.9 release. In summary:
1. Tail cannot be part of a transaction.
2. Data can be lost, as in the example above.