Apache Flume sources consume events delivered to them by an external system, such as a web server. The external system sends events in a format that the target Flume source recognizes.

The following is a list of Apache Flume sources.

  1. Avro Source
  2. Thrift Source
  3. Exec Source
  4. JMS Source
  5. Spooling Directory Source
  6. Kafka Source
  7. NetCat TCP Source
  8. NetCat UDP Source
  9. Sequence Generator Source
  10. Syslog Source
  11. HTTP Source
  12. Custom Source
  13. Scribe Source

Let us look at each Apache Flume source in turn.


1. Avro Source

Apache Flume Avro Source receives event data from external Avro client streams. When an Avro source is paired with the Avro sink on another Flume agent, it can create tiered collection topologies.

Let us see the configuration example of Avro Source.

    agentone.sources = source
    agentone.channels = channelone
    agentone.sources.source.type = avro
    agentone.sources.source.channels = channelone
    agentone.sources.source.bind = 0.0.0.0
    agentone.sources.source.port = 4141


2. Thrift Source

Apache Flume Thrift Source receives events from external Thrift client streams. When a Thrift source is paired with the built-in ThriftSink on another Flume agent, it can create tiered collection topologies. The Thrift source can also be configured to start in secure mode by enabling Kerberos authentication.

Let us see the configuration example of Thrift Source.

    agentone.sources = source
    agentone.channels = channelone
    agentone.sources.source.type = thrift
    agentone.sources.source.channels = channelone
    agentone.sources.source.bind = 0.0.0.0
    agentone.sources.source.port = 4141


3. Exec Source

Apache Flume Exec source runs a given UNIX command on start-up and expects that process to continuously produce data on standard out. If the process exits for any reason, the source also exits and produces no further data.

Let us see the configuration example of Exec Source.

    agentone.sources = source
    agentone.channels = channelone
    agentone.sources.source.type = exec
    agentone.sources.source.command = tail -F /var/log/secure
    agentone.sources.source.channels = channelone


4. JMS Source

Apache Flume JMS Source reads messages from a JMS destination such as a queue or topic. As a JMS application, it should work with any JMS provider but has only been tested with ActiveMQ.

Note that the vendor-provided JMS jars should be included in the Flume classpath using the plugins.d directory (preferred), --classpath on the command line, or the FLUME_CLASSPATH variable in flume-env.sh.

Let us see the configuration example of JMS Source.

    agentone.sources = source
    agentone.channels = channelone
    agentone.sources.source.type = jms
    agentone.sources.source.channels = channelone
    agentone.sources.source.initialContextFactory = org.apache.activemq.jndi.ActiveMQInitialContextFactory
    agentone.sources.source.connectionFactory = GenericConnectionFactory
    agentone.sources.source.providerURL = tcp://mqserver:61616
    agentone.sources.source.destinationName = BUSINESS_DATA
    agentone.sources.source.destinationType = QUEUE


5. Spooling Directory Source

Apache Flume Spooling Directory Source ingests data from files placed in a “spooling” directory on disk. It watches the directory for new files and processes them as they appear.

Apache Flume Spooling Directory Source is reliable: it does not lose data even if Flume is restarted or its process is killed.

In exchange, files placed in the spooling directory must be immutable and uniquely named. Apache Flume will raise an error in the following conditions.

  • If a file is written to after it has been placed in the spooling directory.
  • If a file name that was already used is reused later.

To avoid both problems, add a unique identifier such as a timestamp to the log file name when it is moved into the spooling directory.

Let us see the configuration example of Spooling Directory Source.

    agentone.sources = source
    agentone.channels = channelone
    agentone.sources.source.type = spooldir
    agentone.sources.source.channels = channelone
    agentone.sources.source.spoolDir = /var/log/apache/flumeSpool
    agentone.sources.source.fileHeader = true
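The advice about adding a timestamp to the file name can be sketched in Python. This is only an illustration: the function name move_to_spool and the timestamp layout are assumptions, and the sketch assumes the finished log file and the spool directory live on the same filesystem so that the rename is a single step and Flume never sees a half-written file.

```python
import os
import time


def move_to_spool(src_path, spool_dir, now=None):
    """Move a finished log file into the spool directory under a unique name.

    The file must be complete before it is moved, because it must not
    change once it is inside the spooling directory.
    """
    # Build a timestamp suffix such as 20240131-235959 (layout is an assumption).
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    dest = os.path.join(spool_dir, f"{os.path.basename(src_path)}.{stamp}")

    # Refuse to reuse a file name that is already present in the spool directory.
    if os.path.exists(dest):
        raise FileExistsError(dest)

    # A rename within one filesystem does not copy data, so the file
    # appears in the spool directory fully written.
    os.rename(src_path, dest)
    return dest
```

A log rotation script could call move_to_spool("/tmp/access.log", "/var/log/apache/flumeSpool") after the web server has closed the file; the source path here is purely illustrative.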


6. Kafka Source

Apache Flume Kafka Source reads messages from Kafka topics. We can configure multiple Kafka sources in the same Consumer Group so that each will read a unique set of partitions for the topics.

The following is an example of a comma-separated topic list.

    agentone.sources.source.type = org.apache.flume.source.kafka.KafkaSource
    agentone.sources.source.channels = channelone
    agentone.sources.source.batchSize = 5000
    agentone.sources.source.batchDurationMillis = 2000
    agentone.sources.source.kafka.bootstrap.servers = localhost:9092
    agentone.sources.source.kafka.topics = test1, test2
    agentone.sources.source.kafka.consumer.group.id = custom.g.id

Example for topic subscription by regex.

    agentone.sources.source.type = org.apache.flume.source.kafka.KafkaSource
    agentone.sources.source.channels = channelone
    agentone.sources.source.kafka.bootstrap.servers = localhost:9092
    agentone.sources.source.kafka.topics.regex = ^topic[0-9]$
    # the default kafka.consumer.group.id=flume is used
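To see which topic names the regex above would select, the pattern can be tried out in Python. This is only a rough check: Kafka evaluates the pattern with Java's regex engine, and the sketch assumes the whole topic name must match. The topic names and the helper matching_topics are made up for illustration.

```python
import re

# Same pattern as kafka.topics.regex in the configuration above.
TOPIC_PATTERN = re.compile(r"^topic[0-9]$")


def matching_topics(topics):
    """Return the topic names that the pattern matches in full."""
    return [name for name in topics if TOPIC_PATTERN.fullmatch(name)]


if __name__ == "__main__":
    # Hypothetical topic names; only single-digit suffixes are selected.
    print(matching_topics(["topic1", "topic10", "topics", "topic9", "mytopic1"]))
```

The pattern allows exactly one digit after "topic", so a name such as topic10 is not subscribed to.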


7. NetCat TCP Source

Apache Flume NetCat TCP Source listens on a given port and converts the data it receives into events. It expects newline-separated text; each line becomes an event that is forwarded through the connected channel.

Let us see the configuration example of NetCat TCP Source.

    agentone.sources = source
    agentone.channels = channelone
    agentone.sources.source.type = netcat
    agentone.sources.source.bind = 0.0.0.0
    agentone.sources.source.port = 6666
    agentone.sources.source.channels = channelone
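A small Python client can be used to try out the configuration above. This is a minimal sketch, assuming an agent with that configuration is reachable on port 6666; the helper names encode_lines and send_lines are hypothetical.

```python
import socket


def encode_lines(lines):
    """Join text lines into newline-separated bytes, one line per event."""
    return "".join(line.rstrip("\n") + "\n" for line in lines).encode("utf-8")


def send_lines(host, port, lines, timeout=5.0):
    """Open a TCP connection and send every line to the NetCat source."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_lines(lines))


if __name__ == "__main__":
    # Assumes a Flume agent is listening locally on the port from the example.
    send_lines("localhost", 6666, ["first event", "second event"])
```

Each line in the list should arrive at the channel as a separate event.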


8. NetCat UDP Source

Apache Flume NetCat UDP Source works like the NetCat TCP Source, but over UDP: it receives data on a given port, converts it into events, and forwards them through the connected channel.

Let us see an example of configuration detail for NetCat UDP Source.

    agentone.sources = source
    agentone.channels = channelone
    agentone.sources.source.type = netcatudp
    agentone.sources.source.bind = 0.0.0.0
    agentone.sources.source.port = 6666
    agentone.sources.source.channels = channelone


9. Sequence Generator Source

Apache Flume Sequence Generator Source is mainly useful for testing. It generates events continuously from a counter that starts at 0 and increases by 1 for each event. The counter stops only when it reaches the configured total number of events (totalEvents).

Let us see an example of configuration detail for Sequence Generator Source.

    agentone.sources = source
    agentone.channels = channelone
    agentone.sources.source.type = seq
    agentone.sources.source.channels = channelone
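The counter behaviour described above can be modelled in a few lines of Python. This is not Flume's implementation, only a sketch of the described behaviour; the function name sequence_events is hypothetical, and the event body is assumed to be the counter value written as text.

```python
def sequence_events(total_events):
    """Yield event bodies for a counter that starts at 0 and increases by 1."""
    counter = 0
    # Stop once the configured total number of events has been produced.
    while counter < total_events:
        yield str(counter).encode("utf-8")
        counter += 1


if __name__ == "__main__":
    for body in sequence_events(3):
        print(body)
```

With a total of 3, the sketch produces the bodies 0, 1 and 2 and then stops.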


10. Syslog Source

Apache Flume Syslog sources read syslog data and generate Flume events.


1. Syslog TCP Source

This is the original, tried-and-true Syslog TCP source.

Let us see an example of configuration detail for Syslog TCP Source.

    agentone.sources = source
    agentone.channels = channelone
    agentone.sources.source.type = syslogtcp
    agentone.sources.source.port = 5140
    agentone.sources.source.host = localhost
    agentone.sources.source.channels = channelone


2. Multiport Syslog TCP Source

Multiport Syslog TCP Source is a newer, faster, multi-port capable version of the Syslog TCP source. Multi-port capability means it can listen on many ports at one time; to do so, it uses the Apache Mina library.

It also provides the capability to configure the character set used on a per-port basis.

Let us see an example of configuration detail for Multiport Syslog TCP Source.

    agentone.sources = source
    agentone.channels = channelone
    agentone.sources.source.type = multiport_syslogtcp
    agentone.sources.source.channels = channelone
    agentone.sources.source.host = 0.0.0.0
    agentone.sources.source.ports = 10001 10002 10003
    agentone.sources.source.portHeader = port


3. Syslog UDP Source

Syslog UDP Source receives syslog data over UDP on a given port and generates Flume events from it. Unlike the TCP sources, which create a new event for each newline-separated string, the UDP source treats an entire message as a single event.

Let us see an example of configuration detail for Syslog UDP Source.

    agentone.sources = source
    agentone.channels = channelone
    agentone.sources.source.type = syslogudp
    agentone.sources.source.port = 5140
    agentone.sources.source.host = localhost
    agentone.sources.source.channels = channelone
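To exercise a Syslog UDP source such as the one configured above, a test message can be sent from Python. This is a minimal sketch, assuming the traditional BSD syslog layout (priority, timestamp, host name, tag, message); the helper names and the default host name and tag are hypothetical.

```python
import socket
import time

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def syslog_priority(facility, severity):
    """Combine facility and severity into the syslog priority value."""
    return facility * 8 + severity


def format_syslog(message, facility=1, severity=5,
                  hostname="localhost", tag="demo", now=None):
    """Build one BSD-style syslog message as bytes."""
    tm = time.localtime(now)
    # The day of the month is space-padded in the traditional format.
    timestamp = (f"{MONTHS[tm.tm_mon - 1]} {tm.tm_mday:2d} "
                 f"{tm.tm_hour:02d}:{tm.tm_min:02d}:{tm.tm_sec:02d}")
    priority = syslog_priority(facility, severity)
    return f"<{priority}>{timestamp} {hostname} {tag}: {message}".encode("utf-8")


def send_syslog_udp(host, port, message):
    """Send a single syslog message as one UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_syslog(message), (host, port))


if __name__ == "__main__":
    # Assumes a Flume agent is listening on the port from the example.
    send_syslog_udp("localhost", 5140, "hello from the test client")
```

Because UDP is connectionless, send_syslog_udp reports no error if no agent is listening; the datagram is simply dropped.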


11. HTTP Source

Apache Flume HTTP source accepts events sent by HTTP POST and GET (GET should be used for experimentation only). A pluggable handler converts each request into Flume events. If the handler throws an error, the source returns HTTP status 400; if the channel is full, it returns HTTP status 503.

Let us see an example of configuration detail for HTTP Source.

    agentone.sources = source
    agentone.channels = channelone
    agentone.sources.source.type = http
    agentone.sources.source.port = 5140
    agentone.sources.source.channels = channelone
    agentone.sources.source.handler = org.example.rest.RestHandler
    agentone.sources.source.handler.nickname = random props
    agentone.sources.source.HttpConfiguration.sendServerVersion = false
    agentone.sources.source.ServerConnector.idleTimeout = 300
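A client for the HTTP source might look like the Python sketch below. It assumes the source uses Flume's default JSON handler rather than the custom org.example.rest.RestHandler from the configuration above, so the request body is a JSON array of events with "headers" and "body" fields. The helper names and the URL are hypothetical.

```python
import json
import urllib.request


def build_payload(bodies, headers=None):
    """Encode event bodies as a JSON array of {"headers", "body"} objects."""
    headers = headers or {}
    events = [{"headers": dict(headers), "body": body} for body in bodies]
    return json.dumps(events).encode("utf-8")


def post_events(url, bodies, headers=None, timeout=5.0):
    """POST the events and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for error statuses such as
    400 (handler error) or 503 (channel full), so callers may want to
    catch it and retry later.
    """
    request = urllib.request.Request(
        url,
        data=build_payload(bodies, headers),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Assumes a Flume agent with an HTTP source is listening on port 5140.
    print(post_events("http://localhost:5140", ["first event", "second event"]))
```

With a custom handler, the payload format is whatever that handler expects, so build_payload would need to change accordingly.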


12. Custom Source

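Apache Flume custom source is your own implementation of the Source interface. The custom source's class and its dependencies must be included in the agent's classpath when the Flume agent starts. The type of the custom source is its fully qualified class name.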

Let us see an example of configuration detail for Custom Source.

    agentone.sources = source
    agentone.channels = channelone
    agentone.sources.source.type = org.example.MySource
    agentone.sources.source.channels = channelone


13. Scribe Source

Scribe is another type of ingest system. To integrate with an existing Scribe ingest system, Flume should use the Scribe Source, which is based on Thrift and uses a compatible transfer protocol.

Let us see an example of configuration detail for Scribe Source.

    agentone.sources = source
    agentone.channels = channelone
    agentone.sources.source.type = org.apache.flume.source.scribe.ScribeSource
    agentone.sources.source.port = 1463
    agentone.sources.source.workerThreads = 5
    agentone.sources.source.channels = channelone