Apache Flume Sink is the component of a Flume agent that consumes events from the channel and delivers them to the destination system. Each sink is given a unique name, which identifies it in the agent's configuration and separates its configuration namespace from those of other components.

Apache Flume provides several sink types, as listed below.

  1. HDFS Sink
  2. Hive Sink
  3. Logger Sink
  4. Avro Sink
  5. Thrift Sink
  6. IRC Sink
  7. File Roll Sink
  8. Null Sink
  9. HBase Sink
  10. MorphlineSolr Sink
  11. ElasticSearch Sink
  12. Kite Dataset Sink
  13. HTTP Sink
  14. Custom Sink

Let us see each Flume Sink in detail.


1. HDFS Sink

Apache Flume HDFS Sink moves events from the channel into the Hadoop Distributed File System. It supports creating both text and sequence files.

To use the HDFS Sink, Apache Hadoop must be installed so that Flume can communicate with the Hadoop cluster using the Hadoop JARs.

Note that a version of Hadoop that supports the sync() call is required.

Example for the Apache Flume HDFS Sink.

agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = hdfs
agentone.sinks.k1.channel = channelone
agentone.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
agentone.sinks.k1.hdfs.filePrefix = events-
agentone.sinks.k1.hdfs.round = true
agentone.sinks.k1.hdfs.roundValue = 10
agentone.sinks.k1.hdfs.roundUnit = minute

2. Hive Sink

Apache Flume Hive Sink streams delimited text or JSON events directly into a Hive table or partition. Once a batch of events is committed to the Hive table, it becomes visible to Hive queries.

Apache Hive partitions can be pre-created, or Flume will create them before streaming data into them. Fields from the incoming events are mapped to the matching columns in the Hive table.

Example for Apache Hive table.

create table web_data (
  id int,
  msg string
)
partitioned by (continent_data string, country_code string, time string)
clustered by (id) into 5 buckets
stored as orc;

Example for Apache Flume Hive Sink.

agentone.channels = channelone
agentone.channels.channelone.type = memory
agentone.sinks = k1
agentone.sinks.k1.type = hive
agentone.sinks.k1.channel = channelone
agentone.sinks.k1.hive.metastore = thrift://127.0.0.1:9083
agentone.sinks.k1.hive.database = logsdb
agentone.sinks.k1.hive.table = weblogs
agentone.sinks.k1.hive.partition = asia,%{country},%y-%m-%d-%H-%M
agentone.sinks.k1.useLocalTimeStamp = false
agentone.sinks.k1.round = true
agentone.sinks.k1.roundValue = 10
agentone.sinks.k1.roundUnit = minute
agentone.sinks.k1.serializer = DELIMITED
agentone.sinks.k1.serializer.delimiter = "\t"
agentone.sinks.k1.serializer.serdeSeparator = '\t'
agentone.sinks.k1.serializer.fieldnames = id,,msg

3. Logger Sink

Apache Flume Logger Sink is used mainly for testing and debugging purposes and requires no extra configuration.

Example for Apache Flume Logger Sink.

agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = logger
agentone.sinks.k1.channel = channelone


4. Avro Sink

Apache Flume Avro Sink takes events from the channel in batches and sends them to the configured hostname/port pair.

Example for Apache Flume Avro Sink.

agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = avro
agentone.sinks.k1.channel = channelone
agentone.sinks.k1.hostname = 10.10.10.10
agentone.sinks.k1.port = 4545


5. Thrift Sink

Apache Flume Thrift Sink takes Flume events from the channel in batches, converts them into Thrift events, and sends them to the configured hostname/port pair.

Example for Apache Flume Thrift Sink.

agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = thrift
agentone.sinks.k1.channel = channelone
agentone.sinks.k1.hostname = 10.10.10.10
agentone.sinks.k1.port = 4545


6. IRC Sink

Apache Flume IRC Sink takes messages from the channel and relays them to the configured IRC destination.

Example for Apache Flume IRC Sink.

agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = irc
agentone.sinks.k1.channel = channelone
agentone.sinks.k1.hostname = irc.yourdomain.com
agentone.sinks.k1.nick = flume
agentone.sinks.k1.chan = #flume


7. File Roll Sink

Apache Flume File Roll Sink takes events from the channel and stores them on the local file system.

Example for Apache Flume File Roll Sink.

agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = file_roll
agentone.sinks.k1.channel = channelone
agentone.sinks.k1.sink.directory = /var/log/flume


8. Null Sink

Apache Flume Null Sink discards all events it receives from the channel.

Example for Apache Flume Null Sink.

agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = null
agentone.sinks.k1.channel = channelone


9. HBase Sink

Apache Flume HBase Sink reads the hbase-site.xml configuration file from the classpath and then writes data into the HBase database.

The following two types of serializers are provided with Apache Flume for convenience.

  • SimpleHbaseEventSerializer: This serializer is primarily an example implementation that writes the event body as-is to HBase and optionally increments a column in HBase.
  • RegexHbaseEventSerializer: This serializer splits the event body based on a given regex and writes each part into a different column.

Example for HBase Sink.

agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = hbase
agentone.sinks.k1.table = foo_table
agentone.sinks.k1.columnFamily = bar_cf
agentone.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agentone.sinks.k1.channel = channelone

Flume also provides two related HBase sink variants, HBase2 Sink and AsyncHBase Sink.

Let us see each HBase sink model in detail.



9.1 HBase2 Sink

HBase2 Sink is the equivalent of HBase Sink for HBase version 2. The only differences are the hbase2 tag in the sink type and the package/class names. The type is the FQCN: org.apache.flume.sink.hbase2.HBase2Sink.

Example for HBase2 Sink.

agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = hbase2
agentone.sinks.k1.table = foo_table
agentone.sinks.k1.columnFamily = bar_cf
agentone.sinks.k1.serializer = org.apache.flume.sink.hbase2.RegexHBase2EventSerializer
agentone.sinks.k1.channel = channelone


9.2 AsyncHBase Sink

AsyncHBase Sink uses the Asynchbase API to write to HBase in a non-blocking fashion. It provides the same consistency guarantees as HBase, which is currently row-wise atomicity.

Example for AsyncHBase Sink.

agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = asynchbase
agentone.sinks.k1.table = foo_table
agentone.sinks.k1.columnFamily = bar_cf
agentone.sinks.k1.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
agentone.sinks.k1.channel = channelone


10. MorphlineSolr Sink

Apache Flume MorphlineSolr Sink extracts data from Flume events, transforms it, and loads it into Apache Solr. For example, an agent can stream data into HDFS (via the HDFS Sink) and concurrently extract, transform, and load the same data into Solr (via the MorphlineSolr Sink).

We can configure the ETL functionality using a morphline configuration file that defines a chain of transformation commands, piping event records from one command to the next. The Solr and morphline JARs must be placed in the lib directory of Apache Flume.

Example for the Apache Flume MorphlineSolr Sink.

agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agentone.sinks.k1.channel = channelone
agentone.sinks.k1.morphlineFile = /etc/flume-ng/conf/morphline.conf
# agentone.sinks.k1.morphlineId = morphline1
# agentone.sinks.k1.batchSize = 1000
# agentone.sinks.k1.batchDurationMillis = 1000


11. ElasticSearch Sink

Apache Flume ElasticSearch Sink writes data to an Elasticsearch cluster. Events are written to a new index each day; the sink starts writing to the new index at midnight UTC.

Example for the Apache Flume ElasticSearch Sink.

agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = elasticsearch
agentone.sinks.k1.hostNames = 127.0.0.1:9200,127.0.0.2:9300
agentone.sinks.k1.indexName = foo_index
agentone.sinks.k1.indexType = bar_type
agentone.sinks.k1.clusterName = foobar_cluster
agentone.sinks.k1.batchSize = 500
agentone.sinks.k1.ttl = 5d
agentone.sinks.k1.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer
agentone.sinks.k1.channel = channelone


12. Kite Dataset Sink

Apache Flume Kite Dataset Sink is an experimental sink that writes events to a Kite Dataset.
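A minimal configuration follows the same pattern as the other sinks; the dataset URI below is illustrative and assumes a Kite dataset has already been created at that location.

agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = org.apache.flume.sink.kite.DatasetSink
agentone.sinks.k1.channel = channelone
agentone.sinks.k1.kite.dataset.uri = dataset:hdfs://namenode:8020/data/events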


13. HTTP Sink

Apache Flume HTTP Sink takes events from the channel and sends them to a remote service using HTTP POST requests. The content of each event is sent as the POST body. If an error occurs, a backoff signal is raised and events are not consumed from the channel.

Example for Apache Flume HTTP Sink.

agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = http
agentone.sinks.k1.channel = channelone
agentone.sinks.k1.endpoint = http://localhost:8080/someuri
agentone.sinks.k1.connectTimeout = 2000
agentone.sinks.k1.requestTimeout = 2000
agentone.sinks.k1.acceptHeader = application/json
agentone.sinks.k1.contentTypeHeader = application/json
agentone.sinks.k1.defaultBackoff = true
agentone.sinks.k1.defaultRollback = true
agentone.sinks.k1.defaultIncrementMetrics = false
agentone.sinks.k1.backoff.4XX = false
agentone.sinks.k1.rollback.4XX = false
agentone.sinks.k1.incrementMetrics.4XX = true
agentone.sinks.k1.backoff.200 = false
agentone.sinks.k1.rollback.200 = false
agentone.sinks.k1.incrementMetrics.200 = true


14. Custom Sink

Apache Flume Custom Sink is the user's own implementation of the Sink interface. When starting the Flume agent, the custom sink's class and its dependencies must be included in the agent's classpath.
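As a sketch of what such an implementation looks like (assuming the flume-ng-core JAR on the compile classpath; the class name and the print-to-stdout behavior are illustrative, not part of Flume):

import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

// Hypothetical custom sink that prints each event body to stdout.
public class MySink extends AbstractSink implements Configurable {

  @Override
  public void configure(Context context) {
    // Read sink-specific properties here, e.g. context.getString("myProp").
  }

  @Override
  public Status process() throws EventDeliveryException {
    Channel channel = getChannel();
    Transaction txn = channel.getTransaction();
    txn.begin();
    try {
      Event event = channel.take();
      if (event == null) {
        // Channel is empty; commit and ask the runner to back off briefly.
        txn.commit();
        return Status.BACKOFF;
      }
      System.out.println(new String(event.getBody()));
      txn.commit();
      return Status.READY;
    } catch (Throwable t) {
      // Roll back so the event stays in the channel for a retry.
      txn.rollback();
      throw new EventDeliveryException("Failed to deliver event", t);
    } finally {
      txn.close();
    }
  }
}

The transaction begin/commit/rollback pattern is what gives Flume its delivery guarantee: an event is removed from the channel only when the sink commits.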

Example for Custom Sink.

agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = org.example.MySink
agentone.sinks.k1.channel = channelone