Apache Flume Sink is the component of a Flume agent that consumes events from the channel and delivers them to the destination system. Each sink is given a unique name, which is used to separate its configuration and working namespaces.
Apache Flume provides the following types of sinks.
- HDFS Sink
- Hive Sink
- Logger Sink
- Avro Sink
- Thrift Sink
- IRC Sink
- File Roll Sink
- Null Sink
- HBase Sink
- MorphlineSolr Sink
- ElasticSearch Sink
- Kite Dataset Sink
- HTTP Sink
- Custom Sink
Let us see each Flume Sink in detail.
1. HDFS Sink
The Apache Flume HDFS Sink moves events from the channel to the Hadoop Distributed File System (HDFS). It supports creating both text and sequence files.
To use the HDFS Sink, Apache Hadoop must be installed so that Flume can communicate with the Hadoop cluster using the Hadoop JARs.
Note that a version of Hadoop that supports the sync() call is required.
Example for the Apache Flume HDFS Sink.
agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = hdfs
agentone.sinks.k1.channel = channelone
agentone.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
agentone.sinks.k1.hdfs.filePrefix = events-
agentone.sinks.k1.hdfs.round = true
agentone.sinks.k1.hdfs.roundValue = 10
agentone.sinks.k1.hdfs.roundUnit = minute
2. Hive Sink
The Apache Flume Hive Sink streams events containing delimited text or JSON data directly into a Hive table or partition. Once events are committed to the Hive table, they become visible to Hive queries.
Hive partitions can be pre-created, or Flume will create them before streaming data; the fields of the streamed data are mapped to the matching columns in the Hive table.
Example for Apache Hive table.
create table web_data ( id int , msg string )
partitioned by (continent_data string, country_code string, time string)
clustered by (id) into 5 buckets
stored as orc;
Example for Apache Flume Hive Sink.
agentone.channels = channelone
agentone.channels.channelone.type = memory
agentone.sinks = k1
agentone.sinks.k1.type = hive
agentone.sinks.k1.channel = channelone
agentone.sinks.k1.hive.metastore = thrift://127.0.0.1:9083
agentone.sinks.k1.hive.database = logsdb
agentone.sinks.k1.hive.table = weblogs
agentone.sinks.k1.hive.partition = asia,%{country},%y-%m-%d-%H-%M
agentone.sinks.k1.useLocalTimeStamp = false
agentone.sinks.k1.round = true
agentone.sinks.k1.roundValue = 10
agentone.sinks.k1.roundUnit = minute
agentone.sinks.k1.serializer = DELIMITED
agentone.sinks.k1.serializer.delimiter = "\t"
agentone.sinks.k1.serializer.serdeSeparator = '\t'
agentone.sinks.k1.serializer.fieldnames =id,,msg
3. Logger Sink
The Apache Flume Logger Sink is used primarily for testing/debugging purposes and does not require any extra configuration.
Example for Apache Flume Logger Sink.
agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = logger
agentone.sinks.k1.channel = channelone
4. Avro Sink
The Apache Flume Avro Sink takes events from the channel in batches and sends them to the configured hostname/port pair.
Example for Apache Flume Avro Sink.
agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = avro
agentone.sinks.k1.channel = channelone
agentone.sinks.k1.hostname = 10.10.10.10
agentone.sinks.k1.port = 4545
5. Thrift Sink
The Apache Flume Thrift Sink takes Flume events from the channel in batches, converts them into Thrift events, and sends them to the configured hostname/port pair.
Example for Apache Flume Thrift Sink.
agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = thrift
agentone.sinks.k1.channel = channelone
agentone.sinks.k1.hostname = 10.10.10.10
agentone.sinks.k1.port = 4545
6. IRC Sink
The Apache Flume IRC Sink takes messages from the channel and relays them to the configured IRC destinations.
Example for Apache Flume IRC Sink.
agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = irc
agentone.sinks.k1.channel = channelone
agentone.sinks.k1.hostname = irc.yourdomain.com
agentone.sinks.k1.nick = flume
agentone.sinks.k1.chan = #flume
7. File Roll Sink
The Apache Flume File Roll Sink takes events from the channel and stores them on the local file system.
Example for Apache Flume File Roll Sink.
agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = file_roll
agentone.sinks.k1.channel = channelone
agentone.sinks.k1.sink.directory = /var/log/flume
8. Null Sink
The Apache Flume Null Sink discards all events it receives from the channel.
Example for Apache Flume Null Sink.
agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = null
agentone.sinks.k1.channel = channelone
9. HBase Sink
The Apache Flume HBase Sink reads its HBase configuration from the hbase-site.xml file on the classpath and writes the data to the HBase database.
The following two types of serializers are provided with Apache Flume for convenience.
- SimpleHbaseEventSerializer: This serializer is primarily an example implementation that is used to write the event body as-is to HBase, and optionally increments a column in Hbase.
- RegexHbaseEventSerializer: This serializer breaks the event body based on the given regex and writes each part into different columns.
Example for HBase Sink.
agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = hbase
agentone.sinks.k1.table = foo_table
agentone.sinks.k1.columnFamily = bar_cf
agentone.sinks.k1.channel = channelone
Flume also provides two related HBase sink variants, HBase2 Sink and AsyncHBase Sink.
Let us see each HBase sink model in detail.
9.1 HBase2 Sink
HBase2 Sink is the equivalent of HBase Sink for HBase version 2. The only differences are the hbase2 tag in the sink type and the package/class names; the type is the FQCN: org.apache.flume.sink.hbase2.HBase2Sink.
Example for HBase2 Sink.
agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = hbase2
agentone.sinks.k1.table = foo_table
agentone.sinks.k1.columnFamily = bar_cf
agentone.sinks.k1.serializer = org.apache.flume.sink.hbase2.RegexHBase2EventSerializer
agentone.sinks.k1.channel = channelone
9.2 AsyncHBase Sink
AsyncHBase Sink uses the Asynchbase API to write to HBase. It provides the same consistency guarantees as HBase, which is currently row-wise atomicity.
Example for AsyncHBase Sink.
agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = asynchbase
agentone.sinks.k1.table = foo_table
agentone.sinks.k1.columnFamily = bar_cf
agentone.sinks.k1.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
agentone.sinks.k1.channel = channelone
10. MorphlineSolr Sink
The Apache Flume MorphlineSolr Sink extracts data from Flume events, transforms it, and loads it into Apache Solr. A typical use case is to stream data into HDFS (via the HDFS Sink) and concurrently extract, transform, and load the same data into Solr (via the MorphlineSolr Sink).
The ETL functionality is configured with a morphline configuration file that defines a chain of transformation commands piping event records from one command to another. The Solr and morphline JARs should be placed in the lib directory of Apache Flume.
Example for the Apache Flume MorphlineSolr Sink.
agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agentone.sinks.k1.channel = channelone
agentone.sinks.k1.morphlineFile = /etc/flume-ng/conf/morphline.conf
# agentone.sinks.k1.morphlineId = morphline1
# agentone.sinks.k1.batchSize = 1000
# agentone.sinks.k1.batchDurationMillis = 1000
11. ElasticSearch Sink
The Apache Flume ElasticSearch Sink writes data to an Elasticsearch cluster. By default, events are written to a new index each day; the index rolls over at midnight UTC.
Example for the Apache Flume ElasticSearch Sink.
agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = elasticsearch
agentone.sinks.k1.hostNames = 127.0.0.1:9200,127.0.0.2:9300
agentone.sinks.k1.indexName = foo_index
agentone.sinks.k1.indexType = bar_type
agentone.sinks.k1.clusterName = foobar_cluster
agentone.sinks.k1.batchSize = 500
agentone.sinks.k1.ttl = 5d
agentone.sinks.k1.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer
agentone.sinks.k1.channel = channelone
12. Kite Dataset Sink
The Apache Flume Kite Dataset Sink is an experimental sink that writes events to a Kite Dataset.
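A minimal configuration sketch for the Kite Dataset Sink is shown below; the dataset URI (namenode host and dataset name) is a placeholder for illustration and must point to an existing Kite Dataset.
agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = org.apache.flume.sink.kite.DatasetSink
agentone.sinks.k1.kite.dataset.uri = dataset:hdfs://namenode:8020/datasets/weblogs
agentone.sinks.k1.channel = channelone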
13. HTTP Sink
The Apache Flume HTTP Sink takes events from the channel and sends them to a remote service using HTTP POST requests. The content of each event is sent as the POST body. If an error occurs, a backoff signal is raised and events are not consumed from the channel.
Example for Apache Flume HTTP Sink.
agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = http
agentone.sinks.k1.channel = channelone
agentone.sinks.k1.endpoint = http://localhost:8080/someuri
agentone.sinks.k1.connectTimeout = 2000
agentone.sinks.k1.requestTimeout = 2000
agentone.sinks.k1.acceptHeader = application/json
agentone.sinks.k1.contentTypeHeader = application/json
agentone.sinks.k1.defaultBackoff = true
agentone.sinks.k1.defaultRollback = true
agentone.sinks.k1.defaultIncrementMetrics = false
agentone.sinks.k1.backoff.4XX = false
agentone.sinks.k1.rollback.4XX = false
agentone.sinks.k1.incrementMetrics.4XX = true
agentone.sinks.k1.backoff.200 = false
agentone.sinks.k1.rollback.200 = false
agentone.sinks.k1.incrementMetrics.200 = true
14. Custom Sink
An Apache Flume Custom Sink is the user’s own implementation of the Sink interface. When starting the Flume agent, the custom sink’s class and its dependencies must be included in the agent’s classpath.
Example for Custom Sink.
agentone.channels = channelone
agentone.sinks = k1
agentone.sinks.k1.type = org.example.MySink
agentone.sinks.k1.channel = channelone