Apache Flume is a big data streaming tool used to collect, aggregate, and move streaming data from various source systems to a destination system. Its architecture is based on streaming data flows, and it provides built-in mechanisms to recover from different types of failures.

Apache Flume Channels are the containers in which events are staged on an agent. Sources add events to a channel, and sinks remove events from it.
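
For orientation, here is a minimal sketch of how a source and a sink are wired to a channel in an agent configuration file. The agent, component names, source type, and port are illustrative assumptions and not part of the examples that follow.

# Minimal sketch: one netcat source and one logger sink sharing a memory channel.
# Names (agentone, sourceone, sinkone, channelone) are illustrative only.
agentone.sources = sourceone
agentone.sinks = sinkone
agentone.channels = channelone

agentone.sources.sourceone.type = netcat
agentone.sources.sourceone.bind = localhost
agentone.sources.sourceone.port = 44444

agentone.sinks.sinkone.type = logger

agentone.channels.channelone.type = memory

# A source can write to one or more channels; a sink drains exactly one channel.
agentone.sources.sourceone.channels = channelone
agentone.sinks.sinkone.channel = channelone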

Apache Flume provides different types of channels as mentioned below.

  1. Memory Channel
  2. JDBC Channel
  3. Kafka Channel
  4. File Channel
  5. Spillable Memory Channel
  6. Pseudo Transaction Channel
  7. Custom Channel

Let us see each Channel in detail.


1. Memory Channel

The Apache Flume Memory Channel stores events in an in-memory queue with a configurable maximum size. The Memory Channel is useful for flows that need high throughput and are prepared to lose the staged events in case of agent failure.

Example for Memory Channel.

agentone.channels = channelone
agentone.channels.channelone.type = memory
agentone.channels.channelone.capacity = 10000
agentone.channels.channelone.transactionCapacity = 10000
agentone.channels.channelone.byteCapacityBufferPercentage = 20
agentone.channels.channelone.byteCapacity = 800000


2. JDBC Channel

In the JDBC Channel, events are stored in persistent storage backed by a database. Currently, the JDBC Channel supports an embedded Derby database. The JDBC Channel is used for flows where recoverability is important.

Example for Apache Flume JDBC Channel.

agentone.channels = channelone
agentone.channels.channelone.type = jdbc


3. Kafka Channel

In the Apache Flume Kafka Channel, events are stored in a Kafka cluster, which must be installed separately. Kafka provides high replication and availability, so in case of an agent or Kafka broker failure the events are immediately available to other sinks.

The Kafka Channel can be used in several scenarios.

  • With a Flume source and sink, it provides a reliable and highly available channel for events.
  • With a Flume source and interceptor but no sink, it allows writing Flume events into a Kafka topic for use by other applications.
  • With a Flume sink but no source, it offers a low-latency, fault-tolerant way to send events from Kafka to Flume sinks such as HDFS, HBase, or Solr (a configuration sketch for this case follows the base example below).

Example for Apache Flume Kafka Channel.

agentone.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
agentone.channels.channel1.kafka.bootstrap.servers = kafka-1:9092,kafka-2:9092,kafka-3:9092
agentone.channels.channel1.kafka.topic = channel1
agentone.channels.channel1.kafka.consumer.group.id = flume-consumer
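
For the third scenario above (a Flume sink with no Flume source, where other applications write directly to the Kafka topic), the channel should be told not to expect Flume's Avro event wrapping. A minimal sketch, reusing the hypothetical agent and broker names from the example above and relying on the parseAsFlumeEvent property described in the Flume documentation:

# Sketch: Kafka Channel fed by external producers rather than a Flume source.
agentone.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
agentone.channels.channel1.kafka.bootstrap.servers = kafka-1:9092,kafka-2:9092,kafka-3:9092
agentone.channels.channel1.kafka.topic = channel1
agentone.channels.channel1.kafka.consumer.group.id = flume-consumer
# parseAsFlumeEvent defaults to true; set it to false when producers other than
# a Flume source write plain messages to the topic.
agentone.channels.channel1.parseAsFlumeEvent = false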

The Apache Flume Kafka Channel also supports additional properties for securing its communication with Kafka.

Let us look at these security options in detail.


3.1 Security and Kafka Channel

Apache Flume's communication with Kafka supports both secure authentication and data encryption.

This is achieved by setting the kafka.producer.security.protocol and kafka.consumer.security.protocol properties to one of the following values.

  • SASL_PLAINTEXT: Kerberos or plaintext authentication with no data encryption.
  • SASL_SSL: Kerberos or plaintext authentication with data encryption.
  • SSL: TLS-based encryption with optional authentication.

3.2 TLS and Kafka Channel

The example below shows a configuration with server-side authentication and data encryption.

agentone.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
agentone.channels.channel1.kafka.bootstrap.servers = kafka-1:9093,kafka-2:9093,kafka-3:9093
agentone.channels.channel1.kafka.topic = channel1
agentone.channels.channel1.kafka.consumer.group.id = flume-consumer
agentone.channels.channel1.kafka.producer.security.protocol = SSL
agentone.channels.channel1.kafka.producer.ssl.truststore.location = /path/to/truststore.jks
agentone.channels.channel1.kafka.producer.ssl.truststore.password = <password to access the truststore>
agentone.channels.channel1.kafka.consumer.security.protocol = SSL
agentone.channels.channel1.kafka.consumer.ssl.truststore.location = /path/to/truststore.jks
agentone.channels.channel1.kafka.consumer.ssl.truststore.password = <password to access the truststore>


3.3 Kerberos and Kafka Channel

To use the Kafka Channel with a Kafka cluster secured with Kerberos, set the kafka.producer.security.protocol and/or kafka.consumer.security.protocol properties shown above. The Kerberos keytab and principal used with the Kafka brokers are specified in a JAAS file's KafkaClient section, and the path to that JAAS file (and optionally the system-wide Kerberos configuration) is passed to the agent's JVM, typically via JAVA_OPTS in flume-env.sh:

JAVA_OPTS="$JAVA_OPTS -Djava.security.krb5.conf=/path/to/krb5.conf" JAVA_OPTS="$JAVA_OPTS -Djava.security.auth.login.config=/path/to/flume_jaas.conf"

Example for secure configuration using SASL_PLAINTEXT:

agentone.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
agentone.channels.channel1.kafka.bootstrap.servers = kafka-1:9093,kafka-2:9093,kafka-3:9093
agentone.channels.channel1.kafka.topic = channel1
agentone.channels.channel1.kafka.consumer.group.id = flume-consumer
agentone.channels.channel1.kafka.producer.security.protocol = SASL_PLAINTEXT
agentone.channels.channel1.kafka.producer.sasl.mechanism = GSSAPI
agentone.channels.channel1.kafka.producer.sasl.kerberos.service.name = kafka
agentone.channels.channel1.kafka.consumer.security.protocol = SASL_PLAINTEXT
agentone.channels.channel1.kafka.consumer.sasl.mechanism = GSSAPI
agentone.channels.channel1.kafka.consumer.sasl.kerberos.service.name = kafka

Example for secure configuration using SASL_SSL:

agentone.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
agentone.channels.channel1.kafka.bootstrap.servers = kafka-1:9093,kafka-2:9093,kafka-3:9093
agentone.channels.channel1.kafka.topic = channel1
agentone.channels.channel1.kafka.consumer.group.id = flume-consumer
agentone.channels.channel1.kafka.producer.security.protocol = SASL_SSL
agentone.channels.channel1.kafka.producer.sasl.mechanism = GSSAPI
agentone.channels.channel1.kafka.producer.sasl.kerberos.service.name = kafka
agentone.channels.channel1.kafka.producer.ssl.truststore.location = /path/to/truststore.jks
agentone.channels.channel1.kafka.producer.ssl.truststore.password = <password to access the truststore>
agentone.channels.channel1.kafka.consumer.security.protocol = SASL_SSL
agentone.channels.channel1.kafka.consumer.sasl.mechanism = GSSAPI
agentone.channels.channel1.kafka.consumer.sasl.kerberos.service.name = kafka
agentone.channels.channel1.kafka.consumer.ssl.truststore.location = /path/to/truststore.jks
agentone.channels.channel1.kafka.consumer.ssl.truststore.password = <password to access the truststore>
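
The flume_jaas.conf file referenced earlier holds the Kerberos credentials that the Kafka clients inside the channel use. The following is a minimal sketch of its KafkaClient section; the keytab path and principal are placeholders that must be replaced with real values for your environment.

KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="/path/to/keytabs/flume.keytab"
  principal="flume/flumehost1.example.com@YOURKERBEROSREALM";
};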


4. File Channel

The Apache Flume File Channel is a durable channel: it persists events to disk, so if the JVM process is killed, the system is rebooted, or a crash occurs, the staged events are not lost and are delivered to the next agent in the pipeline once the agent restarts.

Example for Apache Flume File Channel.

agentone.channels = c1
agentone.channels.c1.type = file
agentone.channels.c1.checkpointDir = /mnt/flume/checkpoint
agentone.channels.c1.dataDirs = /mnt/flume/data


5. Spillable Memory Channel

The Apache Flume Spillable Memory Channel stores events both in memory and on disk. The in-memory queue acts as the primary store, and the disk is used when the queue overflows.

The Spillable Memory Channel is currently experimental and is not recommended for production use.

Example for Apache Flume Spillable Memory Channel.

agentone.channels = c1
agentone.channels.c1.type = SPILLABLEMEMORY
agentone.channels.c1.memoryCapacity = 10000
agentone.channels.c1.overflowCapacity = 1000000
agentone.channels.c1.byteCapacity = 800000
agentone.channels.c1.checkpointDir = /mnt/flume/checkpoint
agentone.channels.c1.dataDirs = /mnt/flume/data


6. Pseudo Transaction Channel

The Pseudo Transaction Channel is used only for unit testing and is NOT meant for production use.
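
Because this channel exists only to support tests, configuring it simply means pointing the channel type at its fully qualified class name. A minimal sketch, assuming the class org.apache.flume.channel.PseudoTxnMemoryChannel from the Flume source tree and the agent naming used in the earlier examples:

# For unit tests only; never use in production.
agentone.channels = c1
agentone.channels.c1.type = org.apache.flume.channel.PseudoTxnMemoryChannel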


7. Custom Channel

A Custom Channel is a user-written implementation of the Channel interface. The custom channel's class and its dependencies must be included on the Flume agent's classpath when the agent is started. The type of the custom channel is its fully qualified class name (FQCN).

Example for Custom Channel.

agentone.channels = c1
agentone.channels.c1.type = org.example.MyChannel