Apache Kafka Streams Application

Apache Kafka Streams WordCountDemo Program

Apache Kafka Streams is termed as a client library that is used to create real-time and mission-critical applications. In Kafka Stream operation the input and output data is stored in Kafka cluster.

In this section, we will see a “wordcount” program using Kafka Stream, which will take “text” as input from the command line and process it, after processing it will produce an output of word count.

Let us see the step-by-step implementation of the “wordcount” program using Kafka Streams.


Step 1. Start Kafka and Zookeeper Server

Please start zookeeper and Kafka servers.

Command For Zookeeper:
cloudduggu@ubuntu:~/kafka$ ./bin/zookeeper-server-start.sh config/zookeeper.properties

Command For Kafka:
cloudduggu@ubuntu:~/kafka$ ./bin/kafka-server-start.sh config/server.properties

start zookeeper server

start kafka broker


Step 2. Create Input and Output Topic

Now we will create an input topic named streams-plaintext-input and the output topic named streams-wordcount-output.

Command For Input Topic:
cloudduggu@ubuntu:~/kafka$./bin/kafka-topics.sh --create \ --bootstrap-server localhost:9092 \ --replication-factor 1 \ --partitions 1 \ --topic streams-plaintext-input

Command For Output Topic:
cloudduggu@ubuntu:~/kafka$./bin/kafka-topics.sh --create \ --bootstrap-server localhost:9092 \ --replication-factor 1 \ --partitions 1 \ --topic streams-wordcount-output \ --config cleanup.policy=compact

create topic output


Step 3. Start the Wordcount Application

Now start the wordcount application using the below command in a separate terminal.

The WordCount application program will read data from the input topic streams-plaintext-input and perform the computations of the WordCount algorithm on each of the read messages, and continuously write its current results to the output topic streams-wordcount-output.

Command:
cloudduggu@ubuntu:~/kafka$./bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo

kafka word count app


Step 4. Start Producer

Open a new terminal and start Kafka producer using tool “kafka-console-producer.sh” and write some input.

Command:
cloudduggu@ubuntu:~/kafka$ ./bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic streams-plaintext-input

start producer


Step 5. Start Producer

Open another terminal and run the consumer tool to read the input, process it, and print word count output.

Command:
cloudduggu@ubuntu:~/kafka$./bin/kafka-topics.sh --create \ --bootstrap-server localhost:9092 \ --replication-factor 1 \ --partitions 1 \ --topic streams-wordcount-output \ --config cleanup.policy=compact

kafka consumer start


We can see the output of the Wordcount application, which is actually a continuous stream of updates in which each output record is an updated count of a single word.