Apache Kafka Interview Questions

Top 30 Apache Kafka Questions and Answers

1. What is Apache Kafka?

Apache Kafka is an open-source, publish-subscribe event streaming platform written in Java and Scala. It is used by many companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

2. What are the components of Apache Kafka?

Apache Kafka has four main components.

  • Topic: A named stream of messages belonging to the same category.
  • Producer: Publishes (sends) messages to a topic.
  • Brokers: The set of servers on which published messages are stored.
  • Consumer: Pulls data from the brokers.

3. What is Apache Kafka Consumer Group?

A Kafka consumer group is a set of consumer processes that cooperate to consume and process the records of a topic. These processes can run on a single machine or be distributed across many machines for scalability. Consumers that share the same group ID belong to the same consumer group, and each partition of a topic is consumed by exactly one member of the group.
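The "one owner per partition" behavior of a consumer group can be illustrated with a small Python sketch. This is only a toy round-robin assignment, not Kafka's actual assignor, and the partition and consumer names are invented for the example:

```python
# Toy sketch of partition assignment inside a consumer group:
# every partition is owned by exactly one consumer in the group.

def assign_partitions(partitions, consumers):
    """Distribute partitions across consumers round-robin, one owner each."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(p)
    return assignment

partitions = [0, 1, 2, 3, 4, 5]
group = ["consumer-a", "consumer-b", "consumer-c"]
print(assign_partitions(partitions, group))
# each of the 3 consumers owns 2 of the 6 partitions
```

Adding more consumers to the group (up to the partition count) spreads the partitions thinner, which is exactly how consumer groups scale out.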

4. What is the importance of ZooKeeper in Kafka?

ZooKeeper is used to store metadata about the Kafka cluster as well as consumer client details. It also helps with leader election, configuration management, synchronization, and detecting any node leaving or joining the cluster.

5. Can we use Kafka without ZooKeeper?

ZooKeeper plays an important role in a Kafka cluster setup. If it is down, client requests cannot be served, and it is not possible to bypass ZooKeeper to connect to the Kafka cluster. (Note: since Kafka 2.8, KRaft mode allows running a cluster without ZooKeeper, but in the classic architecture described here ZooKeeper is required.)

6. What are the use cases of Apache Kafka?

Apache Kafka is used for stream processing, metrics analysis, website activity tracking, server log aggregation, event sourcing, and so on. Kafka is used by many Fortune 100 companies.

7. What do you understand by event streaming?

Event streaming is the practice of capturing data in real time from event sources such as databases, sensors, mobile devices, cloud services, and software applications in the form of streams of events; storing these event streams durably for later retrieval; manipulating, processing, and reacting to the event streams in real time as well as retrospectively; and routing the event streams to different destination technologies as needed.

8. What are the use cases of event streaming?

Event streaming is used in a variety of use cases across organizations.

Let us see a few of them.

  • It is used to process payments and financial transactions in real-time, such as in stock exchanges, banks, and insurance companies.
  • It is used to track and monitor cars, trucks, fleets, and shipments in real-time, such as in logistics and the automotive industry.
  • It is used to continuously capture and analyze sensor data from IoT devices or other equipment, such as in factories and wind parks.
  • It is used to collect and immediately react to customer interactions and orders, such as in retail, the hotel and travel industry, and mobile applications.
  • It is used to monitor patients in hospital care and predict changes in condition to ensure timely treatment in emergencies.
  • It is used to connect, store, and make available data produced by different divisions of a company.
  • It is used to serve as the foundation for data platforms, event-driven architectures, and microservices.

9. What are the APIs provided by Kafka?

Kafka provides five core APIs for Java and Scala as mentioned below.

  • Kafka Admin API: It is used to manage and inspect topics, brokers, and other Kafka objects.
  • Kafka Producer API: It is used to publish (write) a stream of events to one or more Kafka topics.
  • Kafka Consumer API: It is used to subscribe to (read) one or more topics and to process the stream of events produced to them.
  • Kafka Streams API: It is used to implement stream processing applications and microservices.
  • Kafka Connect API: It is used to build and run reusable data import/export connectors that consume (read) or produce (write) streams of events from and to external systems and applications so they can integrate with Kafka.

10. What is replication in Apache Kafka?

Apache Kafka replicates the log for each topic partition across a configurable number of servers. If a failure occurs, a replica available on another node takes over, so the data remains accessible.
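The failover behavior can be pictured with a toy Python model. This is only an illustration of the idea, not Kafka's actual replication protocol, and the broker names are invented:

```python
# Toy model of partition replication: the same log is copied to several
# brokers; if the leader fails, a follower that holds a full copy of the
# log is promoted, and no messages are lost.

class Partition:
    def __init__(self, replicas):
        self.logs = {b: [] for b in replicas}   # one log copy per broker
        self.leader = replicas[0]

    def append(self, msg):
        for log in self.logs.values():          # replicate to every broker
            log.append(msg)

    def fail_leader(self):
        del self.logs[self.leader]              # the leader crashes
        self.leader = next(iter(self.logs))     # a follower is promoted

    def read(self):
        return list(self.logs[self.leader])

p = Partition(["broker-1", "broker-2", "broker-3"])
p.append("m1")
p.append("m2")
p.fail_leader()
print(p.leader, p.read())  # broker-2 still serves m1, m2
```

The number of replicas here plays the role of Kafka's configurable replication factor: more copies tolerate more broker failures.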

11. What is the process of starting a Kafka server?

To start a Kafka server, we first have to start a ZooKeeper server.

Let us see how to start the ZooKeeper and Kafka servers.

The command to start the ZooKeeper server:
cloudduggu@ubuntu:~/kafka$ ./bin/zookeeper-server-start.sh config/zookeeper.properties

The command to start the Kafka server:
cloudduggu@ubuntu:~/kafka$ ./bin/kafka-server-start.sh config/server.properties

12. How to delete a Kafka topic?

We can use the below command to delete a Kafka topic.

cloudduggu@ubuntu:~/kafka$ ./bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic topic_name

Note: in newer Kafka versions (2.2 and later), kafka-topics.sh takes --bootstrap-server localhost:9092 instead of --zookeeper.

13. What is Apache Kafka producer?

An Apache Kafka producer is the source of the data stream. It sends data directly to the broker that is the leader for the partition, without any intervening routing tier.
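The routing decision can be sketched in Python. Kafka's real producer client hashes the record key (murmur2 by default) to select a partition; the CRC32 stand-in below is only an illustration of the property that matters, namely that the same key always maps to the same partition, which preserves per-key ordering:

```python
# Illustrative sketch of producer partitioning (not Kafka's actual hash).
import zlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    # Stand-in for the client's key hash (Kafka actually uses murmur2).
    return zlib.crc32(key) % num_partitions

# Records with the same key always land on the same partition...
assert choose_partition(b"user-42", 6) == choose_partition(b"user-42", 6)
# ...and the result is always a valid partition number.
assert 0 <= choose_partition(b"user-99", 6) < 6
```

Because the producer computes the partition itself, it can ship the record straight to that partition's leader broker, which is why no routing tier is needed.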

14. What is Kafka consumer?

Apache Kafka consumers are used to consume messages from partitions. Multiple consumers can read data from multiple partitions in the Kafka cluster. Consumers retrieve messages by issuing fetch requests to the brokers that lead the partitions they want to consume.

15. What is log compaction in Kafka?

Apache Kafka log compaction ensures that, within a topic partition, Kafka retains at least the last known value for each message key. It is useful for restoring state after a system failure, or when an application restarts after a maintenance window.
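The effect of compaction can be demonstrated with a small Python sketch. This is a toy in-memory model, not Kafka's actual log cleaner; the keys and values are made up:

```python
# Toy illustration of log compaction: given a keyed log, keep only the
# most recent record for each key, preserving the order of survivors.

def compact(log):
    # index of the last occurrence of each key
    last_index = {key: i for i, (key, _) in enumerate(log)}
    # keep a record only if it is the last one for its key
    return [rec for i, rec in enumerate(log) if last_index[rec[0]] == i]

log = [("user1", "alice"), ("user2", "bob"),
       ("user1", "alice2"), ("user2", "bob2")]
print(compact(log))  # [('user1', 'alice2'), ('user2', 'bob2')]
```

After compaction the log still answers "what is the current value for each key?", which is exactly what a restarting application needs to rebuild its state.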

16. What are system tools available in Apache Kafka?

There are three categories of system tools.

  • Kafka Mirror Maker: It helps in mirroring one Kafka cluster to another.
  • Kafka Migration Tool: It ensures the migration of a broker from a specific version to another.
  • Kafka Consumer Offset Checker: It shows the Topic, Owner, and Partitions for a particular set of Topics and Consumer Group.

17. Apache Kafka is better than RabbitMQ, how?

Apache Kafka is a distributed, highly available, and durable system for data sharing and replication, whereas RabbitMQ does not provide these features to the same degree. Kafka can scale to around 100,000 messages per second, whereas RabbitMQ handles around 20,000 messages per second.

18. What are some major operations performed in Apache Kafka?

The following are some important operations performed in Apache Kafka.

  • Kafka topics can be modified.
  • Consumer position can be located.
  • Data can be migrated automatically.
  • Kafka topics can be added and deleted.
  • Brokers can be shut down gracefully.
  • The Kafka cluster can be expanded.
  • Data mirroring can be done between Kafka clusters.
  • Servers can be decommissioned.

19. What is the command to create a Kafka topic?

The following is the command to create a Kafka topic.

cloudduggu@ubuntu:~/kafka$ ./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1  --partitions 2 --topic topic_name 

20. What is the command to list already created Kafka topics?

The following is the command to list Kafka topics.

cloudduggu@ubuntu:~/kafka$ ./bin/kafka-topics.sh --list --zookeeper localhost:2181 

21. What is the tool name in Kafka to produce messages?

The “kafka-console-producer.sh” tool is used to start a Kafka producer. Once the producer is started, we can send messages.

22. What is the tool name in Kafka to consume messages?

The “kafka-console-consumer.sh” tool is used to start a Kafka consumer. Once the consumer is started, messages can be received.

23. How to start Kafka Producer?

The following tool is used to start Kafka Producer.

cloudduggu@ubuntu:~/kafka$ ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic topic_name 

24. How to start Kafka Consumer?

The following tool is used to start Kafka Consumer.

cloudduggu@ubuntu:~/kafka$ ./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topic_name --from-beginning 

25. What is the role of offset in Kafka?

An offset is a sequential ID number assigned to each message within a partition; it uniquely identifies every message in that partition.
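A minimal Python sketch of the idea, with an in-memory list standing in for a partition (the committed offset value is made up for the example):

```python
# An offset is simply a message's position in the partition's append-only log.
partition = []                        # the partition's log
for msg in ["a", "b", "c"]:
    partition.append(msg)             # the message at index i has offset i

# A consumer commits the offset of the next message it should read,
# so after a restart it resumes exactly where it left off.
committed = 2                         # offsets 0 and 1 were already processed
resumed = partition[committed:]
print(resumed)  # ['c']
```

Because offsets are per-partition and strictly increasing, committing a single number is enough to record a consumer's exact progress.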

26. What is the fault-tolerance in Apache Kafka?

Fault tolerance is an important feature of Apache Kafka. It keeps data safe in the event of a total system failure, a major upgrade, or a component malfunction. Kafka achieves fault tolerance by replicating every message across multiple brokers, so copies remain available in case of a malfunction.

27. What are the advantages of Apache Kafka?

The following are some key advantages of Apache Kafka.

  • Kafka is very fast.
  • Kafka is scalable.
  • It comprises brokers; a single broker can handle hundreds of megabytes of reads and writes per second.
  • Large datasets can be easily analyzed using Kafka.
  • Kafka is durable.
  • The Design of Apache Kafka is distributed and robust.

28. What is the functionality of Producer API in Kafka?

The Producer API allows an application to publish a stream of records to one or more Kafka topics.

29. What is the functionality of Consumer API in Kafka?

The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.

30. What is the main difference between Kafka and Flume?

The following are the differences between Apache Kafka and Apache Flume.

Sr. No. | Apache Kafka | Apache Flume
1 | Apache Kafka is a real-time, distributed data stream processing platform used for data ingestion and processing. | Apache Flume is used to move logs from different sources to a centralized data store such as Hadoop HDFS.
2 | Apache Kafka is easy to scale. | Apache Flume is not as scalable as Kafka.
3 | Apache Kafka works on the pull model. | Apache Flume works on the push model.
4 | Apache Kafka supports automatic recovery in case of node failures. | In case of an Apache Flume agent failure, the events in the channel can be lost.
5 | Apache Kafka is a general-purpose publish-subscribe messaging system. | Apache Flume is specially designed for Hadoop.