Apache Storm Interview Questions

Top 30 Apache Storm Questions and Answers

1. What is Apache Storm?

Apache Storm is an open-source, distributed, fault-tolerant, real-time computation system used to process streams of data. It was created at BackType (later acquired by Twitter) and was open-sourced in 2011. Apache Storm has a very strong community (around 12) and 70-plus contributions.

2. What are the benefits of Apache Storm?

The following are some core benefits of Apache Storm.

  • Apache Storm is very easy to operate.
  • It has been benchmarked at over one million 100-byte messages per second per node.
  • Apache Storm is fault-tolerant: if a worker fails, it is restarted automatically.
  • It guarantees that each unit of data is processed at least once.
  • Apache Storm can scale horizontally.

3. What is the difference between Hadoop and Apache Storm?

The following are some of the differences between Hadoop and Storm.

Hadoop | Apache Storm
Hadoop is a distributed batch-processing system that uses the MapReduce framework. | Storm is a distributed real-time data processing system whose topologies are DAGs.
The latency in Hadoop is very high. | The latency in Storm is low compared with Hadoop.
The framework of Hadoop is written in Java. | The framework of Storm is written in Clojure and Java.
Hadoop processing is stateful. | Storm provides stateless stream processing.
Hadoop is easy to set up, but a Hadoop cluster is difficult to manage. | Storm is easy to manage compared to Hadoop.
Hadoop is used by companies like Navisite, Twitter, and so on. | Storm is used by companies such as Search Engine Data and so on.

4. What are the major components of Apache Storm?

Apache Storm has the following major components for processing streaming data.

  • Bolt: It is a processing unit in the Storm cluster.
  • Spout: It is the source of data in the Storm cluster.
  • Tuple: It is a named list of values and the main data structure in a Storm cluster.
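
The relationship between these three components can be illustrated with a minimal sketch in plain Python. It does not use the Storm API; the names `WordTuple`, `sentence_spout`, and `uppercase_bolt` are hypothetical and exist only for this illustration.

```python
from collections import namedtuple

# A tuple is modeled as a named list of values (hypothetical field: "word").
WordTuple = namedtuple("WordTuple", ["word"])


def sentence_spout(sentences):
    """Toy spout: the source of data, emitting one tuple per word."""
    for sentence in sentences:
        for word in sentence.split():
            yield WordTuple(word=word)


def uppercase_bolt(stream):
    """Toy bolt: a processing unit that transforms each incoming tuple."""
    for tup in stream:
        yield WordTuple(word=tup.word.upper())


# Wire the spout into the bolt and collect the resulting stream.
result = [t.word for t in uppercase_bolt(sentence_spout(["storm is fast"]))]
```

Here the spout only produces data and the bolt only transforms it, which mirrors the division of responsibilities described above.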

5. What is the difference between Apache Storm and Apache Kafka?

The following are some of the differences between Apache Storm and Apache Kafka.

Apache Storm | Apache Kafka
Apache Storm is used to process messages in real time. | Apache Kafka is a distributed messaging system.
Apache Storm was developed at BackType and Twitter. | Apache Kafka was developed at LinkedIn.
Apache Storm can be used with any programming language. | Apache Kafka has clients for many languages, but Java is the most commonly recommended.
The latency in Apache Storm is very low. | The latency in Apache Kafka depends on the data source (usually 1-2 seconds).
Apache Storm is primarily used for stream processing. | Apache Kafka is primarily used as a message broker.
It can process micro-batches. | It can process small batches.
Apache Storm also relies on ZooKeeper for cluster coordination. | Apache Kafka has traditionally depended on ZooKeeper.

6. What are Streams in Apache Storm?

A stream is Storm’s core abstraction: an unbounded sequence of tuples. Storm provides "spouts" and "bolts" as the primitives for transforming one stream into a new stream.

7. What are the different types of Apache Storm stream grouping?

The following are the different types of Apache Storm stream grouping.

  • Local or shuffle grouping
  • Global grouping
  • Shuffle grouping
  • Fields grouping
  • None grouping
  • Direct grouping
  • All grouping
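
How a few of these groupings route tuples to the tasks of a bolt can be sketched in plain Python. This is an illustration of the routing idea only, not Storm's implementation: the function names are hypothetical, `zlib.crc32` stands in for whatever hash Storm uses, and round-robin stands in for the random but even distribution of shuffle grouping.

```python
import itertools
import zlib


def fields_grouping(field_value, num_tasks):
    """Tuples with the same field value always go to the same task."""
    # crc32 is a stable hash; Python's built-in hash() of str varies per process.
    return zlib.crc32(field_value.encode("utf-8")) % num_tasks


def global_grouping(num_tasks):
    """The entire stream goes to a single task (the one with the lowest id)."""
    return 0


def all_grouping(num_tasks):
    """The tuple is replicated to every task."""
    return list(range(num_tasks))


def make_shuffle_grouping(num_tasks):
    """Stand-in for shuffle grouping: spreads tuples evenly across tasks."""
    cycle = itertools.cycle(range(num_tasks))
    return lambda: next(cycle)
```

For example, with four tasks, `fields_grouping("storm", 4)` returns the same task index every time it is called, which is what makes per-key counting possible in a downstream bolt.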

8. What are the benefits of using Apache Storm in financial services?

The following are the benefits provided by Apache Storm in financial services.

  • It helps to detect securities fraud.
  • It maintains order routing.
  • It helps to maintain compliance.

9. What is the use of “TOPOLOGY_MESSAGE_TIMEOUT_SECS” in Apache Storm?

“TOPOLOGY_MESSAGE_TIMEOUT_SECS” (configuration key topology.message.timeout.secs) sets the maximum time allowed for a message to be fully processed. If the message is not acknowledged within that time, it is marked as failed on the spout.

10. What is Nimbus Node in Apache Storm?

The Nimbus node is the master node in an Apache Storm cluster. It keeps track of the workers and allocates resources to them as required. Its role is comparable to that of the JobTracker in Hadoop.

11. What are Worker or Supervisor Nodes in Apache Storm?

The Nimbus node assigns tasks to worker nodes, and the worker nodes perform the actual processing. Every worker node runs the Supervisor daemon, which is responsible for starting and stopping worker processes.

12. What is the use of Zookeeper in Apache Storm Cluster?

ZooKeeper is used to coordinate the nodes of a Storm cluster and to maintain the cluster state. The load on ZooKeeper is light because it is not used for message passing.

13. What is Topology in Apache Storm?

A topology defines a real-time computation in an Apache Storm cluster. It is a graph of computation, implemented as a DAG (directed acyclic graph). Every node in a topology is a spout or a bolt that contains processing logic, and the edges between nodes show how data flows between them.
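
To make the DAG idea concrete, the sketch below models a topology as an adjacency map and orders its components with Kahn's algorithm. It is plain Python rather than the Storm API, and the component names (`sentence-spout`, `split-bolt`, `count-bolt`) are hypothetical.

```python
from collections import deque

# Hypothetical word-count topology: each key streams tuples to the listed components.
topology = {
    "sentence-spout": ["split-bolt"],
    "split-bolt": ["count-bolt"],
    "count-bolt": [],
}


def topological_order(graph):
    """Return components in data-flow order; raise ValueError on a cycle."""
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1

    ready = deque(node for node, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for target in graph[node]:
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)

    if len(order) != len(graph):
        raise ValueError("topology contains a cycle")
    return order
```

Components with no incoming edges play the role of spouts, and the acyclic check reflects the requirement that data never flows back to an upstream component.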

14. What are Spouts in Apache Storm?

Spouts read tuples from an external source and emit them into the topology. A spout is the entry point of a Storm topology, that is, its source of data.

Spouts are divided into two types.

  • Reliable: This type of spout can replay a tuple if its processing fails, which gives “at least once” message processing.
  • Unreliable: This type of spout does not replay tuples after a failure, which gives “at most once” message processing.
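
The replay behavior of a reliable spout can be sketched as follows. This is a toy model in plain Python, not Storm's spout interface; the class name `ReliableSpout` and its internals are assumptions made for illustration, although the method names echo the next-tuple, ack, and fail callbacks that spouts respond to.

```python
from collections import deque


class ReliableSpout:
    """Toy reliable spout: keeps each emitted message until it is acked."""

    def __init__(self, messages):
        self.queue = deque(enumerate(messages))  # (message id, payload)
        self.in_flight = {}  # message id -> payload awaiting ack

    def next_tuple(self):
        """Emit the next message, or None when nothing is queued."""
        if not self.queue:
            return None
        msg_id, payload = self.queue.popleft()
        self.in_flight[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id):
        """Processing succeeded: forget the message."""
        self.in_flight.pop(msg_id, None)

    def fail(self, msg_id):
        """Processing failed: queue the message again for replay."""
        payload = self.in_flight.pop(msg_id)
        self.queue.append((msg_id, payload))
```

Because a failed message is emitted again, it may be processed more than once, which is why this style is described as at least once. An unreliable spout would simply drop the `in_flight` bookkeeping and never replay.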

15. What are Bolts in Apache Storm?

Bolts perform all processing in a topology, using operations such as joins, filters, and aggregations. A single bolt can perform a simple transformation, whereas a complex transformation, such as processing a stream of tweets, typically requires at least two bolts.

16. What are the three main components that are used to run a topology?

The following are the major components that are used to run a topology.

  • Worker Process: It executes a subset of a topology. It belongs to a specific topology and may run one or more executors for one or more components of that topology.
  • Executor: It is a thread spawned by a worker process. It may run one or more tasks for the same component.
  • Task: It performs the actual data processing.
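
The nesting of tasks inside executors can be sketched with a small helper. The round-robin policy and the function name `assign_tasks` are assumptions for illustration only; Storm's scheduler decides the real placement.

```python
def assign_tasks(num_tasks, num_executors):
    """Spread task ids over executor threads round-robin (illustrative policy)."""
    if num_executors < 1:
        raise ValueError("at least one executor is required")
    if num_tasks < num_executors:
        raise ValueError("each executor needs at least one task")

    executors = [[] for _ in range(num_executors)]
    for task_id in range(num_tasks):
        executors[task_id % num_executors].append(task_id)
    return executors


# Four tasks of one component shared by two executor threads.
layout = assign_tasks(num_tasks=4, num_executors=2)
```

Each inner list represents the tasks served by one executor thread; a worker process would in turn host one or more such executors.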

17. What is the use of a combiner aggregator?

A combiner aggregator is used to combine a set of tuples into a single field.
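
The idea can be sketched in plain Python with a counting combiner. The three methods follow the init/combine/zero shape of Trident's CombinerAggregator interface, but the class `CountCombiner` and the helper `aggregate` are hypothetical stand-ins, not Storm code.

```python
from functools import reduce


class CountCombiner:
    """Counts tuples by turning each into 1 and summing the partial results."""

    def init(self, tup):
        # Value contributed by a single tuple.
        return 1

    def combine(self, left, right):
        # Merge two partial results.
        return left + right

    def zero(self):
        # Result when there are no tuples at all.
        return 0


def aggregate(combiner, tuples):
    """Fold a group of tuples into one value using the combiner."""
    partials = (combiner.init(tup) for tup in tuples)
    return reduce(combiner.combine, partials, combiner.zero())
```

Because `combine` only merges partial results, partial aggregates can be computed before tuples are sent over the network and then merged afterward.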

18. What is the command to kill an Apache Storm topology?

The following is the command to kill an Apache Storm topology.

storm kill {cloudduggu_topology}

Here "cloudduggu_topology" is the name of the topology.

19. What is the use of ZeroMQ in Apache Storm?

ZeroMQ was used in early versions of Apache Storm as the messaging layer for communication between tasks in the cluster. Later releases use Netty as the default transport.

20. Can we use Apache Storm as a Proxy server?

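Not Apache Storm itself, which is a stream processing system rather than a web server. The mod_proxy module belongs to Apache HTTP Server, a separate project, and that server can be used as a proxy.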

21. What is the command to check httpd.conf consistency?

This question concerns Apache HTTP Server rather than Apache Storm. The following command parses httpd.conf and shows the resulting virtual host settings.

httpd -S

22. Which companies use Apache Storm?

The following is a list of a few companies that use Apache Storm.

  • Yahoo!
  • Twitter
  • Spotify
  • Yelp
  • Flipboard

23. What are the different modes of Apache Storm?

Apache Storm provides the following operational modes.

  • Local Mode: This mode is used for testing and debugging of topology.
  • Production Mode: This mode is used for Production Operation.

24. Can we update a running topology in Apache Storm?

We can update a running topology by first killing it and then resubmitting a new one.

25. How to monitor a running Apache Storm topology?

We can use the Apache Storm user interface (UI) to monitor a running topology. It provides information such as task errors, latency, and performance.

26. What are the built-in schedulers provided by Apache Storm?

Apache Storm provides the following list of built-in schedulers.

  • Multitenant Scheduler
  • Isolation Scheduler
  • ResourceAware Scheduler
  • Default Scheduler

27. What happens if a worker dies?

When a worker dies, the supervisor restarts it. If it keeps failing during startup and cannot heartbeat to Nimbus, the Nimbus node reschedules the worker.

28. What happens if a node in the Apache Storm cluster dies?

When a node fails, the tasks assigned to that node time out, and the Nimbus node reassigns those tasks to other nodes.

29. Is there a search engine present in Apache Storm?

Apache Storm has a search engine that can be queried by “Search title” to return relevant information.

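30. When and why is the cleanup method called in Apache Storm?

The cleanup method is called when a bolt is being shut down, so that the bolt can release any resources it has opened. In a cluster the call is not guaranteed, because worker processes may be killed abruptly; it is guaranteed only when a topology is killed in local mode.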
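
The lifecycle can be sketched with a toy bolt in plain Python. The method names prepare, execute, and cleanup mirror the bolt lifecycle, but the class `FileWriterBolt` and the driver `run_locally` are hypothetical and do not use the Storm API.

```python
class FileWriterBolt:
    """Toy bolt that opens a file on prepare and closes it on cleanup."""

    def prepare(self, path):
        # Acquire the resource once, before any tuple is processed.
        self.handle = open(path, "w", encoding="utf-8")

    def execute(self, tup):
        # Process one tuple by writing it on its own line.
        self.handle.write(str(tup) + "\n")

    def cleanup(self):
        # Release the resource when the bolt is shut down.
        if not self.handle.closed:
            self.handle.close()


def run_locally(bolt, tuples, path):
    """Drive the bolt through its lifecycle, always calling cleanup at the end."""
    bolt.prepare(path)
    try:
        for tup in tuples:
            bolt.execute(tup)
    finally:
        bolt.cleanup()
```

The `try`/`finally` in the driver stands in for an orderly shutdown. Since an abrupt kill skips that step, a bolt should not rely on cleanup for anything that must always happen.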