Apache Flink Applications

Apache Flink is a hybrid stream and batch processing framework for big data. It processes data in real time, supports native iterative processing, and uses cyclic dataflows to process both bounded and unbounded streams. Because of this hybrid nature, it is sometimes described as 4G, the fourth generation of big data processing frameworks, able to run the kinds of applications written for MapReduce, Google Cloud Dataflow, Apache Storm, and so on. Apache Flink also includes a cost-based optimizer that creates an optimized execution plan for stream and batch applications.

Apache Flink treats streams, state, and time as the core building blocks of an application, because together they determine how well the application can be processed in the Flink architecture.

Let us look at each of these building blocks in detail.

1. Streams

A stream is a continuous flow of data, and it is the fundamental concept of any stream processing system. Apache Flink is capable of processing any kind of stream.

Streams can differ in the following characteristics, all of which Apache Flink can handle.

1.1 Bounded and Unbounded Streams

Apache Flink can process both bounded and unbounded streams. A bounded stream has a defined start and a defined end, so Flink can read all of it before computing and process it as a batch job. An unbounded stream has a start but no defined end, so it must be processed continuously as the events arrive.
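The difference can be sketched in a few lines of plain Python. This is not the Flink API; the function names are made up for illustration. The bounded source can be consumed completely before a result is produced, while the unbounded source only ever yields a running result.

```python
import itertools
from typing import Iterable, Iterator


def bounded_source() -> Iterator[int]:
    """A bounded stream: it has a defined start and a defined end."""
    return iter([3, 1, 4, 1, 5])


def unbounded_source() -> Iterator[int]:
    """An unbounded stream: it starts at 0 and never ends on its own."""
    return itertools.count(0)


def batch_sum(stream: Iterable[int]) -> int:
    """Batch style: read the whole bounded input, then emit one result."""
    return sum(stream)


def running_sum(stream: Iterable[int]) -> Iterator[int]:
    """Streaming style: emit an updated result after every event."""
    total = 0
    for value in stream:
        total += value
        yield total


# The bounded input can be fully consumed before answering.
print(batch_sum(bounded_source()))

# The unbounded input is never complete, so we only look at the first
# five continuously updated results.
print(list(itertools.islice(running_sum(unbounded_source()), 5)))
```

Calling `batch_sum` on the unbounded source would never return, which is why unbounded streams need continuous processing.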

1.2 Real-Time and Recorded Streams

Streams can also be distinguished by when they are processed. A stream can be processed in real time, as it is being generated, or it can first be persisted to storage such as a file system and processed later.

2. State

State is another important building block of Apache Flink applications. It is the information an application remembers across events, such as intermediate results, so that this information can be accessed whenever a later event is processed.

The following are the state-related features of Apache Flink.

2.1 Multiple State Primitives

Apache Flink provides state primitives for different data structures, such as atomic values, lists, and maps. The developer can choose the primitive that best fits how the application accesses its state.
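As a rough illustration of the three shapes of state, the following plain-Python sketch keeps an atomic value, a list, and a map per key. It does not use Flink; `KeyedState` and `process` are hypothetical names, and the page-view events are invented sample data.

```python
from collections import defaultdict
from typing import Dict, List


class KeyedState:
    """Hypothetical per-key state holder with three primitive shapes."""

    def __init__(self) -> None:
        # Atomic value per key: number of events seen for a user.
        self.value: Dict[str, int] = {}
        # List per key: every viewing duration seen for a user.
        self.items: Dict[str, List[float]] = defaultdict(list)
        # Map per key: visits per page for a user.
        self.mapping: Dict[str, Dict[str, int]] = defaultdict(dict)


def process(state: KeyedState, user: str, page: str, seconds: float) -> None:
    """Update all three kinds of state for one page-view event."""
    state.value[user] = state.value.get(user, 0) + 1
    state.items[user].append(seconds)
    visits = state.mapping[user]
    visits[page] = visits.get(page, 0) + 1


state = KeyedState()
sample_events = [
    ("ann", "/home", 1.5),
    ("bob", "/home", 2.0),
    ("ann", "/cart", 0.5),
    ("ann", "/home", 3.0),
]
for event in sample_events:
    process(state, *event)

print(state.value["ann"], state.items["ann"], state.mapping["ann"])
```

A counter only needs the atomic value, a history needs the list, and a lookup by sub-key needs the map, which is the kind of choice the state primitives leave to the developer.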

2.2 Pluggable State Backends

Application state is managed and checkpointed by a pluggable state backend. Flink ships with state backends that keep state in memory or in RocksDB, an embedded on-disk store.

2.3 State Consistency

Apache Flink's checkpointing and recovery algorithms guarantee that the application state remains consistent when a failure occurs, so after recovery each event is reflected in the state exactly once.
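The idea behind checkpoint-based recovery can be sketched as follows. This is a simplified single-process model in plain Python, not Flink's distributed algorithm; `CountingJob` and its methods are hypothetical. A checkpoint stores the input offset together with a copy of the state, so that after a failure the job rolls back both and replays the input from that offset.

```python
from typing import Dict, List, Optional, Tuple


class CountingJob:
    """Hypothetical word-count job with periodic checkpoints."""

    def __init__(self, events: List[str]) -> None:
        self.events = events
        self.offset = 0                      # position of the next event to read
        self.counts: Dict[str, int] = {}     # the application state
        # The last durable snapshot: (input offset, copy of the state).
        self.checkpoint: Tuple[int, Dict[str, int]] = (0, {})

    def take_checkpoint(self) -> None:
        self.checkpoint = (self.offset, dict(self.counts))

    def restore(self) -> None:
        """Roll back offset and state together to the last checkpoint."""
        offset, counts = self.checkpoint
        self.offset, self.counts = offset, dict(counts)

    def run(self, checkpoint_every: int,
            fail_at: Optional[int] = None) -> Dict[str, int]:
        while self.offset < len(self.events):
            if fail_at is not None and self.offset == fail_at:
                raise RuntimeError("simulated failure")
            word = self.events[self.offset]
            self.counts[word] = self.counts.get(word, 0) + 1
            self.offset += 1
            if self.offset % checkpoint_every == 0:
                self.take_checkpoint()
        return self.counts


events = ["a", "b", "a", "c", "a", "b", "b"]

job = CountingJob(events)
try:
    job.run(checkpoint_every=3, fail_at=5)   # crashes before reading event 5
except RuntimeError:
    job.restore()                            # back to the checkpoint at offset 3
recovered = job.run(checkpoint_every=3)

expected = CountingJob(events).run(checkpoint_every=3)  # run without failure
print(recovered == expected)
```

Events 3 and 4 are read twice, once before the failure and once after the restore, but because the state is rolled back along with the offset, they are counted only once in the final result.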

2.4 Very Large State

Apache Flink uses asynchronous and incremental checkpointing to maintain application state that can grow to several terabytes in size.
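To see why incremental checkpoints help with large state, consider the following simplified sketch in plain Python. It only shows the general idea of persisting what changed since the previous checkpoint; it is not how Flink's state backends implement it, it ignores deletions and asynchrony, and `IncrementalStore` is a hypothetical name.

```python
from typing import Dict, List, Set


class IncrementalStore:
    """Hypothetical key-value state that checkpoints only changed keys."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.dirty: Set[str] = set()             # keys changed since last checkpoint
        self.deltas: List[Dict[str, int]] = []   # one persisted delta per checkpoint

    def put(self, key: str, value: int) -> None:
        self.state[key] = value
        self.dirty.add(key)

    def checkpoint(self) -> int:
        """Persist only the changed keys and return how many were written."""
        delta = {key: self.state[key] for key in self.dirty}
        self.deltas.append(delta)
        self.dirty.clear()
        return len(delta)

    def rebuild(self) -> Dict[str, int]:
        """Recover the full state by replaying the deltas in order."""
        recovered: Dict[str, int] = {}
        for delta in self.deltas:
            recovered.update(delta)
        return recovered


store = IncrementalStore()
for i in range(1000):
    store.put(f"k{i}", i)
first_size = store.checkpoint()    # the first checkpoint writes every key

store.put("k7", -1)
second_size = store.checkpoint()   # the second writes only the changed key

print(first_size, second_size, store.rebuild() == store.state)
```

The second checkpoint is much smaller than the first even though the total state is unchanged in size, which is what keeps checkpointing affordable as state grows.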

2.5 Application Scalability

Apache Flink distributes the state of an application across multiple worker nodes. Applications can therefore scale out or in by redistributing the state to more or fewer workers.
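One way to picture this is to partition keyed state by hashing each key into a fixed number of groups and assigning contiguous ranges of groups to workers. The sketch below follows that idea in plain Python. The hash function (`zlib.crc32`), the constant `MAX_PARALLELISM`, and the function names are assumptions chosen for illustration and differ from Flink's internals.

```python
import zlib
from collections import defaultdict
from typing import Dict

MAX_PARALLELISM = 128  # assumed fixed number of key groups


def key_group(key: str) -> int:
    """Map a key to a stable group, independent of the worker count."""
    return zlib.crc32(key.encode("utf-8")) % MAX_PARALLELISM


def worker_for(key: str, parallelism: int) -> int:
    """Assign contiguous ranges of key groups to workers 0..parallelism-1."""
    return key_group(key) * parallelism // MAX_PARALLELISM


def distribute(state: Dict[str, int], parallelism: int) -> Dict[int, Dict[str, int]]:
    """Split one logical state into per-worker partitions."""
    workers: Dict[int, Dict[str, int]] = defaultdict(dict)
    for key, value in state.items():
        workers[worker_for(key, parallelism)][key] = value
    return dict(workers)


state = {f"user{i}": i for i in range(20)}

two_workers = distribute(state, 2)
four_workers = distribute(state, 4)   # rescaling: same state, more workers

print({worker: len(part) for worker, part in sorted(two_workers.items())})
print({worker: len(part) for worker, part in sorted(four_workers.items())})
```

Because the key-to-group mapping does not depend on the number of workers, rescaling only moves whole groups between workers and every key still ends up on exactly one of them.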

3. Time

Time also plays an important role in stream processing. Every event is generated at a specific point in time, and many common stream computations are based on time, such as pattern detection, window aggregations, and time-based joins. How an application measures time therefore affects the results it produces.

The following are some of the features of Apache Flink time.

3.1 Event-time Mode

Applications that process streams in event-time mode compute their results based on the timestamps carried by the events themselves, not on the time at which the events happen to be processed.
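A minimal plain-Python sketch of an event-time computation is shown below. It counts events per tumbling window, where the window is chosen from each event's own timestamp. The function name and the sample events are made up for illustration, and this is not the Flink windowing API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # (event timestamp, payload)


def tumbling_window_counts(events: List[Event], window_size: int) -> Dict[int, int]:
    """Count events per window, keyed by the window's start timestamp."""
    counts: Dict[int, int] = defaultdict(int)
    for timestamp, _payload in events:
        # The window is derived from the event's timestamp, not from the
        # moment this loop happens to see the event.
        window_start = timestamp - (timestamp % window_size)
        counts[window_start] += 1
    return dict(counts)


# The arrival order differs from the timestamp order.
arrived = [(12, "a"), (3, "b"), (7, "c"), (10, "d"), (1, "e")]
in_time_order = sorted(arrived)

print(tumbling_window_counts(arrived, window_size=10))
print(tumbling_window_counts(in_time_order, window_size=10))
```

Both calls assign the same events to the same windows, which illustrates why event-time results do not depend on the order or speed at which events arrive.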

3.2 Late Data Handling

In event-time mode, some events arrive only after the computation for their time range has already been completed. Such events are called late events. Apache Flink offers several ways to handle them, for example by rerouting them through side outputs.
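The following plain-Python sketch shows one simplified way to separate late events from on-time events. It tracks a watermark, here assumed to be the highest timestamp seen so far minus a fixed tolerance, closes a window once the watermark passes the window's end, and collects events for already closed windows in a separate list that plays the role of a side output. The function name, the parameters, and the exact firing rule are assumptions for illustration and do not reproduce Flink's implementation.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # (event timestamp, payload)


def window_with_side_output(
    events: List[Event], window_size: int, max_out_of_orderness: int
) -> Tuple[Dict[int, int], List[Event]]:
    """Return (counts per window start, late events)."""
    open_windows: Dict[int, int] = {}   # windows still accepting events
    results: Dict[int, int] = {}        # windows whose result was emitted
    late: List[Event] = []              # the "side output"
    watermark = float("-inf")

    for timestamp, payload in events:
        window_start = timestamp - (timestamp % window_size)
        if window_start + window_size <= watermark:
            # The window for this event is already closed: route it aside.
            late.append((timestamp, payload))
        else:
            open_windows[window_start] = open_windows.get(window_start, 0) + 1

        # Advance the watermark; it never moves backwards.
        watermark = max(watermark, timestamp - max_out_of_orderness)

        # Emit every window whose end the watermark has passed.
        for start in sorted(open_windows):
            if start + window_size <= watermark:
                results[start] = open_windows.pop(start)

    results.update(open_windows)  # end of input: emit the remaining windows
    return results, late


sample = [(1, "a"), (4, "b"), (12, "c"), (3, "d"), (25, "e"), (8, "f"), (27, "g")]
counts, late_events = window_with_side_output(
    sample, window_size=10, max_out_of_orderness=5
)
print(counts)
print(late_events)
```

In this sample, the event with timestamp 3 is out of order but still within the tolerance, so it is counted normally. The event with timestamp 8 arrives after its window has been closed, so it ends up in the late list instead of silently changing or missing a result.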

3.3 Processing-time Mode

In processing-time mode, Apache Flink triggers computations based on the wall-clock time of the processing machine. This mode suits applications with strict low-latency requirements that can tolerate approximate results.
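For contrast with event time, the sketch below assigns each event to a window based on the clock reading at the moment it is processed. It is plain Python rather than the Flink API; the function name and the injectable `clock` parameter are assumptions, and the fixed clock readings are only there to make the example repeatable.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Iterable


def processing_time_counts(
    events: Iterable[str],
    window_size: float,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[int, int]:
    """Count events per window, where the window comes from the local clock."""
    counts: Dict[int, int] = defaultdict(int)
    for _event in events:
        now = clock()  # time at which the event is processed
        counts[int(now // window_size)] += 1
    return dict(counts)


events = ["a", "b", "c", "d"]

# Two runs over the same events with different (simulated) clock readings.
fast_run = processing_time_counts(events, 1.0, iter([0.5, 1.2, 1.9, 2.1]).__next__)
slow_run = processing_time_counts(events, 1.0, iter([0.5, 3.0, 3.5, 9.0]).__next__)

print(fast_run)
print(slow_run)
```

No waiting for late events is needed, which keeps latency low, but the same input can produce different window contents depending on how fast it is processed.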
