Apache Flink Machine Learning

Apache Flink provides the FlinkML library to support machine learning. The goal of FlinkML is to offer a scalable, distributed system that can handle data of any size, whether megabytes, terabytes, or more. A major challenge developers face is the glue code needed to connect the different parts of a machine learning pipeline, and FlinkML aims to keep that glue code to a minimum.

The following are the categories of algorithms and tools supported by FlinkML.

Supported Algorithms
Supervised Learning
Unsupervised Learning
Data Preprocessing
Recommendation
Outlier Selection
Utilities

Let us look at an example of the K-Means clustering algorithm, which takes a set of data points and a set of K initial cluster centers (centroids) and groups each point with its nearest center. Apache Flink ships a JAR file named "KMeans.jar" in the "flink/examples/batch" directory that can be used to run K-Means clustering.
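To make the idea concrete, here is a minimal sketch of K-Means in plain Python. It is not Flink code and does not reproduce what "KMeans.jar" does internally; the function names `kmeans` and `nearest_centroid`, the sample points, and the fixed iteration count are all assumptions made for illustration. It only shows the two steps that are repeated: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points.

```python
import math


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Illustrative K-Means: returns (final_centroids, cluster index per point)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Assignment step: group every point with its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its points.
        # A centroid with no assigned points keeps its old position.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    assignments = [nearest_centroid(p, centroids) for p in points]
    return centroids, assignments


if __name__ == "__main__":
    # Hypothetical sample data: two visibly separate groups of 2-D points.
    sample_points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
                     (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
    initial_centroids = [(0.0, 0.0), (10.0, 10.0)]
    final_centroids, labels = kmeans(sample_points, initial_centroids, iterations=5)
    for point, label in zip(sample_points, labels):
        print(point, "-> cluster", label)
    print("centroids:", final_centroids)
```

A distributed engine such as Flink runs the same two steps, but spreads the assignment and averaging work across parallel tasks instead of looping over a list in one process.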

The example program is run here with its default point and centroid data set.

To run the program, use the following command.

cloudduggu@ubuntu:~/flink$ ./bin/flink run examples/batch/KMeans.jar --output Print

We can also check the status of the job in the Apache Flink web UI.
