Apache Spark Tutorial

Preface

Apache Spark is a fast-growing, general-purpose cluster computing system. It provides a rich set of APIs in Java, Scala, Python, and R, and an engine that supports general execution graphs. It also supports higher-level tools: Spark SQL for structured data processing, MLlib for machine learning, GraphX for graph processing, Spark Streaming for stream processing of live data, and SparkR for working with Spark from R.

This tutorial introduces Apache Spark, the Spark ecosystem, RDD features, Spark installation on single-node and multi-node clusters, lazy evaluation, and Spark's high-level tools: Spark SQL, MLlib, GraphX, Spark Streaming, and SparkR.


Prerequisites

A basic understanding of core Java, Scala, Python, SQL, R, Linux operating system commands, and database concepts is required.


Audience

This tutorial is intended for professionals who are keen to learn Apache Spark and want to grow in the field of Big Data. It covers all aspects of Apache Spark.


So let's begin. Happy learning!