Apache Spark Installation on Single Node

This tutorial describes, step by step, how to install Spark 2.4.5 (package spark-2.4.5-bin-hadoop2.7) on Ubuntu 18.04.4 LTS (Bionic Beaver). Once the installation is complete, you can start experimenting with Spark.

Platform

  • Operating System (OS). You can use Ubuntu 18.04.4 LTS or a later version. You can also use other Linux distributions such as Red Hat or CentOS.
  • Spark. This tutorial uses Spark 2.4.5 (package spark-2.4.5-bin-hadoop2.7).

Download Software

  • VMware Player for Windows
    https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_player/7_0

  • Ubuntu
    http://releases.ubuntu.com/18.04.4/ubuntu-18.04.4-desktop-amd64

  • Spark
    https://downloads.apache.org/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz

  • Eclipse for Windows
    https://www.eclipse.org/downloads/

For VMware and Ubuntu installation, please refer to “Hadoop Installation on Single Node” in the Hadoop section.

Steps to Install Spark 2.4.5 on Ubuntu 18.04.4 LTS


    Step 1. Download Spark 2.4.5 from the link below.

    On Windows: https://downloads.apache.org/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz

    On Linux: $ wget https://downloads.apache.org/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz


    Step 2. Install Java 8 using the command below.

    $ sudo apt-get install openjdk-8-jdk

    Press Y when prompted to continue the installation.

    Once the Java installation is complete, verify it by running the command below.

    $ java -version

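The version check above can also be scripted. The following is a minimal sketch, not part of the original tutorial: the helper name java_major_version is hypothetical, and it assumes the usual output format in which Java 8 reports a version such as "1.8.0_252" while Java 9 and later report "11.0.7". Spark 2.4.x is documented to run on Java 8, so a result of 8 is what you would hope to see.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the major Java version from a line of
# `java -version` output, e.g.  openjdk version "1.8.0_252"  ->  8
java_major_version() {
  local line="$1" ver
  # Pull out the quoted version string (1.8.0_252, 11.0.7, ...).
  ver=$(printf '%s\n' "$line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  [ -n "$ver" ] || return 1
  case "$ver" in
    1.*) ver=${ver#1.}; printf '%s\n' "${ver%%.*}" ;;   # legacy 1.x scheme
    *)   printf '%s\n' "${ver%%[.+-]*}" ;;              # 9 and later
  esac
}

# java -version writes to stderr, so redirect it before parsing.
if command -v java >/dev/null 2>&1; then
  major=$(java_major_version "$(java -version 2>&1 | grep -m 1 'version "')")
  echo "Detected Java major version: ${major:-unknown}"
else
  echo "java not found on PATH; install openjdk-8-jdk first"
fi
```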


    Step 3. Now install Apache Spark. If you have not already downloaded the archive in Step 1, download it from the command line using the command below.

    $ wget https://downloads.apache.org/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz

    In our case, the archive is saved at the location below.

    /home/cloudduggu/spark-2.4.5-bin-hadoop2.7.tgz

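Before extracting the archive, it can be worth confirming that the download is intact. Apache projects normally publish a SHA-512 digest next to each release file. The sketch below is an optional addition, not a step from the original tutorial: verify_sha512 is a hypothetical helper, it assumes the sha512sum tool from GNU coreutils (present on Ubuntu), and the expected digest must be copied by you from the published .sha512 file. If the downloads.apache.org link no longer serves this release, older Spark releases are usually kept on the Apache archive host (archive.apache.org).

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's SHA-512 digest with an expected value.
# Whitespace and letter case in the expected value are ignored, because
# published digest files are often upper-case and split into groups.
verify_sha512() {
  local file="$1" expected actual
  expected=$(printf '%s' "$2" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
  actual=$(sha512sum "$file" | awk '{print $1}')
  [ -n "$expected" ] && [ "$actual" = "$expected" ]
}

# Example (paste the digest from the release's .sha512 file yourself):
# verify_sha512 spark-2.4.5-bin-hadoop2.7.tgz "<published digest>" \
#   && echo "checksum OK" || echo "checksum MISMATCH"
```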


    Step 4. Extract the tar file using the command below, then rename the extracted folder to spark so that the path is shorter and easier to remember.

    $ tar xzf spark-2.4.5-bin-hadoop2.7.tgz

    $ mv spark-2.4.5-bin-hadoop2.7 spark

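The extract-and-rename step can be wrapped in a small function that refuses to overwrite an existing spark folder. This is a sketch under assumptions: install_spark_archive is a hypothetical name, and it assumes the archive contains a single top-level folder, as the Spark binary packages do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract a Spark archive into a directory and rename
# the extracted folder to "spark".
# Arguments: $1 = path to the .tgz archive, $2 = directory to install into.
install_spark_archive() {
  local archive="$1" dest="$2" top
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  if [ -e "$dest/spark" ]; then
    echo "$dest/spark already exists, not overwriting" >&2
    return 1
  fi
  # Name of the top-level folder inside the archive,
  # e.g. spark-2.4.5-bin-hadoop2.7
  top=$(tar tzf "$archive" | head -n 1 | cut -d/ -f1)
  tar xzf "$archive" -C "$dest" && mv "$dest/$top" "$dest/spark"
}

# Example, using the paths from this tutorial:
# install_spark_archive /home/cloudduggu/spark-2.4.5-bin-hadoop2.7.tgz /home/cloudduggu
```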


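    Step 5. Open the .bashrc file in the nano editor (nano ~/.bashrc) and export JAVA_HOME and SPARK_HOME. The locations below are the ones on our system; verify the paths on yours. On a 64-bit installation the Java directory is typically java-8-openjdk-amd64 rather than java-8-openjdk-i386.

    export SPARK_HOME=/home/cloudduggu/spark/

    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-i386/

    Save the changes by pressing CTRL + O, then exit the nano editor by pressing CTRL + X. Run source ~/.bashrc (or open a new terminal) so that the new variables take effect.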
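If you prefer not to edit the file by hand, the same export lines can be appended from the command line. The sketch below is an alternative to the nano steps, not the tutorial's own method: add_export is a hypothetical helper that appends a line only when that exact line is not already in the file, so running it twice does not create duplicates. The PATH line in the example is an optional extra that lets you type spark-shell without the full path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "export NAME=VALUE" to a profile file unless
# that exact line is already present.
# Arguments: $1 = profile file, $2 = variable name, $3 = value.
add_export() {
  local profile="$1" line="export $2=$3"
  touch "$profile"
  # -x matches whole lines, -F treats the line as a fixed string.
  grep -qxF -- "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}

# Example with the paths used in this tutorial (adjust to your system):
# add_export ~/.bashrc SPARK_HOME /home/cloudduggu/spark
# add_export ~/.bashrc JAVA_HOME /usr/lib/jvm/java-8-openjdk-i386
# add_export ~/.bashrc PATH '$PATH:$SPARK_HOME/bin'   # optional
# source ~/.bashrc
```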


    Step 6. The Spark installation is now complete. Start the Spark shell using the command below. From the Spark home directory, you can run bin/spark-shell instead of the full path.

    $ /home/cloudduggu/spark/bin/spark-shell

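To confirm that the shell actually runs jobs, you can pipe a one-line Scala expression into it and look for the expected result. This is a minimal sketch under assumptions: spark_smoke_test is a hypothetical helper, it relies on SPARK_HOME being set as in Step 5, and it uses sc, the SparkContext that spark-shell creates automatically, to sum the numbers 1 to 100.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a tiny job through spark-shell and succeed only
# if the expected sum (1 + 2 + ... + 100 = 5050) appears in the output.
spark_smoke_test() {
  local shell_bin="${SPARK_HOME:-}/bin/spark-shell"
  if [ ! -x "$shell_bin" ]; then
    echo "spark-shell not found at $shell_bin" >&2
    return 1
  fi
  # spark-shell reads the expression from stdin and exits at end of input.
  echo 'println("sum=" + sc.parallelize(1 to 100).sum().toInt)' \
    | "$shell_bin" 2>/dev/null \
    | grep -q 'sum=5050'
}

# Example:
# spark_smoke_test && echo "Spark is working" || echo "Spark check failed"
```

Starting the shell can take several seconds, so the check is not instant.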