The objective of this tutorial is to describe step by step process to install Hadoop 3 on Ubuntu 18.04.4 LTS (Bionic Beaver), once the installation is completed you can run commands for HDFS and map-reduce.

Platform

  • Operating System (OS). You can use Ubuntu 18.04.4 LTS version or later version, also you can use other flavors of Linux systems like Redhat, CentOS, etc.
  • Hadoop. We have used Apache Hadoop 3.1.2 version you can use Cloudera distribution or other distribution as well.

Download Software

  • VMWare Player for Windows
  • https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_player/7_0

  • Ubuntu
  • http://releases.ubuntu.com/18.04.4/ubuntu-18.04.4-desktop-amd64

  • Eclipse for windows
  • https://www.eclipse.org/downloads/

  • Putty for windows
  • http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html

  • Winscp for windows
  • http://winscp.net/eng/download.php

  • Hadoop
  • https://archive.apache.org/dist/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz


Steps to Install Ubuntu 18.04.4 LTS on VMware 7 64 Bit Platform

Please follow the below steps to install Ubuntu 18.04.4 LTS on VMware 7 64 Bit platform.

    Step 1. Start the downloaded vmware_player/7_0 and click on the file and select New Virtual Machine.




    Step 2. Select the downloaded Ubuntu 18.04.4 LTS image.




    Step 3. Enter the username/password and click on next.




    Step 4. Enter the name of Virtual Machine and click on next.




    Step 5. Leave the default configuration and click on next to proceed with the installation. Now the installation of Ubuntu 18.04.4 LTS will take around 30 minutes post which system would be ready for Hadoop installation.



Please follow the below steps once the above software/ Ubuntu 18.04.4 LTS configuration on VM is completed.


Steps to Install Hadoop 3 on Ubuntu 18.04.4


    Step 1. Please download Hadoop 3.1.2 from the below link.

    On Windows: https://archive.apache.org/dist/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz

    On Linux: $wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz


    Step 2. Install Java 8 using the below command.

    cloudduggu@ubuntu:-$sudo apt-get install openjdk-8-jdk





    Press Y to continue the installation.



    Once the java installation is completed please verify it by running the below command.

    cloudduggu@ubuntu:-$java –version



    SSH should be installed and in running state to use Hadoop Script also PDSH also should be installed to provide better SSH resource management.


    Step 3. Install SSH on your system using the below step.

    cloudduggu@ubuntu:-$sudo apt-get install ssh



    Please enter the password for the sudo user and press enter.



    Press Y to continue the installation.



    Once the installation is completed, the above notification will come.


    Step 4. Now install PDSH using the below command.

    cloudduggu@ubuntu:-$sudo apt-get install pdsh





    Press Y to continue installation.




    Step 5. Now open the .bashrc file using any editor, we will use nano to edit the .bashrc file and enter export PDSH_RCMD_TYPE=ssh.






    Press CTRL + O to save the file. Once the file is saved press CTRL+X to exit from the editor.


    Step 6. Now we will configure SSH by running the below command.

    cloudduggu@ubuntu:-$ssh-keygen -t rsa -P ""



    Press Enter when it asks for a filename.






    Step 7. Now copy the public key to the authorized key using the below command.

    cloudduggu@ubuntu:-$cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys




    Step 8. Please verify SSH setup by running the below command.

    cloudduggu@ubuntu:-$ssh localhost




    Step 9. Now update the source file using the below command.

    cloudduggu@ubuntu:-$sudo apt-get update






    Step 10. Now we are ready to install Hadoop. In our case, it is present at the below location.

    Please check your download folder to locate the Hadoop tar file.

    /home/cloudduggu/hadoop-3.1.2.tar.gz




    Step 11. Let us extract it by using the below command and rename the folder to Hadoop to make it meaningful.

    cloudduggu@ubuntu:-$tar xzf hadoop-3.1.2.tar.gz

    cloudduggu@ubuntu:-$mv hadoop-3.1.2 hadoop






    Step 12. Now we will set up a java home in the Hadoop-env. sh file.


    Hadoop-env.sh file location:/home/cloudduggu/hadoop/etc/hadoop/

    JAVA file location: /usr/lib/jvm/java-8-openjdk-i386/





    Enter java home location in Hadoop-env. sh file and save it (use CTRL+O).


    Step 13. Now open the .bashrc file from the user’s home and add the below parameters to update the location of Hadoop.


    cloudduggu@ubuntu:-$nano .bashrc

  • export HADOOP_HOME="/home/cloudduggu/hadoop"
  • export PATH=$PATH:$HADOOP_HOME/bin
  • export PATH=$PATH:$HADOOP_HOME/sbin
  • export MAPRED_HOME=${HADOOP_HOME}
  • export HDFS_HOME=${HADOOP_HOME}
  • export YARN_HOME=${HADOOP_HOME}



  • Now save the changes by pressing CTRL + O and exit from nano editor by pressing CTRL + X.


    Step 14. Now open the core-site.xml file which is located under /hadoop/etc/hadoop and follow the below commands.

    cloudduggu@ubuntu:~/hadoop/etc/hadoop/nano core-site.xml

    <configuration> <property> <name>fs.defaultFS</name> <value>hdfs://localhost:9000</value> </property> <property> <name>hadoop.tmp.dir</name> <value>/home/cloudduggu/hadoopdata</value> </property> </configuration>




    Now save the changes by pressing CTRL + O and exit from nano editor by pressing CTRL + X.


    Step 15. Open mapred-site.xml file which is located under /hadoop/etc/hadoop and follow the below command.

    cloudduggu@ubuntu:~/hadoop/etc/hadoop/nano mapred-site.xml

    <configuration> <property> <name>mapreduce.framework.name</name> <value>yarn</value> </property> </configuration>




    Now save the changes by pressing CTRL + O and exit from nano editor by pressing CTRL + X.


    Step 16. Open hdfs-site.xml file which is located under /hadoop/etc/hadoop and follow the below command.

    cloudduggu@ubuntu:~/hadoop/etc/hadoop/nano hdfs-site.xml

    <configuration> <property> <name>dfs.replication</name> <value>1</value> </property> </configuration>




    Now save the changes by pressing CTRL + O and exit from nano editor by pressing CTRL + X.


    Step 17. Open yarn-site.xml file which is located under /hadoop/etc/hadoop and follow the below command.

    cloudduggu@ubuntu:~/hadoop/etc/hadoop/nano yarn-site.xml

    <configuration> <property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value> </property> <property> <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name> <value>org.apache.hadoop.mapred.ShuffleHandler</value> </property> </configuration>




    Now save the changes by pressing CTRL + O and exit from nano editor by pressing CTRL + X.


    Step 18. Since this is the first time installation we need to format HDFS using the below command.

    cloudduggu@ubuntu:~/hadoop$ bin/hdfs namenode –format





    Once HDFS is formatted you will see a highlighted message.


    Step 19. Now start HDFS and YARN services using the below command and verify it using the command line and web browser.

    For HDFS cloudduggu@ubuntu:~/hadoop$ sbin/start-dfs.sh





    Once HDFS services are started you can verify it by running the JPS command on the command prompt and by putting localhost:9870 in the web browser.



    For YARN cloudduggu@ubuntu:~/hadoop$ sbin/start-yarn.sh



    Verify it from a web browser by putting localhost:8088.



    So now we have completed Hadoop 3 installation on Ubuntu 18.04.4 Version.