03 Hadoop Installation (Ubuntu)

Hadoop installation on Ubuntu

There are two prerequisites:

1. Ubuntu installed and running
2. Java (JDK) installed, since Hadoop needs JAVA_HOME to point to it

Part 1) Download and Install Hadoop

Step 1) Add a Hadoop system user using the commands below

sudo addgroup hadoop_

sudo adduser --ingroup hadoop_ hduser_

Enter your password, name and other details.

NOTE: You may encounter the following error during this setup and installation process.

"hduser is not in the sudoers file. This incident will be reported."

This error can be resolved by logging in as the root user and executing the command:

sudo adduser hduser_ sudo
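
To confirm that hduser_ was added to the sudo group, you can list its groups:

groups hduser_

The output should now include sudo.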

Re-login as hduser_

Step 2) Configure SSH

In order to manage nodes in a cluster, Hadoop requires SSH access.

First, switch to the hduser_ user by entering the following command:

su - hduser_

The following command will create a new RSA key pair with an empty passphrase:

ssh-keygen -t rsa -P ""
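
The key pair is written to ~/.ssh as id_rsa (private key) and id_rsa.pub (public key); you can confirm both files exist with:

ls ~/.ssh/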

Enable SSH access to the local machine using this key:

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
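
SSH is strict about file permissions; if the login below is still refused, tightening the permissions on the key file is a common fix (this step is not in the original walkthrough):

chmod 0600 $HOME/.ssh/authorized_keys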

Now test the SSH setup by connecting to localhost as the 'hduser_' user.

ssh localhost

Note: If you see an error in response to 'ssh localhost', there is a possibility that SSH is not available on this system.

To resolve this, purge any existing SSH installation first; it is good practice to purge before starting the installation:

sudo apt-get purge openssh-server

Then install SSH using the command:

sudo apt-get install openssh-server
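
Once the installation finishes, you can check that the SSH server is running on Ubuntu with:

sudo service ssh status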

Step 3) Download Hadoop from the Apache Hadoop releases page (https://hadoop.apache.org/releases.html)

Select the stable release.

Select the tar.gz file (not the file with src).

Once the download is complete, navigate to the directory containing the tar file.

Enter,

sudo tar xzf hadoop-2.2.0.tar.gz

Now, rename hadoop-2.2.0 to hadoop:

sudo mv hadoop-2.2.0 hadoop

Then make hduser_ the owner of this directory:

sudo chown -R hduser_:hadoop_ hadoop
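
You can verify the new ownership with:

ls -ld hadoop

The listing should show hduser_ as the owner and hadoop_ as the group.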

Part 2) Configure Hadoop

Step 1) Modify ~/.bashrc file

Add the following lines to the end of the file ~/.bashrc:

#Set HADOOP_HOME
export HADOOP_HOME=
#Set JAVA_HOME
export JAVA_HOME=
# Add bin/ directory of Hadoop to PATH
export PATH=$PATH:$HADOOP_HOME/bin
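
For example, assuming Hadoop was extracted to /home/guru99/Downloads/hadoop (the path used in the later steps) and an Ubuntu OpenJDK package is installed (both paths should be adjusted to your system), the finished entries might look like:

#Set HADOOP_HOME
export HADOOP_HOME=/home/guru99/Downloads/hadoop
#Set JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
# Add bin/ directory of Hadoop to PATH
export PATH=$PATH:$HADOOP_HOME/bin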

Now, source this environment configuration using the command below:

. ~/.bashrc

Step 2) Configurations related to HDFS

Set JAVA_HOME inside the file $HADOOP_HOME/etc/hadoop/hadoop-env.sh by replacing the existing JAVA_HOME line with the path to your JDK installation.
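
If you are not sure where your JDK lives, running readlink -f $(which java) prints the resolved java binary; JAVA_HOME is that path with the trailing /bin/java removed. The path below is only an example for an Ubuntu OpenJDK package and should be adjusted to your system:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64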

There are two parameters in $HADOOP_HOME/etc/hadoop/core-site.xml which need to be set:

1. 'hadoop.tmp.dir' - Specifies the directory which Hadoop will use to store its data files.

2. 'fs.defaultFS' (named 'fs.default.name' in older releases) - Specifies the default file system.

To set these parameters, open core-site.xml:

sudo gedit $HADOOP_HOME/etc/hadoop/core-site.xml

Copy the lines below in between the <configuration> and </configuration> tags:

<property>
   <name>hadoop.tmp.dir</name>
   <value>/app/hadoop/tmp</value>
   <description>Parent directory for other temporary directories.</description>
</property>
<property>
   <name>fs.defaultFS</name>
   <value>hdfs://localhost:54310</value>
   <description>The name of the default file system.</description>
</property>

Navigate to the directory $HADOOP_HOME/etc/hadoop

Now, create the directory mentioned in core-site.xml

sudo mkdir -p /app/hadoop/tmp

Grant permissions to the directory

sudo chown -R hduser_:hadoop_ /app/hadoop/tmp

sudo chmod 750 /app/hadoop/tmp

Step 3) MapReduce Configuration

Before you begin with these configurations, let's set the HADOOP_HOME path:

sudo gedit /etc/profile.d/hadoop.sh

And enter:

export HADOOP_HOME=/home/guru99/Downloads/hadoop

Next, enter:

sudo chmod +x /etc/profile.d/hadoop.sh

Exit the terminal and restart it.

Type echo $HADOOP_HOME to verify the path.
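
Assuming the path set above, the command should simply print that directory:

/home/guru99/Downloads/hadoop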

Now copy the mapred-site.xml template file:

sudo cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml

Open the mapred-site.xml file

sudo gedit $HADOOP_HOME/etc/hadoop/mapred-site.xml

Add the setting below in between the <configuration> and </configuration> tags:

<property>
   <name>mapreduce.jobtracker.address</name>
   <value>localhost:54311</value>
   <description>MapReduce job tracker runs at this host and port.</description>
</property>

Open $HADOOP_HOME/etc/hadoop/hdfs-site.xml as below:

sudo gedit $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Add the settings below in between the <configuration> and </configuration> tags:

<property>
   <name>dfs.replication</name>
   <value>1</value>
   <description>Default block replication.</description>
</property>
<property>
   <name>dfs.datanode.data.dir</name>
   <value>/home/hduser_/hdfs</value>
</property>

Create the directory specified in the above setting:

sudo mkdir -p /home/hduser_/hdfs

sudo chown -R hduser_:hadoop_ /home/hduser_/hdfs

sudo chmod 750 /home/hduser_/hdfs

Step 4) Before we start Hadoop for the first time, format HDFS using the command below:

$HADOOP_HOME/bin/hdfs namenode -format
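
If formatting succeeds, the log output should end with a line reporting that the storage directory has been successfully formatted, similar to the following (exact wording and paths vary by version):

Storage directory /app/hadoop/tmp/dfs/name has been successfully formatted.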

Step 5) Start the Hadoop single-node cluster using the commands below:

$HADOOP_HOME/sbin/start-dfs.sh

$HADOOP_HOME/sbin/start-yarn.sh

Using the 'jps' command, verify whether all the Hadoop-related processes are running.

If Hadoop has started successfully, the output of jps should show NameNode, NodeManager, ResourceManager, SecondaryNameNode, and DataNode.
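
A typical listing looks something like this (the process IDs will differ on your machine):

6069 NameNode
6314 DataNode
6455 SecondaryNameNode
6596 ResourceManager
6884 NodeManager
7013 Jps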

Step 6) Stopping Hadoop

$HADOOP_HOME/sbin/stop-dfs.sh

$HADOOP_HOME/sbin/stop-yarn.sh

04 HDFS Introduction