03 Hadoop Installation (Ubuntu)

Hadoop installation on Ubuntu

There are two prerequisites:

1. Ubuntu installed and running
2. Java (JDK) installed, since Hadoop needs JAVA_HOME to point to it

Part 1) Download and Install Hadoop

Step 1) Add a Hadoop system user using the commands below

sudo addgroup hadoop_

sudo adduser --ingroup hadoop_ hduser_

Enter your password, name and other details.

NOTE: You may encounter the following error during this setup and installation process.

"hduser is not in the sudoers file. This incident will be reported."

This error can be resolved by logging in as the root user and executing the command:

sudo adduser hduser_ sudo
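
To confirm that hduser_ was added to the sudo group, you can list its groups:

groups hduser_

The output should now include sudo.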

Re-login as hduser_

Step 2) Configure SSH

In order to manage nodes in a cluster, Hadoop requires SSH access.

First, switch to the hduser_ user by entering the following command:

su - hduser_

The following command will create a new RSA key pair with an empty passphrase:

ssh-keygen -t rsa -P ""
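
The key pair is written to ~/.ssh as id_rsa (private key) and id_rsa.pub (public key); you can confirm both files exist with:

ls ~/.ssh/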

Enable SSH access to the local machine using this key:

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
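
SSH is strict about file permissions; if the login below is still refused, tightening the permissions on the key file is a common fix (this step is not in the original walkthrough):

chmod 0600 $HOME/.ssh/authorized_keys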

Now test the SSH setup by connecting to localhost as the 'hduser_' user.

ssh localhost

Note: If you see an error in response to 'ssh localhost', there is a possibility that SSH is not available on this system.

To resolve this, purge any existing SSH installation first; it is good practice to purge before starting the installation:

sudo apt-get purge openssh-server

Then install SSH using the command:

sudo apt-get install openssh-server
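
Once the installation finishes, you can check that the SSH server is running on Ubuntu with:

sudo service ssh status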

Step 3) Download Hadoop from the Apache Hadoop releases page (https://hadoop.apache.org/releases.html)

Select the stable release.

Select the tar.gz file (not the file with src).

Once the download is complete, navigate to the directory containing the tar file.

Enter,

sudo tar xzf hadoop-2.2.0.tar.gz

Now, rename hadoop-2.2.0 to hadoop:

sudo mv hadoop-2.2.0 hadoop

Then make hduser_ the owner of this directory:

sudo chown -R hduser_:hadoop_ hadoop
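
You can verify the new ownership with:

ls -ld hadoop

The listing should show hduser_ as the owner and hadoop_ as the group.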

Part 2) Configure Hadoop

Step 1) Modify ~/.bashrc file

Add the following lines to the end of the file ~/.bashrc:

#Set HADOOP_HOME
export HADOOP_HOME=
#Set JAVA_HOME
export JAVA_HOME=
# Add bin/ directory of Hadoop to PATH
export PATH=$PATH:$HADOOP_HOME/bin
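
For example, assuming Hadoop was extracted to /home/guru99/Downloads/hadoop (the path used in the later steps) and an Ubuntu OpenJDK package is installed (both paths should be adjusted to your system), the finished entries might look like:

#Set HADOOP_HOME
export HADOOP_HOME=/home/guru99/Downloads/hadoop
#Set JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
# Add bin/ directory of Hadoop to PATH
export PATH=$PATH:$HADOOP_HOME/bin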

Now, source this environment configuration using the command below:

. ~/.bashrc

Step 2) Configurations related to HDFS

Set JAVA_HOME inside the file $HADOOP_HOME/etc/hadoop/hadoop-env.sh by replacing the existing JAVA_HOME line with the path to your JDK installation.
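
If you are not sure where your JDK lives, running readlink -f $(which java) prints the resolved java binary; JAVA_HOME is that path with the trailing /bin/java removed. The path below is only an example for an Ubuntu OpenJDK package and should be adjusted to your system:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64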

There are two parameters in $HADOOP_HOME/etc/hadoop/core-site.xml which need to be set:

1. 'hadoop.tmp.dir' - Specifies the directory which Hadoop will use to store its data files.

2. 'fs.defaultFS' (named 'fs.default.name' in older releases) - Specifies the default file system.

To set these parameters, open core-site.xml:

sudo gedit $HADOOP_HOME/etc/hadoop/core-site.xml

Copy the lines below in between the <configuration> and </configuration> tags:

<property>
   <name>hadoop.tmp.dir</name>
   <value>/app/hadoop/tmp</value>
   <description>Parent directory for other temporary directories.</description>
</property>
<property>
   <name>fs.defaultFS</name>
   <value>hdfs://localhost:54310</value>
   <description>The name of the default file system.</description>
</property>

Navigate to the directory $HADOOP_HOME/etc/hadoop

Now, create the directory mentioned in core-site.xml

sudo mkdir -p /app/hadoop/tmp

Grant permissions to the directory

sudo chown -R hduser_:hadoop_ /app/hadoop/tmp

sudo chmod 750 /app/hadoop/tmp

Step 3) MapReduce Configuration

Before you begin with these configurations, let's set the HADOOP_HOME path:

sudo gedit /etc/profile.d/hadoop.sh

And enter:

export HADOOP_HOME=/home/guru99/Downloads/hadoop

Next, enter:

sudo chmod +x /etc/profile.d/hadoop.sh

Exit the terminal and restart it.

Type echo $HADOOP_HOME to verify the path.
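
Assuming the path set above, the command should simply print that directory:

/home/guru99/Downloads/hadoop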

Now copy the mapred-site.xml template file:

sudo cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml

Open the mapred-site.xml file

sudo gedit $HADOOP_HOME/etc/hadoop/mapred-site.xml

Add the setting below in between the <configuration> and </configuration> tags:

<property>
   <name>mapreduce.jobtracker.address</name>
   <value>localhost:54311</value>
   <description>MapReduce job tracker runs at this host and port.</description>
</property>

Open $HADOOP_HOME/etc/hadoop/hdfs-site.xml as below:

sudo gedit $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Add the settings below in between the <configuration> and </configuration> tags:

<property>
   <name>dfs.replication</name>
   <value>1</value>
   <description>Default block replication.</description>
</property>
<property>
   <name>dfs.datanode.data.dir</name>
   <value>/home/hduser_/hdfs</value>
</property>

Create the directory specified in the above setting:

sudo mkdir -p /home/hduser_/hdfs

sudo chown -R hduser_:hadoop_ /home/hduser_/hdfs

sudo chmod 750 /home/hduser_/hdfs

Step 4) Before we start Hadoop for the first time, format HDFS using the command below:

$HADOOP_HOME/bin/hdfs namenode -format
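
If formatting succeeds, the log output should end with a line reporting that the storage directory has been successfully formatted, similar to the following (exact wording and paths vary by version):

Storage directory /app/hadoop/tmp/dfs/name has been successfully formatted.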

Step 5) Start the Hadoop single-node cluster using the commands below:

$HADOOP_HOME/sbin/start-dfs.sh

$HADOOP_HOME/sbin/start-yarn.sh

Using the 'jps' command, verify whether all the Hadoop-related processes are running.

If Hadoop has started successfully, the output of jps should show NameNode, NodeManager, ResourceManager, SecondaryNameNode, and DataNode.
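
A typical listing looks something like this (the process IDs will differ on your machine):

6069 NameNode
6314 DataNode
6455 SecondaryNameNode
6596 ResourceManager
6884 NodeManager
7013 Jps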

Step 6) Stopping Hadoop

$HADOOP_HOME/sbin/stop-dfs.sh

$HADOOP_HOME/sbin/stop-yarn.sh

04 HDFS Introduction