
Install Hadoop (single node) on CentOS/RHEL

Apache Hadoop is free, open-source software for distributed processing of huge data sets across clusters of machines. In other words, Hadoop is a framework that lets you store very large amounts of data (Big Data) and process it far more efficiently and quickly through distributed processing. It is designed to scale from a single server up to thousands of machines, each offering local computation and storage, and it automatically detects and handles failures at the application layer, which provides a highly available service on top of a cluster. Essentially, the core of Apache Hadoop consists of two parts: the storage part, the Hadoop Distributed File System (HDFS), which stores all the data; and the processing part, MapReduce. HDFS splits files into large blocks (64 MB or 128 MB by default, depending on the version) and spreads the blocks among the nodes in the cluster.
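
The block size is controlled by the dfs.blocksize property. Once the installation described below is complete, you can check the value your own cluster is using; a minimal check, assuming the hadoop binaries are on the PATH (the 2.x default is 128 MB):

[hadoop@linuxpcfix ~]$ hdfs getconf -confKey dfs.blocksize
134217728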

Install Prerequisites

Hadoop is written in Java, so Java must be installed on your system. If Java is already installed, you can skip this step; otherwise, download the JDK tarball (jdk-8u31-linux-x64.tar.gz) from Oracle, place it in /usr/local/apache_hadoop, and install it as shown below.

[root@linuxpcfix ~]# mkdir -p /usr/local/apache_hadoop
[root@linuxpcfix ~]# cd /usr/local/apache_hadoop
[root@linuxpcfix apache_hadoop]# tar -xvf jdk-8u31-linux-x64.tar.gz
[root@linuxpcfix apache_hadoop]# ln -s jdk1.8.0_31/ java

Once the Java installation is complete, verify the Java version as shown below.

[root@linuxpcfix apache_hadoop]# ./java/bin/java -version
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
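
Optionally, you can also register this JDK with the alternatives system so that java is on the PATH for every user. A minimal sketch, assuming the symlink created above (the priority value 2 is arbitrary):

[root@linuxpcfix apache_hadoop]# alternatives --install /usr/bin/java java /usr/local/apache_hadoop/java/bin/java 2
[root@linuxpcfix apache_hadoop]# alternatives --config java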

SSH configuration and user creation.
Create a dedicated hadoop user and enable password-less SSH login for it by generating an SSH key without a passphrase.

[root@linuxpcfix ~]# useradd hadoop
[root@linuxpcfix ~]# su - hadoop
[hadoop@linuxpcfix ~]$ ssh-keygen -t rsa
[hadoop@linuxpcfix ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

You should now be able to log in without a password; verify it with the following command.

[hadoop@linuxpcfix ~]$ ssh localhost
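
To make sure the login really is non-interactive (the Hadoop start scripts depend on it), you can also force batch mode; a quick check of this kind:

[hadoop@linuxpcfix ~]$ ssh -o BatchMode=yes localhost 'echo passwordless login OK'
passwordless login OK

If it still prompts for a password, the usual culprits are the permissions on ~/.ssh (should be 700) and ~/.ssh/authorized_keys (should be 600):

[hadoop@linuxpcfix ~]$ chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys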

Download and install Apache Hadoop.
Download the Apache Hadoop 2.6.0 tarball from an Apache mirror using the commands below.

[root@linuxpcfix ~]# mkdir -p /usr/local/apache_hadoop
[root@linuxpcfix ~]# cd /usr/local/apache_hadoop
[root@linuxpcfix ~]# wget http://apache.bytenet.in/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
--2015-02-03 04:53:24-- http://apache.bytenet.in/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
Resolving apache.bytenet.in... 2400:4a80::e, 150.107.171.242
Connecting to apache.bytenet.in|2400:4a80::e|:80... connected.
Length: 195257604 (186M) [application/x-gzip]
Saving to: "hadoop-2.6.0.tar.gz"
10% [======>                                                    ] 19,531,992   273K/s   eta 8m 36s
[root@linuxpcfix apache_hadoop]# tar -xvf hadoop-2.6.0.tar.gz
[root@linuxpcfix apache_hadoop]# ln -s hadoop-2.6.0 hadoop
[root@linuxpcfix ~]# chown -R hadoop.hadoop /usr/local/apache_hadoop/

or

[root@linuxpcfix ~]# mv hadoop-2.6.0 /usr/local/apache_hadoop/hadoop
[root@linuxpcfix ~]# chown -R hadoop.hadoop /usr/local/apache_hadoop/

Next, set JAVA_HOME to the path where Java is installed on your system, together with HADOOP_INSTALL (these exports only last for the current shell):

[root@linuxpcfix ~]# export JAVA_HOME="/usr/local/apache_hadoop/java"
[root@linuxpcfix ~]# export HADOOP_INSTALL=/usr/local/apache_hadoop/hadoop

Alternatively, to make the settings permanent, open the hadoop user's ~/.bashrc file and append the following lines.

[hadoop@linuxpcfix ~]$ vi ~/.bashrc
#HADOOP VARIABLES START
export JAVA_HOME=/usr/local/apache_hadoop/java
export HADOOP_INSTALL=/usr/local/apache_hadoop/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END

After saving the file, run the command below to load the new environment into the current shell.

[hadoop@linuxpcfix ~]$ source ~/.bashrc
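
You can confirm the new environment is in effect before moving on (the paths shown assume the layout used in this article):

[hadoop@linuxpcfix ~]$ echo $HADOOP_INSTALL
/usr/local/apache_hadoop/hadoop
[hadoop@linuxpcfix ~]$ which hadoop
/usr/local/apache_hadoop/hadoop/bin/hadoop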

Now check which version of Hadoop is running on the server:

[root@linuxpcfix ~]# cd /usr/local/apache_hadoop/hadoop/bin
[root@linuxpcfix bin]# ./hadoop version
Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1

Edit and configure the Hadoop configuration files.
Apache Hadoop has several configuration files which you adjust to your requirements. Here we configure a basic single-node cluster as shown below.

[hadoop@linuxpcfix hadoop]$ cd /usr/local/apache_hadoop/hadoop/etc/hadoop

Modify yarn-site.xml using the vi editor:

[hadoop@linuxpcfix hadoop]$ vi yarn-site.xml

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

Open the file core-site.xml and append the following lines. (The fs.default.name key used here is deprecated in Hadoop 2.x in favour of fs.defaultFS, but it is still accepted.)

[hadoop@linuxpcfix hadoop]$ vi core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>

</configuration>

Modify mapred-site.xml using your favourite editor (if the file does not exist yet, create it from the template as shown below).
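
A stock Hadoop 2.6.0 tarball ships only mapred-site.xml.template, not mapred-site.xml, so create the file from the template first:

[hadoop@linuxpcfix hadoop]$ cp mapred-site.xml.template mapred-site.xml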

[hadoop@linuxpcfix hadoop]$ vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

Open the file hdfs-site.xml and append the following lines.

[hadoop@linuxpcfix hadoop]$ vi hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/apache_hadoop/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/apache_hadoop/hadoop_store/hdfs/datanode</value>
</property>
</configuration>
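
The dfs.namenode.name.dir and dfs.datanode.data.dir directories referenced above do not exist yet; create them as the hadoop user before formatting (the paths simply mirror the values in hdfs-site.xml):

[hadoop@linuxpcfix hadoop]$ mkdir -p /usr/local/apache_hadoop/hadoop_store/hdfs/namenode
[hadoop@linuxpcfix hadoop]$ mkdir -p /usr/local/apache_hadoop/hadoop_store/hdfs/datanode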

Once the configuration files are in place, format the HDFS filesystem (as the hadoop user, so the storage directories stay owned by hadoop) as shown below.

[hadoop@linuxpcfix bin]$ ./hdfs namenode -format
STARTUP_MSG: host = host.linuxpcfix.com/192.167.333.222
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.6.0
STARTUP_MSG: classpath = /usr/local/apache_hadoop/hadoop-2.6.0/etc/hadoop:/usr/local/apache_hadoop/hadoop-2.6.0/share/hadoop/common/………………………
15/02/03 06:35:10 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/02/03 06:35:10 INFO util.ExitUtil: Exiting with status 0
15/02/03 06:35:10 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
Shutting down NameNode at host.linuxpcfix.com/192.167.333.222
************************************************************/

Now start the HDFS and YARN daemons:

[hadoop@linuxpcfix sbin]$ start-dfs.sh
15/02/03 06:47:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/apache_hadoop/hadoop-2.6.0/logs/hadoop-hadoop-namenode-linuxpcfix.com.out
localhost: starting datanode, logging to /usr/local/apache_hadoop/hadoop-2.6.0/logs/hadoop-hadoop-datanode-linuxpcfix.com.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
RSA key fingerprint is 32:7b:68:a8:fc:d0:a3:ac:e3:a0:d5:49:fe:51:fd:4e.
Are you sure you want to continue connecting (yes/no)? yes
[hadoop@linuxpcfix sbin]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/apache_hadoop/hadoop-2.6.0/logs/yarn-hadoop-resourcemanager-linuxpcfix.com.out
localhost: starting nodemanager, logging to /usr/local/apache_hadoop/hadoop-2.6.0/logs/yarn-hadoop-nodemanager-linuxpcfix.com.out
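
At this point the NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager daemons should all be running. You can verify this with the jps tool that ships with the JDK (the process IDs will differ on your machine):

[hadoop@linuxpcfix sbin]$ jps
2305 NameNode
2412 DataNode
2601 SecondaryNameNode
2744 ResourceManager
2848 NodeManager
2907 Jps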

Now open your favourite web browser and go to http://linuxpcfix.com:50070/dfshealth.html#tab-overview (replace the hostname with your own server's hostname or IP).
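
The NameNode web UI listens on port 50070 by default, and the YARN ResourceManager UI on port 8088. If you only have shell access, a quick reachability check (assuming curl is installed) looks like this:

[hadoop@linuxpcfix ~]$ curl -sI http://localhost:50070/ | head -n 1
[hadoop@linuxpcfix ~]$ curl -sI http://localhost:8088/ | head -n 1
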
Hadoop setup testing.
Finally, verify that the Hadoop setup is working properly by running the following commands.

[hadoop@linuxpcfix hadoop]$ bin/hdfs dfs -mkdir /user/
[hadoop@linuxpcfix hadoop]$ bin/hdfs dfs -mkdir /user/hadoop

Then copy the files from the local directory /var/log/varnish into the Hadoop Distributed File System as shown below:

[hadoop@linuxpcfix hadoop]$ bin/hdfs dfs -put /var/log/varnish logs
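
To confirm the upload, list the new directory in HDFS (the relative path logs resolves to /user/hadoop/logs for the hadoop user):

[hadoop@linuxpcfix hadoop]$ bin/hdfs dfs -ls logs
[hadoop@linuxpcfix hadoop]$ bin/hdfs dfs -du -h logs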

You can now browse the files in a web browser at http://linuxpcfix.com:50070/explorer.html#/

If you want to copy the logs back from HDFS to the local filesystem, execute the following commands.

[hadoop@linuxpcfix hadoop]$ bin/hdfs dfs -get logs /tmp/logs
[hadoop@linuxpcfix hadoop]$ ls -l /tmp/logs/
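
When you are finished, the daemons can be stopped with the companion scripts in the sbin directory:

[hadoop@linuxpcfix hadoop]$ sbin/stop-yarn.sh
[hadoop@linuxpcfix hadoop]$ sbin/stop-dfs.sh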

3 thoughts on "Install Hadoop (single node) on CentOS/RHEL"

  1. Srinivas says:

    Hi,

    Thanks for the nice article. I have followed your steps to install Hadoop 2.6 with Java 1.8.0.56 on CentOS 7. I am facing this issue when I run hadoop version:

    Error: Could not find or load main class "-Djava.library.path=.usr.local.apache_hadoop.hadoop.lib"

    Do I need to check anything extra?

    -Thanks
    Srinivas

    1. Gsingh says:

      It seems there is an issue with the configuration.
