Apache Hadoop is free, open-source software for distributed processing of very large data sets across clusters of machines. In other words, Hadoop is a framework that lets you store huge amounts of data (Big Data) and process it efficiently via distributed processing. It is designed to scale from a single server up to thousands of machines, each offering local computation and storage, and it automatically detects and handles failures at the application layer, providing a highly available service on top of the cluster. Essentially, the core of Apache Hadoop consists of two parts: a storage part, the Hadoop Distributed File System (HDFS), which stores all the data, and a processing part, MapReduce. HDFS splits files into large blocks (64MB or 128MB by default, depending on the version) and distributes the blocks among the nodes in the cluster.
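For example, once Hadoop is installed and configured as described in the rest of this article, you can check the block size HDFS is actually using with hdfs getconf; the value below (134217728 bytes, i.e. 128MB) is the Hadoop 2.x default:
[hadoop@linuxpcfix ~]$ hdfs getconf -confKey dfs.blocksize
134217728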
Install Prerequisites
Hadoop is written primarily in Java, so Java must be installed on your system. If Java is already installed, skip this step; otherwise follow the steps below.
[root@linuxpcfix apache_hadoop]# tar -xvf jdk-8u31-linux-x64.tar.gz
[root@linuxpcfix apache_hadoop]# ln -s jdk1.8.0_31/ java
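For the version check below to succeed, the extracted JDK's bin directory must be on your PATH. A minimal sketch, assuming the java symlink created above lives in /usr/local/apache_hadoop (adjust the path if you extracted the JDK elsewhere):
[root@linuxpcfix apache_hadoop]# export JAVA_HOME=/usr/local/apache_hadoop/java
[root@linuxpcfix apache_hadoop]# export PATH=$JAVA_HOME/bin:$PATH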
Once the Java installation is complete, verify the Java version as given below.
[root@linuxpcfix apache_hadoop]# java -version
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
SSH configuration and user creation.
Now you need to enable password-less SSH login for the hadoop user. Create the user first if it does not already exist, then switch to it and generate an SSH key without a passphrase.
[root@linuxpcfix ~]# useradd hadoop
[root@linuxpcfix ~]# su - hadoop
[hadoop@linuxpcfix ~]$ ssh-keygen -t rsa
[hadoop@linuxpcfix ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
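On most systems sshd insists on strict permissions for the key files before it will allow key-based login; if the password-less login below fails, tighten them like this:
[hadoop@linuxpcfix ~]$ chmod 0600 ~/.ssh/authorized_keys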
You should now be able to log in without a password; verify it using the following command.
[hadoop@linuxpcfix ~]$ ssh localhost
Download and install Apache Hadoop.
Download the Apache Hadoop 2.6.0 tarball from an Apache mirror using the commands below.
[root@linuxpcfix ~]# cd /usr/local/apache_hadoop
[root@linuxpcfix ~]# wget http://apache.bytenet.in/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
--2015-02-03 04:53:24-- http://apache.bytenet.in/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
Resolving apache.bytenet.in... 2400:4a80::e, 150.107.171.242
Connecting to apache.bytenet.in|2400:4a80::e|:80... connected.
Length: 195257604 (186M) [application/x-gzip]
Saving to: "hadoop-2.6.0.tar.gz"
100% [=========================================================> ] 19,531,992 273K/s eta 8m 36s
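Apache publishes checksum files alongside each release; if you want to verify the integrity of the download before unpacking it, compare against the published value, for example:
[root@linuxpcfix apache_hadoop]# sha256sum hadoop-2.6.0.tar.gz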
[root@linuxpcfix apache_hadoop]# tar -xvf hadoop-2.6.0.tar.gz
[root@linuxpcfix apache_hadoop]# ln -s hadoop-2.6.0 hadoop
Change the ownership of the installation directory to the hadoop user.
[root@linuxpcfix ~]# chown -R hadoop:hadoop /usr/local/apache_hadoop/
Next, confirm where Java is installed on your system, then set the Hadoop environment variables. You can either export HADOOP_INSTALL in the current shell:
[root@linuxpcfix ~]# export HADOOP_INSTALL=/usr/local/apache_hadoop/hadoop
or
to make the settings permanent, open the hadoop user's ~/.bashrc file and append the following lines (adjust JAVA_HOME if your JDK symlink lives elsewhere, for example /usr/local/apache_hadoop/java as created above):
#HADOOP VARIABLES START
export JAVA_HOME=/usr/local/java
export HADOOP_INSTALL=/usr/local/apache_hadoop/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END
After saving the file, execute the command below to load the newly created environment variables.
[hadoop@linuxpcfix ~]$ source ~/.bashrc
Check and verify which version of Hadoop is now running on the server:
[root@linuxpcfix bin]# ./hadoop version
Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Edit and configure the Hadoop configuration files.
Apache Hadoop has several configuration files, which you adjust according to your requirements. Here we are going to configure Hadoop as a basic single-node cluster, as given below. All the files edited in this section live under $HADOOP_INSTALL/etc/hadoop.
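The vi commands that follow assume your current directory is that configuration directory, matching the [hadoop@linuxpcfix hadoop]$ prompts shown below:
[hadoop@linuxpcfix ~]$ cd /usr/local/apache_hadoop/hadoop/etc/hadoop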
Modify yarn-site.xml using the vi editor:
[hadoop@linuxpcfix hadoop]$ vi yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Open the file core-site.xml and append the following lines. (In Hadoop 2.x, fs.default.name is deprecated in favour of fs.defaultFS, but the old name still works.)
[hadoop@linuxpcfix hadoop]$ vi core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
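As a quick sanity check, hdfs getconf should echo the configured filesystem URI back; no daemons need to be running for this, only the environment variables set earlier:
[hadoop@linuxpcfix hadoop]$ hdfs getconf -confKey fs.defaultFS
hdfs://localhost:9000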
Modify mapred-site.xml using your favourite editor. This file does not exist by default, so create it from the bundled template first:
[hadoop@linuxpcfix hadoop]$ cp mapred-site.xml.template mapred-site.xml
[hadoop@linuxpcfix hadoop]$ vi mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Open the file hdfs-site.xml and append the following lines.
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/apache_hadoop/hadoop_store/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/apache_hadoop/hadoop_store/hdfs/datanode</value>
  </property>
</configuration>
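The two directories referenced above must exist and be writable by the hadoop user before you format the filesystem; a short sketch, assuming the paths used in this article:
[root@linuxpcfix ~]# mkdir -p /usr/local/apache_hadoop/hadoop_store/hdfs/namenode
[root@linuxpcfix ~]# mkdir -p /usr/local/apache_hadoop/hadoop_store/hdfs/datanode
[root@linuxpcfix ~]# chown -R hadoop:hadoop /usr/local/apache_hadoop/hadoop_store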
Once the configuration is complete, format the HDFS filesystem as shown below.
[hadoop@linuxpcfix hadoop]$ bin/hdfs namenode -format
STARTUP_MSG: host = host.linuxpcfix.com/192.167.333.222
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.6.0
STARTUP_MSG: classpath = /usr/local/apache_hadoop/hadoop-2.6.0/etc/hadoop:/usr/local/apache_hadoop/hadoop-2.6.0/share/hadoop/common/………………………
15/02/03 06:35:10 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/02/03 06:35:10 INFO util.ExitUtil: Exiting with status 0
15/02/03 06:35:10 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
Shutting down NameNode at host.linuxpcfix.com/192.167.333.222
************************************************************/
Now start the HDFS daemons:
[hadoop@linuxpcfix hadoop]$ sbin/start-dfs.sh
15/02/03 06:47:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/apache_hadoop/hadoop-2.6.0/logs/hadoop-hadoop-namenode-linuxpcfix.com.out
localhost: starting datanode, logging to /usr/local/apache_hadoop/hadoop-2.6.0/logs/hadoop-hadoop-datanode-linuxpcfix.com.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
RSA key fingerprint is 32:7b:68:a8:fc:d0:a3:ac:e3:a0:d5:49:fe:51:fd:4e.
Are you sure you want to continue connecting (yes/no)? yes
Then start the YARN daemons:
[hadoop@linuxpcfix hadoop]$ sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/apache_hadoop/hadoop-2.6.0/logs/yarn-hadoop-resourcemanager-linuxpcfix.com.out
localhost: starting nodemanager, logging to /usr/local/apache_hadoop/hadoop-2.6.0/logs/yarn-hadoop-nodemanager-linuxpcfix.com.out
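To confirm that all five daemons came up, you can run jps, which ships with the JDK; the process IDs below are illustrative:
[hadoop@linuxpcfix hadoop]$ jps
4852 NameNode
4966 DataNode
5146 SecondaryNameNode
5302 ResourceManager
5401 NodeManager
5730 Jps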
Now open your favourite web browser and enter the URL http://linuxpcfix.com:50070/dfshealth.html#tab-overview (the NameNode web UI, served on port 50070 by default), then press Enter.
Hadoop setup testing.
Finally, test whether the Hadoop setup is working properly by performing the following commands.
[hadoop@linuxpcfix hadoop]$ bin/hdfs dfs -mkdir -p /user/hadoop
Now copy the files from the local directory /var/log/varnish into the Hadoop distributed file system, for example:
[hadoop@linuxpcfix hadoop]$ bin/hdfs dfs -put /var/log/varnish /user/hadoop
You can now browse the files by using the URL http://linuxpcfix.com:50070/explorer.html#/
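The same listing is also available from the command line, for example:
[hadoop@linuxpcfix hadoop]$ bin/hdfs dfs -ls /user/hadoop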
If you want to copy the logs from HDFS back to the local system, use hdfs dfs -get, for example:
[hadoop@linuxpcfix hadoop]$ bin/hdfs dfs -get /user/hadoop/varnish /tmp/logs
Then verify the copied files locally:
[hadoop@linuxpcfix hadoop]$ ls -l /tmp/logs/