
Hadoop 2 single node Installation on Linux


This post walks you through a single-node Hadoop 2 installation on Linux (Ubuntu).

Click here for video

Steps

1. Set JAVA_HOME in your environment file (.bashrc).

2. Generate an SSH key pair for passwordless communication.

3. Start your Hadoop installation.

4. Extract the Hadoop tar file and edit the configuration files (etc/hadoop/).
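The command-line steps above can be sketched as follows (the JDK path and Hadoop tarball version here are examples; substitute the ones you downloaded):

```shell
# Step 1: set JAVA_HOME in ~/.bashrc (path is an example; point it at your JDK)
echo 'export JAVA_HOME=/home/big/jdk1.7.0_45' >> ~/.bashrc
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc
source ~/.bashrc

# Step 2: generate an SSH key for passwordless login to localhost
mkdir -p ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Step 4: extract the Hadoop tarball (version number is an assumption)
[ -f hadoop-2.7.3.tar.gz ] && tar -xzf hadoop-2.7.3.tar.gz || echo "download the Hadoop tarball first"
```

After this, `ssh localhost` should log in without asking for a password, which the Hadoop start scripts rely on.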

 

core-site.xml

<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:50000</value>
</property>
————————————
mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
————————————
hdfs-site.xml
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/big/hadoop2-dir/namenode-dir</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/big/hadoop2-dir/datanode-dir</value>
</property>
————————————
yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
————————————
hadoop-env.sh
export JAVA_HOME=/home/big/jdk1.7.0_45
————————————
yarn-env.sh
export JAVA_HOME=/home/big/jdk1.7.0_45
————————————
mapred-env.sh
export JAVA_HOME=/home/big/jdk1.7.0_45

————————————

Format the Hadoop NameNode

bin/hdfs namenode -format

Start Hadoop

sbin/start-all.sh

(start-all.sh is deprecated in Hadoop 2; you can equivalently run sbin/start-dfs.sh followed by sbin/start-yarn.sh.)
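Once the start script finishes, a quick sanity check is to list the running JVM processes with jps (exact output varies by machine):

```shell
# jps lists running JVM processes; a healthy single-node Hadoop 2 setup shows
# NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager
# (plus Jps itself). The guard keeps the command harmless if the JDK is not on PATH.
command -v jps >/dev/null 2>&1 && jps || echo "jps not found: add the JDK bin directory to PATH"
```

If one of the daemons is missing, check its log file under the Hadoop logs/ directory before going further.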

—————————————

Open your browser and go to

http://localhost:50070 (NameNode web UI)

Hadoop 2 Installation on Windows

Hadoop 2 Installation on Windows without “Cygwin”

Recent Hadoop 2 releases support installation on Windows; the distribution includes the .cmd files needed for configuration. Steps to follow:

Click here for VIDEO

Step 1: You need the Hadoop winutils binaries. Click here to download.

Step 2: Download Java 1.7 for Windows and set the path in the environment variables.

Continue reading Hadoop 2 Installation on Windows

Apache Spark and Hadoop Integration

Apache Spark and Hadoop Integration with example

Apache Spark is an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley. In contrast to Hadoop’s two-stage disk-based MapReduce paradigm, Spark’s in-memory primitives provide performance up to 100 times faster for certain applications.
Step 1: Install Hadoop on your machine (1.x or 2.x), and set the Java and Scala paths in .bashrc (for setting the paths, refer to the Spark installation post).
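With Hadoop and Spark both installed, one way to smoke-test the integration is to submit Spark's bundled SparkPi example to the YARN cluster started earlier. This is a sketch: SPARK_HOME and the examples jar location are assumptions and depend on your Spark version.

```shell
# Submit the bundled SparkPi example to YARN.
# Jar location varies: Spark 1.x ships lib/spark-examples-*.jar,
# Spark 2.x uses examples/jars/spark-examples_*.jar.
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  "$SPARK_HOME"/lib/spark-examples-*.jar 10
```

If the job finishes and prints an approximation of pi in the driver output, Spark is talking to YARN and HDFS correctly.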

Continue reading Apache Spark and Hadoop Integration