How to Set Up a Multi-Node Hadoop Cluster on Amazon EC2, Part 2
In Part 1 we created, launched, and connected to Amazon Ubuntu instances. In Part 2 I will show how to install and set up a Hadoop cluster. If this is the first time you are seeing this page, I strongly advise you to go over Part 1 first.
In this article:
- HadoopNameNode will be referred to as master
- HadoopSecondaryNameNode will be referred to as SecondaryNameNode or SNN
- HadoopSlave1 and HadoopSlave2 will be referred to as slaves (where the data nodes will reside)
So, let’s begin.
1. Apache Hadoop Installation and Cluster Setup
1.1 Update the packages and dependencies.
Let’s update the packages. I will start with the master; repeat this for the SNN and the 2 slaves.
$ sudo apt-get update
Once it’s complete, let’s install Java.
1.2 Install Java
Add the following PPA and install the latest Oracle Java (JDK) 7 on Ubuntu.
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update && sudo apt-get install oracle-jdk7-installer
Check that Ubuntu now uses JDK 7.
Repeat this for the SNN and the 2 slaves.
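One way to run this check is sketched below. It relies on `java -version`, which prints its banner to stderr; the helper name `is_jdk7` is hypothetical, and the assumption is that a JDK 7 banner contains a quoted `1.7.` version string.

```shell
# Hypothetical helper: succeeds when a `java -version` banner reports a 1.7.x runtime.
is_jdk7() {
  case "$1" in
    *'"1.7.'*) return 0 ;;
    *) return 1 ;;
  esac
}

# `java -version` writes to stderr, so redirect it before capturing the first line.
if command -v java >/dev/null 2>&1; then
  banner="$(java -version 2>&1 | head -n 1)"
  echo "$banner"
  if is_jdk7 "$banner"; then
    echo "JDK 7 is the active Java"
  else
    echo "JDK 7 is NOT the active Java"
  fi
else
  echo "java is not on the PATH"
fi
```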
1.3 Download Hadoop
Issue the wget command from the shell to download the Hadoop tarball.
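A hedged sketch of the download step follows. The URL is an assumption: it points at the Apache archive location for release 1.2.1, chosen to match the tarball name used in the next step, and any Apache mirror carrying the same file should work equally well. `download_hadoop` is a hypothetical helper name.

```shell
# Assumption: Apache archive location for Hadoop 1.2.1; a mirror may be used instead.
HADOOP_VERSION="1.2.1"
HADOOP_TARBALL="hadoop-${HADOOP_VERSION}.tar.gz"
HADOOP_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/${HADOOP_TARBALL}"

# Hypothetical helper: download the tarball into the current directory unless it is already there.
download_hadoop() {
  if [ -f "$HADOOP_TARBALL" ]; then
    echo "$HADOOP_TARBALL already present, skipping download"
  else
    wget "$HADOOP_URL"
  fi
}
```

Calling `download_hadoop` from the home directory of the ‘ubuntu’ user leaves the tarball where the following steps expect it.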
Unzip the files and review the package content and configuration files.
$ tar -xzvf hadoop-1.2.1.tar.gz
For ease of operation and maintenance, rename the ‘hadoop-1.2.1’ directory to ‘hadoop’.
$ mv hadoop-1.2.1 hadoop
1.4 Setup Environment Variable
Set up the environment variables for the ‘ubuntu’ user.
Update the .bashrc file to add important Hadoop paths and directories.
Navigate to the home directory
$ cd ~
Open the .bashrc file in the vi editor
$ vi .bashrc
Add the following at the end of the file:
# Add Hadoop bin/ directory to path
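As a minimal sketch, the additions might look like the following. The paths are assumptions: Hadoop is taken to live in /home/ubuntu/hadoop (the directory renamed in section 1.3) and Oracle JDK 7 in /usr/lib/jvm/java-7-oracle, which is where the PPA installer normally places it. Adjust both if your layout differs.

```shell
# Assumed install locations; verify them on your instance before saving.
export HADOOP_PREFIX=/home/ubuntu/hadoop
export JAVA_HOME=/usr/lib/jvm/java-7-oracle

# Add Hadoop bin/ directory to path
export PATH=$PATH:$HADOOP_PREFIX/bin
```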
Save and Exit.
To check whether it has been updated correctly, reload the bash profile and print the new variables.
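The reload and check could be done along these lines. `show_hadoop_env` is a hypothetical helper, and the variable names it inspects (HADOOP_PREFIX and JAVA_HOME) are assumptions about what was added to .bashrc.

```shell
# Hypothetical helper: report whether each expected variable is set in the current shell.
show_hadoop_env() {
  for name in HADOOP_PREFIX JAVA_HOME; do
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "$name=$value"
    else
      echo "$name is not set"
    fi
  done
}

# Reload the profile in the current shell (skipped if the file does not exist).
if [ -f "$HOME/.bashrc" ]; then
  . "$HOME/.bashrc"
fi
show_hadoop_env
```

A variable reported as not set usually means the export line is missing or misspelled in .bashrc.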
1.5 Setup Password-less SSH on Servers
- ‘ssh-agent’ is a background program that holds SSH private keys in memory for the current session.
- The ‘ssh-add’ command prompts for the private key’s passphrase (if it has one) and adds the key to the list maintained by ssh-agent. Once a key has been added to ssh-agent, you will not be asked to provide it again when using SSH or SCP to connect to hosts that have your public key.
Amazon EC2 has already taken care of ‘authorized_keys’ on the master server. Execute the following commands to allow password-less SSH access to the slave servers.
First of all, we need to protect our key pair files. If the file permissions are too open, you will get an error.
To fix this problem, we need to issue the following commands:
$ chmod 644 authorized_keys
Quick Tip: If you set the permissions with ‘chmod 644’, you get a file that can be written by you, but can only be read by the rest of the world.
$ chmod 400 hadoopec2cluster.pem
Quick Tip: chmod 400 is a very restrictive setting that gives only the file owner read-only access. There are no write or execute permissions for the owner, and no permissions whatsoever for anyone else.
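To see what the two modes actually do, the sketch below applies them to a scratch file and prints the resulting octal mode. It assumes GNU coreutils `stat` (as shipped with Ubuntu); `mode_of` is a hypothetical helper name.

```shell
# Hypothetical helper: print the octal permission bits of a file (GNU stat syntax).
mode_of() {
  stat -c '%a' "$1"
}

# Work on a scratch file so the real key and authorized_keys are left untouched.
scratch="$(mktemp)"
chmod 644 "$scratch" && echo "after chmod 644: $(mode_of "$scratch")"
chmod 400 "$scratch" && echo "after chmod 400: $(mode_of "$scratch")"
rm -f "$scratch"
```

Running `mode_of hadoopec2cluster.pem` after the chmod above is a quick way to confirm the key file is no longer too open.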
To use ssh-agent and ssh-add, follow the steps below:
- At the Unix prompt, enter: eval `ssh-agent`
  Note: Make sure you use the backquote (`), located under the tilde (~), rather than the single quote (').
- Enter the command: ssh-add hadoopec2cluster.pem
Notice that the .pem file now has “read-only” permission, and this time ssh-add works for us.
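The same two steps can be written with `$(...)` instead of backquotes, which avoids the quoting mix-up mentioned in the note. This is only a sketch: `load_key` is a hypothetical helper, and the key is assumed to sit in the home directory of the ‘ubuntu’ user.

```shell
# Assumed key location; override KEY_FILE if the .pem file lives elsewhere.
KEY_FILE="${KEY_FILE:-$HOME/hadoopec2cluster.pem}"

# Hypothetical helper: make sure an agent is available, add the key, then list loaded keys.
load_key() {
  # Start an agent only when this shell does not already point at one.
  if [ -z "${SSH_AUTH_SOCK:-}" ]; then
    eval "$(ssh-agent -s)" >/dev/null
  fi
  ssh-add "$1" && ssh-add -l
}

if [ -f "$KEY_FILE" ]; then
  load_key "$KEY_FILE"
else
  echo "key file not found: $KEY_FILE"
fi
```

`ssh-add -l` prints the fingerprints of the keys the agent currently holds, so the new key should appear in that list.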
Let’s verify that we can connect to the SNN and the slave nodes from the master.
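One possible way to run that check for all nodes at once is sketched here. `check_nodes` and the `SSH_BIN` override are hypothetical, the host arguments are placeholders for the public DNS names of your own instances, and the login user is assumed to be ‘ubuntu’.

```shell
# Hypothetical helper: run `hostname` on each node over SSH and report the outcome.
# SSH_BIN can be overridden (for example with "echo") to do a dry run.
check_nodes() {
  ssh_bin="${SSH_BIN:-ssh}"
  for host in "$@"; do
    if $ssh_bin -o ConnectTimeout=5 "ubuntu@$host" hostname; then
      echo "OK: $host"
    else
      echo "FAILED: $host"
    fi
  done
}

# Example call (placeholders, replace with your instances' public DNS names):
# check_nodes <snn-public-dns> <slave1-public-dns> <slave2-public-dns>
```

On the first connection to each node, ssh asks you to confirm the host key; after that, a successful run prints each node’s hostname without asking for a password.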