Tutorial: Running the MapReduce EEMD code with Hadoop on FutureGrid
by Rewati Ovalekar
● Step 1:
– The code is available at: http://code.google.com/p/cyberaide/
– Download the code from: http://code.google.com/p/cyberaide/source/browse/#svn%2Ftrunk%2Fproject%2Fspring2011%2FEEMDAnalysis%2FEEMDJava (a checkout sketch follows below)
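A minimal checkout sketch, assuming the standard Google Code SVN layout (the path below is decoded from the browse URL above):
svn checkout http://cyberaide.googlecode.com/svn/trunk/project/spring2011/EEMDAnalysis/EEMDJava EEMDJava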
● Step 2:
– Create a FutureGrid account.
– For further details refer to the FutureGrid tutorials: https://portal.futuregrid.org/tutorials
● Step 3:
– Log in to FutureGrid:
ssh [email protected]
– A welcome message is displayed on a successful login.
● Step 4:
– Create a jar file from the downloaded code (see the build sketch below).
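The slides give no build command; a minimal sketch, assuming the sources compile against the Hadoop 0.20.2 core jar used later in this tutorial (the source layout and jar name are placeholders):
mkdir -p classes
javac -classpath /opt/hadoop-0.20.2/hadoop-0.20.2-core.jar -d classes src/*.java
jar cvf EEMDHadoop.jar -C classes .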
● Step 5:
– To transfer the jar file and the input file:
sftp [email protected]
put /../filepath
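A full transfer session might look like this (the file paths are placeholders):
sftp [email protected]
sftp> put /path/to/EEMDHadoop.jar
sftp> put /path/to/inputfile
sftp> quit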
● Step 6:
– In order to run Hadoop on FutureGrid, create a Eucalyptus account.
– For further details refer to: https://portal.futuregrid.org/tutorials/eucalyptus
● Step 7:
– Once the account is approved, load the Eucalyptus tools (a credentials sketch follows below):
module load euca2ools
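The euca-* commands below also need Eucalyptus credentials in the environment; a sketch, assuming the credentials were downloaded from the FutureGrid portal as a zip archive (the archive name is a placeholder; eucarc is the Eucalyptus convention):
unzip euca2-credentials.zip -d ~/.euca
source ~/.euca/eucarc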
● Step 8:
– Make sure that the jar file and the input file are in the same directory as the username.private key.
– Run the image which has Hadoop on it:
euca-run-instances -k rovaleka -t c1.xlarge emi-D778156D
-k indicates the key name
-t indicates the instance type
emi-D778156D indicates the image name
-n (not used above) indicates the number of instances to launch
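For a multi-node run, the -n flag described above can be added; for example, to start four instances of the same image:
euca-run-instances -k rovaleka -t c1.xlarge -n 4 emi-D778156D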
● Step 9:
– Check the status using:
euca-describe-instances
– Keep checking until the status is "running"; once it is, one can log in to run Hadoop.
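Rather than rerunning the command by hand, the status can be polled with the standard watch utility, for example every 30 seconds:
watch -n 30 euca-describe-instances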
● Step 10:
– Transfer the input file and the jar file to the required VM using:
scp -i username.private filename [email protected]:/
(Make sure that the address is the same as the address assigned to you, else it will ask for a password.)
– Log in using:
ssh -i username.private [email protected]
(Make sure the address is the same.)
SINGLE NODE
● Step 11:
– A welcome message is displayed on a successful login.
– Retrieve the transferred files and move them into the Hadoop folder:
cd /..
mv filename /opt/hadoop-0.20.2
● Step 12:
– To run Hadoop:
cd /opt/hadoop-0.20.2
bin/start-all.sh
– To check that everything has started (expected daemons shown below):
jps
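On a healthy single-node setup, jps should list the five Hadoop daemons in addition to itself; the process IDs below are placeholders:
1234 NameNode
2345 DataNode
3456 SecondaryNameNode
4567 JobTracker
5678 TaskTracker
6789 Jps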
● Step 13:
– Transfer the input file onto HDFS:
bin/hadoop dfs -copyFromLocal inputfile name_in_HDFS
– To check that it is present on HDFS:
bin/hadoop dfs -ls
NOTE: The input file needs to be transferred again whenever Hadoop is restarted.
● Step 14:
– To run the code:
bin/hadoop jar [jarFile] EEMDHadoop [inputfilename] [required_output_file]
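A concrete invocation, reusing the names from the earlier steps (the jar name and output name are placeholders):
bin/hadoop jar EEMDHadoop.jar EEMDHadoop name_in_HDFS eemd_output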
● Step 15:
– Retrieve the output:
bin/hadoop dfs -copyToLocal [outputFileName] [outputfileNameToBeGiven]
(The output will be available in the part-00000 file.)
– To check the logs and to debug the code, go to the logs/userlogs folder.
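For a quick look at the retrieved result (the local name is a placeholder; -copyToLocal of the output directory yields a folder containing part-00000):
head eemd_output_local/part-00000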
● Step 16:
– Stop Hadoop and log out:
bin/stop-all.sh
exit
Thank you!!!