Hadoop

Running Hadoop in Terminal Rooms

To set up a Hadoop cluster in a terminal room, make sure you have booted some PCs with Ubuntu Linux 12.04. Write down the names of the PCs you want to use as slave nodes. Then run tkhadoop.sh:

/usr/local/bin/tkhadoop.sh [slaves]

For example, when you're physically working on hg137pc01 and PCs two and three are available and running Linux, use:

tkhadoop.sh hg137pc02.science.ru.nl hg137pc03.science.ru.nl
  • Be sure to use fully qualified domain names for the slave host names. This is required for the Kerberos-based SSH authentication used in this script and in the scripts that are bundled with Hadoop.
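
A quick way to check that Kerberos-based SSH to a slave node works before running the script (assuming you already have a valid Kerberos ticket from logging in) is:

klist
ssh hg137pc02.science.ru.nl hostname

The second command should print the slave's host name without asking for a password; if it prompts for one, renew your ticket (for example with kinit) before running tkhadoop.sh.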

This will set up the files needed to run a three-node Hadoop cluster. The host on which you execute tkhadoop will be the master node and will be used as a slave as well. You'll find your Hadoop installation in:

/scratch/username/hadoop
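
As a quick sanity check you can list the directory; you should at least see the bin and conf subdirectories used in the commands below (the exact contents depend on the Hadoop version installed):

ls /scratch/$USER/hadoop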

From within this directory, you can test the examples as documented on the Apache Hadoop website (http://hadoop.apache.org/docs/stable/single_node_setup.html):

$ cd /scratch/$USER/hadoop

Format a new distributed filesystem:

$ bin/hadoop namenode -format

Start the Hadoop daemons:

$ bin/start-all.sh
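
After they start, a simple way to verify that the daemons are actually running is jps, which lists the Java processes; since the master also acts as a slave here, you would typically expect to see NameNode, SecondaryNameNode, JobTracker, DataNode and TaskTracker:

$ jps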

Browse the web interface for the NameNode and the JobTracker; by default they are available at:

NameNode   - http://localhost:50070/
JobTracker - http://localhost:50030/
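
If you prefer to check from the command line instead of a browser, fetching the NameNode front page should return some HTML once the daemons are up (just an illustrative check):

$ curl -s http://localhost:50070/ | head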

Copy the input files into the distributed filesystem:

$ bin/hadoop fs -put conf input

Run some of the examples provided:

$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
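
While the job is running you can follow its progress in the JobTracker web interface mentioned above, or list running jobs from the command line:

$ bin/hadoop job -list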

Examine the output files. Either copy the output files from the distributed filesystem to the local filesystem and examine them:

$ bin/hadoop fs -get output output 
$ cat output/*

or

View the output files on the distributed filesystem:

$ bin/hadoop fs -cat output/*

When you're done, stop the daemons with:

$ bin/stop-all.sh

Make sure to clean up the files in /scratch/$USER on the master and slave nodes.
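
One way to do this, assuming the slave host names from the example above and that everything you created lives under /scratch/$USER/hadoop, is:

$ rm -rf /scratch/$USER/hadoop
$ for host in hg137pc02.science.ru.nl hg137pc03.science.ru.nl; do ssh $host rm -rf /scratch/$USER/hadoop; done

Adjust the host list to match your own session, and also remove any other files you created under /scratch/$USER on each node.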