Hadoop

From Cncz
===Apache Hadoop links===
* [http://hadoop.apache.org/docs/stable/single_node_setup.html Single node setup]
* [http://hadoop.apache.org/docs/stable/cluster_setup.html Cluster setup]

===Setup terminal rooms===
An Ubuntu package for hadoop (downloaded from [http://ftp.nluug.nl/internet/apache/hadoop/common/ ftp.nluug.nl]) has been added to the science ubuntu repository:

 hadoop_1.1.1-1_x86_64.deb

Local users:

 uid: 201 for hdfs
 uid: 202 for mapred
 gid:  49 for hadoop
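These fixed ids can be checked with id(1). A quick sketch, using root as a portable stand-in since the hdfs and mapred accounts exist only on the managed science hosts:

```shell
# Print an account's numeric uid; on the science hosts,
# "id -u hdfs" should print 201 and "id -u mapred" 202.
id -u root    # root works everywhere as a demonstration; prints 0
```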
 
  
In /etc/hadoop/hadoop-env.sh, the HADOOP_CLIENT_OPTS environment variable has been changed from -Xmx128m to -Xmx1024m.
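After that change, the relevant line in /etc/hadoop/hadoop-env.sh reads roughly as follows (a sketch; the surrounding options in the actual file may differ):

```shell
# Java heap limit for Hadoop client processes (raised from the 128 MB default)
export HADOOP_CLIENT_OPTS="-Xmx1024m"
```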

===Stand-alone test===
With this setup, we could successfully run the example job:
 $ cd /scratch/
 $ mkdir input
 $ cp /usr/share/hadoop/templates/conf/*.xml input  # nothing to do with configuration: this just generates some input data
 $ hadoop jar /usr/share/hadoop/hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
 $ cat output/*

Revision as of 11:21, 15 April 2013

==Running Hadoop in Terminal Rooms==

To set up a Hadoop cluster in a terminal room, make sure you have booted some PCs with Ubuntu Linux 12.04. Write down the names of the PCs you want to use as slave nodes. Then run tkhadoop.sh:

 /usr/local/bin/tkhadoop.sh [slaves]

For example, when you're physically working on hg137pc01 and PCs two and three are available and running Linux, use:

 tkhadoop.sh hg137pc02.science.ru.nl hg137pc03.science.ru.nl


This will set up the files needed to run a three-node hadoop cluster. The host on which you execute tkhadoop will be the master node and will also be used as a slave. You'll find your hadoop installation in:

 /scratch/username/hadoop

From within this directory, you can test the examples as documented on the [http://hadoop.apache.org/docs/stable/single_node_setup.html apache hadoop website]:
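The first of those examples is the same grep job shown in the stand-alone test above. As an aside, what that job computes can be approximated in plain shell, which is handy for sanity-checking the Hadoop output (an illustration with a made-up sample input, not part of the setup):

```shell
# Rough single-machine equivalent of the hadoop-examples "grep" job:
# extract every match of 'dfs[a-z.]+' from the input and count occurrences.
mkdir -p /tmp/grep-demo
printf '<name>dfs.replication</name>\n<name>dfs.name.dir</name>\n' > /tmp/grep-demo/sample.xml
grep -ohE 'dfs[a-z.]+' /tmp/grep-demo/*.xml | sort | uniq -c | sort -rn
```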