Difference between revisions of "Software cluster"

From Cncz
 
m (Cluster software)
 
(13 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
== Cluster software ==
 
  
[nl]
+
<font color=red>
Op het cnXX-cluster is als clustersoftware [http://gridengine.sunsource.net Sun GridEngine] geinstalleerd.
+
!! All clusternodes will be moved to [[Slurm|SLURM]]. The text below deals with the older GridEngine clustersoftware. !!
[/nl]
+
</font>
[en]
 
On the cnXX-cluster the [http://gridengine.sunsource.net Sun Grid Engine] clustersoftware has been installed.  
 
[/en]
 
  
[Gebruik][Usage]:
+
== [Oude Clustersoftware][Previous Cluster software] ==
  
 
[nl]
 
[nl]
* Zorg dat je 'ssh' kunt doen naar alle clusterhosts zonder wachtwoord. Dit omvat:
+
Op het cnXX-cluster is als clustersoftware [http://www.oracle.com/technetwork/oem/grid-engine-166852.html Oracle GridEngine] ge&iuml;nstalleerd.
** Maak ".ssh/authorized_keys" door:
 
*** ssh-keygen (met een lege passphrase)
 
*** cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
 
*** Voeg de hostkeys van alle clusterhosts toe aan ~/.ssh/known_hosts, bv door te ssh-en naar alle hosts.
 
 
[/nl]
 
[/nl]
 
[en]
 
[en]
* Make sure you can 'ssh' to all cluster machines without password. This involves:
+
On the cnXX-cluster the [http://www.oracle.com/technetwork/oem/grid-engine-166852.html Oracle Grid Engine] cluster software has been installed.  
** Creating ".ssh/authorized_keys" by:
 
*** ssh-keygen (with empty passphrase)
 
*** cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
 
*** Adding hostkeys for all clusterhosts to ~/.ssh/known_hosts, e.g. by ssh-ing to all hosts.
 
 
[/en]
 
[/en]
  
 +
[Gebruik][Usage]:
  
* [Importeer][Import] GridEngine settings [in je][into your] shell:
 
** ([Voor][For] csh/tcsh)
 
<pre>
 
    source /vol/GridEngine/default/common/settings.csh
 
</pre>
 
** ([Voor][For] sh/bash)
 
<pre>
 
    . /vol/GridEngine/default/common/settings.sh
 
</pre>
 
[Dit zet de volgende ][This will set or expand the following] environment variables:
 
<pre>
 
  - $SGE_ROOT        (always necessary)
 
  - $SGE_CELL        (if you are using a cell other than >default<)
 
  - $SGE_QMASTER_PORT (if you haven't added the service >sge_qmaster<)
 
  - $SGE_EXECD_PORT  (if you haven't added the service >sge_execd<)
 
  - $PATH/$path      (to find the Grid Engine binaries)
 
  - $MANPATH          (to access the manual pages)
 
</pre>
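A minimal sketch of a guarded import for sh/bash, assuming the settings path shown above. The function name <tt>load_sge_settings</tt> is hypothetical and not part of Grid Engine; it only checks that the file is readable and that <tt>$SGE_ROOT</tt> ends up set.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): source the settings file
# and verify that SGE_ROOT is set afterwards.
load_sge_settings() {
    # Default path is the one given on this page; pass another path to override.
    settings=${1:-/vol/GridEngine/default/common/settings.sh}
    if [ ! -r "$settings" ]; then
        echo "cannot read $settings" >&2
        return 1
    fi
    . "$settings"
    # Succeeds only if the settings file defined SGE_ROOT.
    [ -n "${SGE_ROOT:-}" ]
}
```

Calling <tt>load_sge_settings</tt> without arguments in a login script would then fail with a message on hosts where the settings file is not mounted.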
 
  
* [Voor het submitten van het shell-script][To submit the shell-script] '~/date':
+
* [Je mag alleen maar een shell-script submitten via qsub, hier als voorbeeld het shell-script][You can only submit shell-scripts, as an example here the shell-script] '~/date':
 
<pre>
     #! /bin/sh
     /bin/date
</pre>
[tik je][just enter]: <tt>qsub  ~/date</tt>.
+
[Om dit script te submitten tik je][To submit this script just enter]: <tt>qsub  -cwd ~/date</tt>.
[De output en error wordt geschreven naar bestanden][The output and error will be written to files] <tt>~/date.[oe]$jobnumber</tt>.
+
[De output en error wordt geschreven naar bestanden][The output and error will be written to files] <tt>~/date.[oe]$jobnumber</tt>. [Vanwege de -cwd staan die niet in de homedirectory, maar in de huidige directory][Because of the -cwd they can be found in the current directory, not in the home directory].
  
[Als je een job op een bepaalde host wil laten lopen, gebruik][If you want your job to run on a special host, you can use]: <tt>qsub -q '*@cn00' ~/date</tt>.
+
[Als je deze job op een bepaalde host (hier als voorbeeld 'cn00') wil laten lopen, gebruik][If you want this job to run on a specific host (here as an example 'cn00'), you can use]: <tt>qsub -q '*@cn00' ~/date</tt>.
  
 
We [hebben hostgroups gemaakt][have configured hostgroups]:
 
 
<pre>
 
<pre>
@allhosts (cn00 + all hostgroups below)
+
qconf -shgrpl
@mlfhosts cn16 cn17 cn18 cn19 cn26 cn27 cn28 cn29
 
@snnhosts cn01 cn02 cn03 cn04 cn05
 
@thchemhosts cn10 cn11 cn12 cn13 cn14 cn15
 
@kristalhosts cn20 cn21 cn22 cn23 cn24 cn25
 
@tcmhosts cn06 cn07 cn08 cn09 cn30 cn31 cn32 cn33 cn34 cn35 cn36 cn37 cn38
 
 
</pre>
 
</pre>
[dus je kunt gebruiken][so you can use]:
+
[laat zien welke er bestaan][shows which hostgroups exist],
 +
<pre>
 +
qconf -shgrp <hostgroup>
 +
</pre>
 +
[laat zien welke subhostgroep of hosts er in die hostgroep zitten.][shows which subhostgroups or hosts are in that hostgroup.]
 +
[Dus je kunt gebruiken][So you can use]:
 
<pre>  qsub -q '*@@mlfhosts,*@@tcmhosts' ~/date</pre>
 
 
[Wanneer het geen 'harde' eis is, maar alleen een 'soft' voorkeur om een bepaalde host te gebruiken][ If it is not a 'hard' requirement, but only a 'soft' preference to run on a certain hostgroup]:
 
Line 70: Line 42:
  
 
[nl]
 
[nl]
Een aardige optie van 'qsub' is: <tt>-p priority</tt>. Die optie is alleen aanwezig bij qsub, qsh, qrsh, qlogin en qalter. Het (her)definieert de prioriteit van een job ten opzichte van andere jobs.  
+
Een aardige optie van 'qsub' is: <tt>-p priority</tt>. Die optie is alleen aanwezig bij qsub, qsh, qrsh, qlogin en qalter. Het (her)definieert de prioriteit van een job ten opzichte van andere jobs. Dit gaat normaal alleen over de volgorde waarin Grid Engine jobs laat starten. Grid Engine rommelt normaal niet met lopende jobs. Priority is een geheel getal van -1023 tot en met 1024. Default is de priority 0. Gebruikers mogen alleen de prioriteit van hun jobs verlagen. Als een job hogere prioriteit heeft, dan kan die eerder gekozen worden door Grid Engine om te starten.  
 
[/nl]
 
[/nl]
 
[en]
 
[en]
 
A nice option of 'qsub' is: <tt>-p priority</tt>. Available for qsub, qsh, qrsh, qlogin and qalter only. Defines or redefines the priority of the job relative to other jobs. The priority is normally only important for Grid Engine when deciding which job to start. Grid Engine normally does not mess with running jobs. Priority is an integer in the range -1023 to 1024. The default priority value for jobs is 0. Users may only decrease the priority of their jobs. Grid Engine managers and administrators may also increase the priority associated with jobs. If a pending job has higher priority, it is earlier eligible for being dispatched by the Grid Engine scheduler.
Line 101: Line 70:
 
echo "phrase=MyOwnPassword" > ~/.smpd
 
echo "phrase=MyOwnPassword" > ~/.smpd
 
</pre>
 
</pre>
[Kies ''svp'' je eiegen][''Please'' pick your own] ''MyOwnPassword'' !!!
+
[Kies ''svp'' je eigen][''Please'' pick your own] ''MyOwnPassword'' !!!  
 +
[Dit wachtwoord wordt alleen voor MPI gebruikt, niet je loginwachtwoord!][This password is only used for MPI, don't use your login password!]
  
 
~/mpich2.sh:
 
~/mpich2.sh:
Line 148: Line 118:
 
-cq    Clears the error state of the specified queue(s).
 
-cq    Clears the error state of the specified queue(s).
 
</pre>
 
</pre>
 +
 +
[nl]
 +
Als een job faalt waardoor de queue op een bepaalde machine in error (E) komt, dan kan een systeembeheerder, te mailen via postmaster@science.ru.nl, dit oplossen door iets te tikken als:
 +
[/nl]
 +
[en]
 +
If a job fails and puts the queue on a certain machine in error (E) state, a system administrator, who can be reached at postmaster@science.ru.nl, can clear this error state by entering something like:
 +
[/en]
 +
<tt>qmod -c all.q@cn16</tt>
 +
 +
== qmon - X-Windows OSF/Motif graphical user's interface for Grid Engine ==
 +
[nl]
 +
Dit grafische user-interface voor grid engine kan gestart worden op cn99 met:
 +
[/nl]
 +
[en]
 +
This graphical user's interface for Grid Engine can be started on cn99 with:
 
[/en]
 
[/en]
 +
<pre>
 +
ssh -X cn99 qmon
 +
</pre>
 +
[[Categorie: Software]]

Latest revision as of 15:11, 14 October 2015

Cluster software

!! All clusternodes will be moved to SLURM. The text below deals with the older GridEngine clustersoftware. !!

Previous Cluster software

On the cnXX-cluster the Oracle Grid Engine cluster software has been installed.

Usage:


  • Only shell scripts can be submitted with qsub. As an example, take the shell script '~/date':
    #! /bin/sh
    /bin/date

To submit this script just enter: qsub -cwd ~/date. The output and error will be written to the files date.o$jobnumber and date.e$jobnumber. Because of the -cwd option they can be found in the current directory, not in the home directory.
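As a small illustration of the naming rule for the output files, the sketch below prints the two file names to expect for a job submitted with -cwd. The helper name expected_job_files and the job number 12345 are made up for the example; the real job number is the one reported by qsub.

```shell
#!/bin/sh
# Hypothetical helper: print the stdout/stderr file names that a job
# submitted with "qsub -cwd <script>" is expected to produce.
expected_job_files() {
    script=$1      # path of the submitted script, e.g. ~/date
    job_id=$2      # job number reported by qsub
    name=$(basename "$script")
    # With -cwd the files land in the directory qsub was called from.
    printf '%s/%s.o%s\n' "$PWD" "$name" "$job_id"
    printf '%s/%s.e%s\n' "$PWD" "$name" "$job_id"
}

# Example with an assumed job number:
expected_job_files "$HOME/date" 12345
```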

If you want this job to run on a specific host (here as an example 'cn00'), you can use: qsub -q '*@cn00' ~/date.

We have configured hostgroups:

qconf -shgrpl

shows which hostgroups exist,

qconf -shgrp <hostgroup>

shows which subhostgroups or hosts are in that hostgroup. So you can use:

   qsub -q '*@@mlfhosts,*@@tcmhosts' ~/date

If it is not a 'hard' requirement, but only a 'soft' preference to run on a certain hostgroup:

qsub -soft -q '*@@mlfhosts' ~/date
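When several hostgroups are acceptable, the -q argument can be assembled from their names. The following is only a sketch: the function queue_list is hypothetical, and the hostgroup names are the ones used in the example above. It prints the command line instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: turn hostgroup names into a comma-separated
# queue list such as *@@mlfhosts,*@@tcmhosts for "qsub -q".
queue_list() {
    list=
    for group in "$@"; do
        # Add a comma only when the list already has an entry.
        list="${list:+$list,}*@@$group"
    done
    printf '%s\n' "$list"
}

# Print (do not run) the resulting submit command:
printf "qsub -q '%s' ~/date\n" "$(queue_list mlfhosts tcmhosts)"
```

The single quotes in the printed command matter, because an unquoted * would be expanded by the shell before qsub sees it.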

A useful option of 'qsub' is -p priority, which is available for qsub, qsh, qrsh, qlogin and qalter only. It defines or redefines the priority of the job relative to other jobs. The priority normally matters only when Grid Engine decides which pending job to start next; running jobs are normally left alone. Priority is an integer in the range -1023 to 1024, and the default is 0. Users may only decrease the priority of their jobs; Grid Engine managers and administrators may also increase it. A pending job with a higher priority becomes eligible for dispatch by the Grid Engine scheduler earlier.
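The range rule for -p can be checked before submitting. This is a sketch only: check_priority is a hypothetical helper, not a Grid Engine command, and it merely tests that the value is an integer between -1023 and 1024, with a note on stderr for positive values that ordinary users are not allowed to set.

```shell
#!/bin/sh
# Hypothetical helper: validate a value intended for "qsub -p".
# Returns 0 if valid, 1 if out of range, 2 if not an integer.
check_priority() {
    case $1 in
        ''|-|*[!0-9-]*|?*-*)
            echo "not an integer: $1" >&2
            return 2 ;;
    esac
    if [ "$1" -lt -1023 ] || [ "$1" -gt 1024 ]; then
        echo "priority out of range (-1023..1024): $1" >&2
        return 1
    fi
    if [ "$1" -gt 0 ]; then
        echo "note: only managers and administrators may raise priority" >&2
    fi
    return 0
}

# Print (do not run) a submit command with a lowered priority:
check_priority -100 && echo "qsub -p -100 -cwd ~/date"
```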

The 'nice' command can also be useful when running on other people's hosts:

	nice - run a program with modified scheduling priority
	       -n, --adjustment=N
	       add integer N to the niceness (default 10)
  • For an MPI job:

First create an 'smpd passphrase' file:

	touch ~/.smpd
	chmod 600 ~/.smpd
	echo "phrase=MyOwnPassword" > ~/.smpd

Please pick your own password instead of MyOwnPassword! This password is only used for MPI; do not use your login password.
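One way to avoid reusing a real password is to generate a random passphrase. The sketch below assumes that /dev/urandom and od are available on the host; the function name write_smpd_phrase is hypothetical. It writes a file in the same phrase=... format as above, readable only by its owner.

```shell
#!/bin/sh
# Hypothetical helper: write an smpd passphrase file containing a random
# 32-character hex passphrase, readable by the owner only.
write_smpd_phrase() {
    file=${1:-$HOME/.smpd}
    # 16 random bytes as hex; assumes /dev/urandom and od exist.
    phrase=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    # Create the file with a restrictive umask, then force mode 600
    # in case the file already existed with wider permissions.
    ( umask 077 && printf 'phrase=%s\n' "$phrase" > "$file" )
    chmod 600 "$file"
}
```

Running write_smpd_phrase without arguments would overwrite ~/.smpd, so it is only defined here and not called.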

~/mpich2.sh:

	#!/bin/sh -x
	#
	#$ -S /bin/sh
	#
	# sample mpich2 job
	# you will need to adjust the $PATH to your mpich2 installation
	# be sure to get the correct mpiexec for mpich2_smpd!!!
	export PATH=/usr/local/mpich2_smpd/bin:$PATH
	port=$((JOB_ID % 5000 + 20000))
	echo "Got $NSLOTS slots."
	mpiexec -n $NSLOTS -machinefile $TMPDIR/machines -port $port ~/mpihello
	exit 0

which runs the compiled source of ~/mpihello.c:

	#include <stdio.h>
	#include "mpi.h"
	int main(int argc, char** argv)
	{
	 int noprocs, nid;
	 MPI_Init(&argc, &argv);
	 MPI_Comm_size(MPI_COMM_WORLD, &noprocs);  /* total number of processes */
	 MPI_Comm_rank(MPI_COMM_WORLD, &nid);      /* rank of this process */
	 if (nid == 0)  /* only rank 0 prints */
	  printf("Hello world! I'm node %i of %i \n", nid, noprocs);
	 MPI_Finalize();
	 return 0;
	}

that has been compiled with: /usr/local/mpich2_smpd/bin/mpicc mpihello.c -lmpich -o mpihello

One needs to choose the parallel environment 'mpich2_smpd' and the number of slots: qsub -pe mpich2_smpd 2 ~/mpich2.sh
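The port line in ~/mpich2.sh maps the job number onto the range 20000 to 24999, so that jobs with different numbers usually get different ports. A small sketch of that arithmetic, with a hypothetical function name smpd_port and an assumed job number used when $JOB_ID is not set:

```shell
#!/bin/sh
# Same arithmetic as in ~/mpich2.sh: derive a port in 20000..24999
# from the Grid Engine job number.
smpd_port() {
    job_id=$1
    echo $(( job_id % 5000 + 20000 ))
}

# Inside a job $JOB_ID is set by Grid Engine; 12345 is only a stand-in.
smpd_port "${JOB_ID:-12345}"
```

Two jobs whose numbers differ by a multiple of 5000 would get the same port, which is the trade-off of this simple scheme.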

  • Other interesting commands:
    qstat - show the status of Grid Engine jobs and queues
    qmod - modify a Grid Engine queue
        -cj    Clears the error state of the specified job(s).
        -cq    Clears the error state of the specified queue(s).

If a job fails and puts the queue on a certain machine in error (E) state, a system administrator, who can be reached at postmaster@science.ru.nl, can clear this error state by entering something like: qmod -c all.q@cn16
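Before mailing the administrators it can help to know which queue instances are affected. The sketch below filters the output of qstat -f; it assumes that the states column is the sixth field of each queue line, which should be checked against real output on the cluster. The function name queues_in_error is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: read "qstat -f" style output on stdin and print
# the queue instances whose states column (assumed: field 6) contains E.
queues_in_error() {
    awk 'NF >= 6 && $1 ~ /@/ && $6 ~ /E/ { print $1 }'
}

# Only run qstat where it is actually installed.
if command -v qstat >/dev/null 2>&1; then
    qstat -f | queues_in_error
fi
```

The printed names, for example all.q@cn16, are the arguments an administrator would pass to qmod -c.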

qmon - X-Windows OSF/Motif graphical user's interface for Grid Engine

This graphical user's interface for Grid Engine can be started on cn99 with:

ssh -X cn99 qmon