Software cluster

From Cncz
Revision as of 15:47, 31 October 2011 by Petervc (talk | contribs) (Cluster software)

Cluster software

On the cnXX-cluster the Oracle Grid Engine cluster software has been installed.

Usage:


  • Only shell scripts can be submitted. As an example, here is the shell script '~/date':
    #! /bin/sh
    /bin/date

To submit this script, just enter: qsub -cwd ~/date. The output and error streams will be written to the files date.o$jobnumber and date.e$jobnumber. Because of the -cwd option, these files are created in the current directory, not in the home directory.
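The file naming described above can be sketched as a small shell helper. The function name job_output_files is hypothetical (it is not a Grid Engine command), and the sketch assumes the job keeps its default name, which is the script name without its directory part.

```shell
#!/bin/sh
# job_output_files is a hypothetical helper, not part of Grid Engine.
# Given a script path and a job number, it prints the names of the
# output (.o) and error (.e) files that the job is expected to create.
job_output_files() {
    script_name=$(basename "$1")   # default job name: script name without directory
    job_id=$2
    printf '%s.o%s\n%s.e%s\n' "$script_name" "$job_id" "$script_name" "$job_id"
}

# Example with a made-up job number:
job_output_files "$HOME/date" 4711
```

With 'qsub -cwd' these two files would be looked for in the directory from which the job was submitted.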

If you want this job to run on a specific host (here, as an example, 'cn00'), you can use: qsub -q '*@cn00' ~/date.

We have configured hostgroups:

qconf -shgrpl

shows which hostgroups exist,

qconf -shgrp <hostgroup>

shows which subhostgroups or hosts are in that hostgroup. So you can use:

   qsub -q '*@@mlfhosts,*@@tcmhosts' ~/date

If running on a certain hostgroup is not a 'hard' requirement, but only a 'soft' preference:

qsub -soft -q '*@@mlfhosts' ~/date
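The queue specification used with -q follows a fixed pattern: '*@@' plus the hostgroup name, with several groups joined by commas. A minimal sketch of building that string is shown here. The helper name hostgroup_queues is hypothetical, and the hostgroup names are the ones used as examples above.

```shell
#!/bin/sh
# hostgroup_queues is a hypothetical helper, not part of Grid Engine.
# It turns a list of hostgroup names into a -q argument of the form
# '*@@group1,*@@group2', meaning "any queue on hosts in these groups".
hostgroup_queues() {
    spec=""
    for group in "$@"; do
        # Add a comma separator only when spec already holds an entry.
        spec="${spec:+$spec,}*@@$group"
    done
    printf '%s\n' "$spec"
}

hostgroup_queues mlfhosts tcmhosts
```

The result could then be passed on as: qsub -q "$(hostgroup_queues mlfhosts tcmhosts)" ~/date. The double quotes matter, because an unquoted '*' would be expanded by the shell.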

A useful option of 'qsub' is -p priority, which is available for qsub, qsh, qrsh, qlogin and qalter only. It defines or redefines the priority of the job relative to other jobs. Grid Engine normally uses the priority only when deciding which pending job to start next; it does not touch jobs that are already running. Priority is an integer in the range -1023 to 1024, and the default priority value for jobs is 0. Users may only decrease the priority of their jobs; Grid Engine managers and administrators may also increase it. A pending job with a higher priority becomes eligible for dispatch by the Grid Engine scheduler earlier.
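Since regular users may only lower the priority, the values they can usefully pass to -p run from -1023 to 0. The following sketch checks a value against that range before it is used. The helper name check_user_priority is hypothetical, and the range for regular users is inferred from the description above.

```shell
#!/bin/sh
# check_user_priority is a hypothetical helper, not part of Grid Engine.
# It succeeds if its argument is an integer that a regular user may
# pass to 'qsub -p', assumed here to be -1023..0 (default 0, lowering only).
check_user_priority() {
    case $1 in
        ''|-|*[!0-9-]*|?*-*) return 1 ;;   # empty, lone '-', or not an integer
    esac
    [ "$1" -ge -1023 ] && [ "$1" -le 0 ]
}

if check_user_priority -100; then
    echo "priority -100 is acceptable for a regular user"
fi
```

A job could then be submitted at lower priority with, for example: qsub -p -100 -cwd ~/date.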

The 'nice' command can also be useful when running jobs on other people's hosts:

	nice - run a program with modified scheduling priority
	       -n, --adjustment=N
	       add integer N to the niceness (default 10)
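As a small illustration of using 'nice' inside a job script, the sketch below wraps a command so that it runs with the largest niceness increment. The wrapper name run_nicely is hypothetical, and the increment of 19 is only an example value.

```shell
#!/bin/sh
# run_nicely is a hypothetical wrapper, not part of Grid Engine.
# It runs the given command with a niceness increment of 19, so that
# other users of the host are disturbed as little as possible.
run_nicely() {
    nice -n 19 "$@"
}

# Example: run the date command from the job script at low priority.
run_nicely /bin/date
```
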
  • For an MPI job:

First create an 'smpd passphrase' file:

	touch ~/.smpd
	chmod 600 ~/.smpd
	echo "phrase=MyOwnPassword" > ~/.smpd

Replace MyOwnPassword with a password of your own. This password is only used for MPI, so do not use your login password!

~/mpich2.sh:

	#!/bin/sh -x
	#
	#$ -S /bin/sh
	#
	# sample mpich2 job
	# you will need to adjust the $PATH to your mpich2 installation
	# be sure to get the correct mpiexec for mpich2_smpd!!!
	export PATH=/usr/local/mpich2_smpd/bin:$PATH
	# derive a port in the range 20000-24999 from the job ID
	port=$((JOB_ID % 5000 + 20000))
	echo "Got $NSLOTS slots."
	# $NSLOTS and $TMPDIR are set by Grid Engine for the job
	mpiexec -n "$NSLOTS" -machinefile "$TMPDIR/machines" -port "$port" ~/mpihello
	exit 0

This script runs ~/mpihello, the program compiled from the source ~/mpihello.c:

	#include <stdio.h>
	#include "mpi.h"
	int main(int argc, char** argv)
	{
	 int noprocs, nid;
	 MPI_Init(&argc, &argv);
	 /* total number of processes and the rank of this process */
	 MPI_Comm_size(MPI_COMM_WORLD, &noprocs);
	 MPI_Comm_rank(MPI_COMM_WORLD, &nid);
	 /* only the process with rank 0 prints the greeting */
	 if (nid == 0)
	  printf("Hello world! I'm node %i of %i \n", nid, noprocs);
	 MPI_Finalize();
	 return 0;
	}

The program has been compiled with: /usr/local/mpich2_smpd/bin/mpicc mpihello.c -lmpich -o mpihello

One needs to choose the parallel environment 'mpich2_smpd' and the number of slots: qsub -pe mpich2_smpd 2 ~/mpich2.sh
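The port line in ~/mpich2.sh maps the job ID onto a port between 20000 and 24999, so that jobs with different IDs usually end up on different ports. The arithmetic can be tried out in isolation as follows. The function name smpd_port is hypothetical, and the job IDs are made-up examples.

```shell
#!/bin/sh
# smpd_port is a hypothetical helper that repeats the port arithmetic
# from ~/mpich2.sh: the job ID modulo 5000, shifted up by 20000.
smpd_port() {
    echo $(( $1 % 5000 + 20000 ))
}

smpd_port 12345   # 12345 % 5000 = 2345, so this prints 22345
smpd_port 5000    # 5000 % 5000 = 0, so this prints 20000
```

Two jobs whose IDs differ by a multiple of 5000 would get the same port, which is unlikely to matter unless both run at the same time on the same host.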

  • Other interesting commands:
    qstat - show the status of Grid Engine jobs and queues
    qmod - modify a Grid Engine queue
        -cj    Clears the error state of the specified job(s).
        -cq    Clears the error state of the specified queue(s).

If a job fails and puts the queue on a certain machine in error (E) state, a system administrator can be reached at , to clear this error state by entering something like: qmod -c all.q@cn16