GQSUB

Grid computing at the meso scale


Table of Contents

1. Introduction, and intended audience
2. Installation
3. General configuration
4. gqsub
Example
Command line options
File staging and output
Specifying file staging
Multi-job submission by array jobs
5. gqstat
Job output
6. Known issues
gqsub
Couldn't find a valid proxy certificate

List of Figures

4.1. gqsub command line options
4.2. Environment variables for array jobs in gqsub.

List of Tables

4.1.
4.2.

List of Examples

3.1. Sample .gqsubrc
4.1. Trivial submission script
4.2. Command line use of gqsub
4.3. An example of using explicit file staging with gqsub
4.4. An example of using array jobs with gqsub
6.1. Couldn't find a valid proxy certificate

Chapter 1. Introduction, and intended audience

GQSUB was born out of work with a group of users with existing HPC experience. The most common question we were asked was, "Can we not just use qsub?" (They could not, as that would have bound them to a single site.) However, the intent was clear: most Grid tools don't work quite like anything else, and feel a lot more complicated. Part of this is necessary complexity - using certificates to allow for decentralised authentication and authorisation, for example. Part of it is unnecessary complexity, which could be removed.

By focusing on a specific user community, it is possible to tailor the user interface to that community. The more focused the community, the better the tailoring can be. By focusing on existing HPC/HTC users, who typically have their own cluster, gqsub can exploit existing knowledge. This reduces learning time, and improves usability. This user group was selected as a target due to its size (big), and its current uptake of Grid computing (small) - essentially, it's the logical place to expect Grid use to expand into.

The aim of gqsub is to allow a user of a local cluster to install a small bit of code on their existing head node, and then be able to submit the same job control scripts to either the local cluster or the grid, without modification.

In practice, the aim of running without modification is probably unrealistic. It can happen if there is a shared filesystem between the head node and the Grid system, but that's not a common method of operation (yet). Therefore there is an ability to make small changes depending on whether the job is running locally or on the Grid. In trials, a couple of lines were all that needed to be adjusted - most of the configuration remains the same, as it is the same job; just a few small changes on

Chapter 2. Installation

There are two supported methods of installation: via RPM or tarball.

Installation methods

Either method requires a gLite UI to be installed in order to work. (Strictly, the two can be installed in either order; in practice, install the gLite UI first.) gqsub works well with the gLite 3.1 UI, and works with the 3.2 UI, modulo known problems in 3.2. Once those are resolved, gqsub will work identically on both.

RPM

In addition to manually installing the RPM, there's a repo at http://www.scotgrid.ac.uk/gqsub/repo which can be tied into yum. In general, the RPM release number should be 1, except in the case of a packaging problem. The main version number will be incremented for every new release with feature changes.

The RPM installs things into /opt/lcg at the moment. It should put it in the same general location as the rest of the Grid stack; hopefully that'll work. This is what we have installed at Glasgow.

Tarball

Unarchive the file into a directory on your path. There's a subdirectory containing some Python modules, which should be fine where it ends up.

Testing

There is a test script, called (imaginatively!) test.sh. This runs a basic job, outputs a little bit of diagnostics, ships in a C file, then compiles and runs it. The C file does some output, and sleeps for a few minutes (i.e. just long enough to detect it in the monitoring system). Production of a formal test suite is on the todo list.

Chapter 3. General configuration

gqsub uses a configuration file, and a directory tree. The configuration file is $HOME/.gqsubrc, and it defines the location of the directory tree, and a number of other variables. If the file is not present, it is generated with default values. If it is present but some values are not specified, a default is assumed for each missing value. The defaults are read from /etc/gqsubrc or, if not specified there (or if that file is not present), from compiled-in values. Most of the time these will not need to be changed, except perhaps defaultdirective and defaultvo. An example file, set with the default values for these options, is given in Example 3.1, “Sample .gqsubrc”.

Example 3.1. Sample .gqsubrc

[GQSUB] 
 Section header - required; exactly one section present at the moment. 
sharedpaths = 
 Paths that should be assumed to be the same between the submit host and the worker node. When within a sharedpath, file staging is not performed.  The values here are used if neither --on-shared-path nor --not-on-shared-path is specified.  These might typically be AFS pathnames.
maxjobspersubmission = 50
 Large array jobs will be split up into tranches no larger than this.  The 50 is the current recommendation from the WMS developers.
defaultdirective = #PBS
 Default directive.  Torque and PBS use #PBS, SGE uses #$.  It is best to set this to match an existing local cluster, where there is one.
gqsubdirective = #GQSUB
 Additional directive, gqsub specific.  Use this to set things that should not be used for local jobs - e.g. file staging. 
proxysafetymargin = 129600
 Additional time required on the proxy - if you find your jobs regularly queue for a long time, you might want to increase this.  
gqsubdir = $HOME/.gqsub
 Location of the directory tree to use for storing jobs.  Changing this will result in the disappearance of any existing jobs.
defaultvo = a.best.guess
 When no VO is specified, and there is no existing certificate, generate a proxy with this VO.  Most users have a single VO, and those that have multiple will tend to use one significantly more than the other.  This is originally filled in from an existing certificate.
defaultshell = /bin/sh
 The shell that will be used to execute the script.  This can be changed per job with -S, and this is the same default that qsub has.
gridftphost = gridftp.server.host
 The hostname for a GridFTP server that shares paths with the submission machine. This can be a separate server. The command line option --no-gridftp-host disables this; otherwise it will be used in preference to delayed output collection.
credentials = PlainVoms
 How to handle proxy certificates.  'PlainVoms' means to use simple VOMS proxies; the other option is 'MyProxy', which uses an ordinary VOMS proxy for short use (e.g. job querying), and the MyProxy server for longer uses (e.g. job submission).

The most critical of these is gqsubdir - the others can all be changed at runtime. The exact storage needs for this directory will vary, but as a data point, 320 jobs at maximal space usage take 13 MB. Much of the space is taken up by the storage of the final job status by gqstat.

In $gqsubdir a directory will be created per job submitted - these will be numbered sequentially. If a file called default.jdl is in this directory, it will be included as a default set for glite-wms-job-submit - this is a place where arbitrary JDL instructions can be included.

In the event that it is desired to have different options set for local and Grid jobs, the recommended approach is to set the option via the normal directive for local use, and then use the gqsub directive to change it. The submission script is parsed in order, top to bottom, so entries later in the file take precedence.
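
The lookup order described at the start of this chapter (user file, then /etc/gqsubrc, then compiled-in values) can be illustrated with a short Python sketch. This is not gqsub's own code: the function name load_settings and the BUILTIN_DEFAULTS table are hypothetical, and only a subset of the values from the sample file is included.

```python
import configparser

# Hypothetical stand-in for the compiled-in values; a subset copied from
# the sample .gqsubrc above.
BUILTIN_DEFAULTS = {
    "maxjobspersubmission": "50",
    "defaultdirective": "#PBS",
    "gqsubdirective": "#GQSUB",
    "proxysafetymargin": "129600",
    "defaultshell": "/bin/sh",
    "credentials": "PlainVoms",
}


def load_settings(user_rc, system_rc="/etc/gqsubrc"):
    """Return settings as a dict: built-ins, then system file, then user file."""
    # interpolation=None keeps values such as "$HOME/.gqsub" literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_dict({"GQSUB": BUILTIN_DEFAULTS})
    # read() skips files that do not exist; files later in the list
    # override values set by earlier ones.
    parser.read([system_rc, user_rc])
    return dict(parser["GQSUB"])
```

Called as load_settings("/home/someuser/.gqsubrc") (an illustrative path), a value missing from the user file falls back to the system file, and then to the built-in table. As a point of reference, the default proxysafetymargin of 129600 seconds is 36 hours.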

Chapter 4. gqsub

The submission engine

Example

Let's start with an example submission script, and show it in action. Firstly, here's the script in Example 4.1, “Trivial submission script”.

Example 4.1. Trivial submission script

#!/bin/sh

#PBS -l cput=0:30:00
#PBS -l walltime=0:30:00
#GQSUB -q UKI-SCOTGRID-GLASGOW

echo in user script
hostname
pwd

And here's the submission and monitoring in use.

Example 4.2. Command line use of gqsub

-bash-3.00$ ls
gqdel  gqoutput  gqstat  gqsub  gqsubconfig.py  gqsubconfig.pyc  gqsubproxy.py  gqsubproxy.pyc  test.sh
-bash-3.00$ gqsub test.sh 
File stagein requested: on a shared path, so no staging needed.  Note that this captures files at time of execution.

Couldn't find a valid proxy.

Proxy renewal required - looking for 36:45:00
Enter GRID pass phrase:
Submitting job 320 as: https://svr023.gla.scotgrid.ac.uk:9000/SJ7Ch-mRrEn4dD9oAgHecw

-bash-3.00$ gqstat 320
Getting status for 320
Job id Name User                                                    WallTime State     Queue                                               
------ ---- ------------------------------------------------------- -------- --------- ----------------------------------------------------
320         /C=UK/O=eScience/OU=Glasgow/L=Compserv/CN=stuart_purdie          Scheduled dev010.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q7d
-bash-3.00$ ./gqstat 320
Getting status for 320
Job id Name User                                                    WallTime State   Queue
------ ---- ------------------------------------------------------- -------- ------- ----------------------------------------------------
320         /C=UK/O=eScience/OU=Glasgow/L=Compserv/CN=stuart_purdie 00:02:14 Running dev010.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q7d
-bash-3.00$ ./gqstat 320
Getting status for 320
Job id Name User                                                    WallTime State Queue                                               
------ ---- ------------------------------------------------------- -------- ----- ----------------------------------------------------
320         /C=UK/O=eScience/OU=Glasgow/L=Compserv/CN=stuart_purdie 00:26:32 Done  dev010.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q7d
-bash-3.00$ ls
gqdel  gqoutput  gqstat  gqsub  gqsubconfig.py  gqsubconfig.pyc  gqsubproxy.py  gqsubproxy.pyc  test.sh  test.sh.e320  test.sh.o320
-bash-3.00$ more test.sh.o320 
in user script
node296.beowulf.cluster
/cluster/share/gla058/gqsub
-bash-3.00$ 

The output from gqstat will be covered in detail later; for the moment, note that it shows the job going through the relevant stages. In this case, the job was run on a shared path, so the output appeared automatically as it was generated. The job script executed, test.sh, is the one shown in Example 4.1, “Trivial submission script”.

The prompt to generate the proxy was due to the current proxy having expired. Note that there was no VO specified explicitly anywhere - it was assumed that the VO to use matched the last VO used. If there had been no proxy at all (e.g. if voms-proxy-destroy had been run), then gqsub would have drawn the VO name from the default VO in .gqsubrc.

Command line options

POSIX defines many command line options, and many more are present in vendor extensions. Here we have a complete survey of the command line options available to gqsub.

Figure 4.1. gqsub command line options

Long options are all gqsub extensions over POSIX.

Short option | Long option | Name | Origin | Status | Notes
-a | | Not before date | POSIX | Not supported | Not intended to support; cannot implement with gLite functionality, and of limited use. The best solution would be to use the local 'at' command to achieve it, and that would run the risk of jobs not being submitted when they might be expected to have been.
-A | --vo | Account Name | POSIX | Supported, mapped to VO | It is assumed that the Account that POSIX uses is mapped directly onto the VO.
-c | | Checkpointing | POSIX | Not supported | Not intended to support; cannot implement with gLite functionality, and would require significant investment to produce the net effect.
-C | | Directive Prefix | POSIX | Supported |
-e | | Stderr filename | POSIX | Supported |
-j | | Output stream joining | POSIX | Not supported | Intended to support; it's been lower down the list.
-I | | Interactive job | Torque extension | Not supported | No plans as yet; batch jobs considered much more important.
-k | | Files to keep | POSIX | Not supported | Not intended to support; explicitly not possible with lcg-pbs job managers, and of limited utility even if it were.
-m | | When to email status updates | POSIX | Not supported | Would like to support, but appears to be rather tricky. Emailing out from the job itself appears not to be reliable, so it will need some stable service somewhere to do the emailing. That is an external dependence that I want to avoid for the moment. The optimal solution would be to use the email mechanism in the underlying local cluster, which will require CE support.
-M | | Addresses to email | POSIX | Not supported | Will not support; exploitable in this context. In practice, the person to email should be taken from the X509 certificate. See above for other email issues.
-N | | Job Name | POSIX | Supported | Defaults to the name of the script if not provided.
-o | | Stdout filename | POSIX | Supported | Defaults to ${JobName}.o${job id} if not provided.
-p | | Priority | POSIX | Not supported | Not intended to support. Job priority is done at the VO level; gLite has no scope for user specified job priority.
-q | | Destination | POSIX | Supported, overloaded | See discussion of destinations.
-r | | Rerun | POSIX | Not supported | Intended to support; can be mapped onto the gLite retry mechanism.
-S | | Shell interpreter | POSIX | Supported | Name of the program used to interpret the script - typically /bin/sh or similar, but the script can also be, e.g., a Python or R script, or in any other interpreted language.
-t | | Array job | Common extension | Supported, see notes | This is a common extension, and many implementations have a slightly different way of handling these. There are two aspects to that: the way the jobs are specified, where gqsub takes a single range of numbers; and how this is passed to individual jobs.
-u | | | POSIX | Not supported |
-v | | | POSIX | Not supported |
-V | | | POSIX | Not supported |
-W | | | POSIX | Partially supported and extended | Certain attributes are supported, and passed through. Others are not supported.
-z | | | POSIX | Not supported |
 | --on-shared-path | | gqsub extension | | Forces assumption of shared filesystem. This forces gqsub to not do any explicit file staging for data return.
 | --not-on-shared-path | | gqsub extension | | Prohibits assumption of shared filesystem. This forces gqsub to use explicit file staging to return data.
 | --gridftp-host | | gqsub extension | | Indicates the host name of a server running GridFTP that can access the submission directory. Presence of this indicates that gqsub should use auto stage back, unless on a shared path.
 | --no-gridftp-host | | gqsub extension | | Indicates that gqsub should not use direct staging of output.
 | --dry-run | | gqsub extension | | Instructs gqsub to do everything except the final submission. Shows the result of a list-match (i.e. lists valid targets for the job). This is primarily intended as a debugging tool.
 | --verbose | | gqsub extension | |
 | --cerequirements | | gqsub extension | |
 | --jdl | | gqsub extension | |
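
Many of these options can also be given inside the script, on lines that start with a directive prefix (#PBS and #GQSUB by default, or the prefix given with -C). As a rough illustration of that idea, the Python sketch below collects the option tokens from directive lines in file order, so that a later entry can override an earlier one. It is a sketch only, not gqsub's parser; the function names collect_directives and last_value are hypothetical.

```python
import shlex


def collect_directives(script_text, prefixes=("#PBS", "#GQSUB")):
    """Return the option tokens found on directive lines, in file order."""
    tokens = []
    for line in script_text.splitlines():
        stripped = line.strip()
        for prefix in prefixes:
            # Require a space after the prefix so that, for example,
            # "#PBSX" or "#!/bin/sh" is never treated as a directive.
            if stripped.startswith(prefix + " "):
                tokens.extend(shlex.split(stripped[len(prefix):]))
                break
    return tokens


def last_value(tokens, flag):
    """Return the argument of the last occurrence of flag, or None."""
    value = None
    for i, tok in enumerate(tokens[:-1]):
        if tok == flag:
            value = tokens[i + 1]
    return value
```

With a script that sets -q once under #PBS for local use and again under #GQSUB for the Grid, last_value(tokens, "-q") returns the #GQSUB value when that line comes later in the file, which matches the "later entries take precedence" rule from the configuration chapter.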

File staging and output

In most cases, the most important part of a job is the output. It is the whole point of the exercise. A few jobs will, as a natural consequence of the desired effect, write the output to some external system. Most of the time, however, the person who submitted the job wants to see some output back. gqsub has a few tricks available to assist with this.

Fundamentally, Grid systems are designed with the idea of disconnected operation in mind - that is, the machine where the job runs knows and cares about nothing that relates to the machine that submitted the job. This is part of what allows the Grid to scale to such a large system, and prevents problems in one machine affecting others. However, this is not the way that people want to work. In most cases, local batch systems are used with a shared filesystem - shared between the head node and the worker node. Although it introduces risks of race conditions, and other oddities, if used incorrectly, it is at its core much easier to understand what is happening - everything is written to one place. Given how prevalent familiarity with shared filesystems is, the theoretical downsides do not appear to cause problems under real use conditions.

By building the illusion of a shared filesystem, the aim is to couple the scalability and distributed nature of the Grid with an easy to understand way of working.

There are three ways of working that gqsub supports, depending on what's available at the UI. Where more than one is feasible, the first in the list is picked - although it is possible to change this with specific command line options.

1. The directory the job was launched from is shared (with the same path) between the UI and the worker node. Other than site specific instances, this is most likely to happen with AFS storage. In this case, gqsub will ensure that the job writes everything into the directory with the same path name as the current working directory on the UI machine. This does what you might expect, and puts all output directly there. This output mode has the advantage that incremental output is available for inspection. Note that this mode of operation can cause complications with resubmission of failed jobs - existing Grid mechanisms for job clean up are not built with shared filesystems in mind. This method does not require explicit file staging.
2. If there is a GridFTP server available on the UI machine, or on another machine that can see the same paths as the UI, then gqsub can direct the output to be written to the directory the job was launched from. This doesn't allow incremental output to be seen, but does mean that the output appears automatically on job completion. Explicit file staging is required.
3. If no other option is available, gqsub delegates output handling to gqstat. Once gqstat observes that the job is complete, it uses the gLite tools to collect the output, and places it in the directory the job was launched from. This results in a short delay in gqstat, once it notices the job status change. However, as a general way of working, it does abstract away much of the detail of file handling, so that once the job is reported as 'Done' in gqstat, the output is present and ready to be worked with. Note that once a job is complete, it will be displayed in gqstat exactly once, and then hidden away - this prevents the list being full of old jobs.

In general, the post-hoc output collection is always available, whilst the others require additional support, beyond the basic gLite UI. In particular, as use of a GridFTP server on the UI requires it to be present when the job completes, it is not suitable to use that output mode in an 'occasionally connected' manner - i.e. on laptops. On the other hand, it's a strong adjunct to an existing cluster, if installed on the head node.
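
The choice between the three output modes can be summarised in a few lines of Python. This is a sketch of the decision order described above, under the assumption that the command line flags simply override what the configuration would otherwise select; the function names and the mode labels 'shared', 'gridftp' and 'posthoc' are hypothetical, not gqsub internals.

```python
import os


def _is_under(path, root):
    """True if path is root itself or lies somewhere below it."""
    path, root = os.path.abspath(path), os.path.abspath(root)
    return os.path.commonpath([path, root]) == root


def choose_output_mode(cwd, sharedpaths, gridftphost=None,
                       on_shared_path=None, no_gridftp_host=False):
    """Pick 'shared', 'gridftp' or 'posthoc' for a job launched from cwd.

    on_shared_path is True for --on-shared-path, False for
    --not-on-shared-path, and None when neither flag was given, in which
    case the sharedpaths setting decides.
    """
    if on_shared_path is None:
        on_shared_path = any(_is_under(cwd, p) for p in sharedpaths)
    if on_shared_path:
        return "shared"      # mode 1: output written in place
    if gridftphost and not no_gridftp_host:
        return "gridftp"     # mode 2: staged back on job completion
    return "posthoc"         # mode 3: collected later by gqstat
```

For example, a job launched from a directory below an entry in sharedpaths selects 'shared' even when a GridFTP host is configured, because the first feasible mode in the list wins.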

Specifying file staging

The standard specifies that files to be staged are given as attributes (via -W) and specified as a host and remote filename, with a local filename. This is intended to allow the job to use scp (originally, rcp or similar remote copy) for collecting the files. This is not normally needed, and therefore gqsub offers a slight extension: if a filename alone (relative to the directory in which gqsub is run) is given, then it will assume that this is a local file, and mark it for staging.

If files are specified with scp suitable notation, then the scp is done on the machine from which the job is submitted, at the time of submission. This is technically wrong, as it is supposed to be done at the time the job starts. However, practical limitations apply here - whilst we can assume that, at submission time, the scp will either be password-less or the user will be able to enter the password, neither is a realistic assumption on the worker node at run time.

For cases where the data is available from a GridFTP server, or when the collection of data at runtime is an important consideration, gqsub also allows specification of full GridFTP URLs to data files. These will be collected at runtime.
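
The three forms of stagein value described above can be told apart with a simple test. The Python sketch below is illustrative only: the function name classify_stagein is hypothetical, the gsiftp:// scheme is assumed to be what a "full GridFTP URL" looks like, and any other value containing a colon is assumed to be scp-style host:path notation.

```python
def classify_stagein(spec):
    """Classify one stagein value as 'gridftp', 'scp' or 'local'."""
    if spec.startswith("gsiftp://"):
        # Assumed URL scheme; per the text, collected at runtime.
        return "gridftp"
    if ":" in spec:
        # Assumed host:path notation; per the text, copied on the
        # submit host at the time of submission.
        return "scp"
    # A bare filename, relative to the directory gqsub is run in.
    return "local"
```

The URL check comes first because a URL also contains a colon, and would otherwise be mistaken for scp notation.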

Example 4.3. An example of using explicit file staging with gqsub

#!/bin/bash

#PBS -l cput=0:30:0

#GQSUB -q UKI-SCOTGRID-GLASGOW

#GQSUB -W stagein=Si_00.usp
#GQSUB -W stagein=si.cell
#GQSUB -W stagein=si.param

#GQSUB -W stageout=si.castep
#GQSUB -W stageout=si.castep_bin
#GQSUB -W stageout=si.bands
#GQSUB -W stageout=si.check
#GQSUB -W stageout=si.cst_esp
#GQSUB -W stageout=si.wvfn.1

# Specific to Glasgow, hence the -q flag
CASTEP=/expsoft/ssp/castep/4.4/castep/castep

$CASTEP si

This example is pretty typical - a small number of files are staged in, and other files are staged back.

Multi-job submission by array jobs

It's often the case that one wants to do something very many times, with only slight differences between each run. The traditional method of specifying this is to use the 'Array job' semantics in qsub, where each subjob is given a number, and this number is used to influence the behaviour of the job.

With gqsub, the -t flag indicates the intent to use array jobs. The subjobs are submitted in batches, and when they are started on the worker node the environment contains a number of values. It is these values that should be used to control the behaviour of each subjob.

Example 4.4. An example of using array jobs with gqsub

#!/bin/bash

#PBS -l cput=0:30:0

#GQSUB -W stagein=input.$PBS_ARRAY_ID
#GQSUB -W stagein=cruncher

#GQSUB -W stageout=output.$PBS_ARRAY_ID

cruncher -i input.$PBS_ARRAY_ID -o output.$PBS_ARRAY_ID

This example would be run with the -t flag, as 'gqsub -t 1-78 runCruncher'. The environment variable $PBS_ARRAY_ID is set to the subjob number for that specific job - this is the Torque convention. Also supported in gqsub are the SGE conventions for array jobs. All of these are detailed in the table below.

Figure 4.2. Environment variables for array jobs in gqsub.

Environment variables set by gqsub when running subjobs. Note that all are always present (as they do not conflict).

Environment variable | Meaning | Origin
PBS_ARRAY_ID | Subjob ID for this job | Torque
PBS_ARRAYID | Subjob ID for this job | PBS
SGE_TASK_ID | Subjob ID for this job | SGE
SGE_TASK_FIRST | Lowest numbered task | SGE
SGE_TASK_LAST | Highest numbered task | SGE
SGE_STEP_SIZE | Step size between each task | SGE
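
To make the batching and the variables concrete, here is a small Python sketch. It assumes a step size of 1 (gqsub takes a single range of numbers) and assumes that SGE_TASK_FIRST and SGE_TASK_LAST describe the whole array rather than one tranche; neither point is confirmed by this guide, and both function names are hypothetical.

```python
def split_into_tranches(first, last, max_per_submission=50):
    """Split the inclusive range first..last into consecutive tranches."""
    tranches = []
    start = first
    while start <= last:
        end = min(start + max_per_submission - 1, last)
        tranches.append((start, end))
        start = end + 1
    return tranches


def subjob_environment(task_id, first, last, step=1):
    """Environment variables for one subjob, named as in the table above."""
    return {
        "PBS_ARRAY_ID": str(task_id),   # Torque convention
        "PBS_ARRAYID": str(task_id),    # PBS convention
        "SGE_TASK_ID": str(task_id),    # SGE convention
        "SGE_TASK_FIRST": str(first),
        "SGE_TASK_LAST": str(last),
        "SGE_STEP_SIZE": str(step),     # assumed to be 1 here
    }
```

With the default maxjobspersubmission of 50, 'gqsub -t 1-78' would correspond to two tranches in this sketch: subjobs 1 to 50, and subjobs 51 to 78.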

Chapter 5. gqstat

Job monitoring

In combination with gqsub, gqstat provides a familiar display of current jobs.

Job output

Although perhaps not the most expected feature, gqstat can be involved in job output. If the job cannot return the output data automatically (either by shared filesystem, or GridFTP), then the output has to be collected after the job is finished. Indeed, this is the default mode of operation for gLite middleware.

With gqsub, the aim is to produce the illusion of a shared file system. Therefore, if the output has to be collected, once gqstat determines the job is complete (in the 'Done' state), it will collect the output automatically. This has the effect that once gqstat reports the job is done, you can work with its output, and don't have to be concerned with the details of collecting jobs or proxy certs.
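
The behaviour described here (collect on 'Done', show the finished job once, then hide it) can be sketched as follows. This is an illustration and not gqstat's implementation: the job records are plain dictionaries, and collect_output is a hypothetical callback standing in for whatever gLite output retrieval step is used. The sketch also assumes every job needs post-hoc collection.

```python
def report_jobs(jobs, collect_output):
    """Return the (id, state) rows to display, updating jobs in place.

    jobs: list of dicts with 'id' and 'state' keys; a 'reported' key is
    added once a finished job has been shown.
    collect_output: callable taking a job id.
    """
    rows = []
    for job in jobs:
        if job.get("reported"):
            continue  # finished jobs are shown exactly once, then hidden
        if job["state"] == "Done":
            # Fetch the output before the job is reported as 'Done', so
            # the files are present by the time the user sees the status.
            collect_output(job["id"])
            job["reported"] = True
        rows.append((job["id"], job["state"]))
    return rows
```

Calling report_jobs twice on the same list shows a completed job only on the first call, and triggers its output collection only once.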

Chapter 6. Known issues

gqsub

Couldn't find a valid proxy certificate

Example 6.1. Couldn't find a valid proxy certificate

bash$ gqsub -t 1-2 Script_Final_Model_10_run.sh

No stagein specified
No file stage out requested
Submitting job 14 as:
Error - I/O Error
Couldn't find a valid proxy certificate

This actually results from an issue with the gLite 3.1 submission tools on a 64 bit host. Sometimes certain combinations of parameters cause glite-wms-job-submit to break, and it's not clear which ones. This is a known (and ignored) problem: https://savannah.cern.ch/bugs/?46145.

As there is no known cause, there's no direct solution to this. However, the bug appears to depend on the order of the parameters supplied to glite-wms-job-submit, and to be deterministic upon that. Therefore there is an option, --glite-quirks, that will change the order without changing the functionality. If you find this problem, put that flag onto gqsub, and hopefully it will resolve it.

If that doesn't work, the --verbose flag also changes the parameters to glite-wms-job-submit, and has also been noted to resolve this issue. I'm afraid the best advice I can give is to fiddle about with those until you find some combination that works. Once it works, it appears to be deterministic, so you can bake these parameters into the submission script, under a #GQSUB tag.

This problem does not occur with the gLite 3.2 UI on a 64 bit host.


Last modified Tue 3 August 2010