GQSUB was born out of work with a group of users with existing HPC experience. The most common question we were asked was, "Can we not just use qsub?" (They could not, as that would have bound them to a single site.) However, the intent was clear: most Grid tools don't work quite like anything else, and feel a lot more complicated. Part of this is necessary complexity - using certificates to allow for decentralised authentication and authorisation, for example. Part of it is unnecessary complexity, which could be removed.
By focusing on a specific user community, it is possible to tailor the user interface to that community. The more focused the community, the better the tailoring can be. By focusing on existing HPC/HTC users, who typically have their own cluster, gqsub can exploit existing knowledge. This reduces learning time and improves usability. This user group was selected as a target due to its size (big) and current uptake of Grid computing (small) - essentially, it's a logical place to expect Grid use to expand into.
The aim of gqsub is to allow a user of a local cluster to install a small bit of code on their existing head node, and then be able to submit the same job control scripts to either the local cluster or the grid, without modification.
In practice, the aim of running without modification is probably unrealistic. It can happen if there is a shared filesystem between the head node and the Grid system, but that's not a common method of operation (yet). Therefore there is an ability to make small changes depending on whether the job is running locally or on the Grid. In trials, a couple of lines were all that needed to be adjusted - most of the configuration remains the same, as it is the same job; just a few small changes on
There are two supported methods of installation: via RPM or tarball.
Either method requires a gLite UI to be installed in order to work. (Strictly, they can be installed in either order; in practice, do the gLite UI first.) It works well with the gLite 3.1 UI, and works with the 3.2 UI, modulo known problems in 3.2. Once they are resolved, gqsub will work identically on both.
<subsection>RPM</subsection>In addition to manually installing the RPM, there's a repo at http://www.scotgrid.ac.uk/gqsub/repo which can be tied into yum. In general, the RPM release number should be 1, except in the case of a packaging problem. The main version number will be incremented for every new release with feature changes.
The RPM installs things into /opt/lcg at the moment. It should put it in the same general location as the rest of the Grid stack; hopefully that'll work. This is what we have installed at Glasgow.
<subsection>Tarball</subsection>Unarchive the file into a directory on your path. There's a subdir containing some Python modules that should be fine where it ends up.
There is a test script, called (imaginatively!) test.sh. This runs a basic job, outputs a little bit of diagnostics, ships in a C file, compiles and runs it. The C file does some output, and sleeps for a few minutes (i.e. just long enough to be detected in the monitoring system). Production of a formal test suite is on the todo list.
gqsub uses a configuration file and a directory tree. The configuration file is $HOME/.gqsubrc, and it defines the location of the directory tree and a number of other variables. If the file is not present, it is generated with default values. If it is present but a value is not specified, then a default for that value will be assumed. The defaults will be read from /etc/gqsubrc or, if not specified there (or if that file is not present), from compiled-in values. Most of the time these will not need to be changed, except perhaps defaultdirective and defaultvo. An example file, set with the default values for these options, is given in Example 3.1, “Sample .gqsubrc”.
Example 3.1. Sample .gqsubrc
[GQSUB]
    Section header - required; exactly one section present at the moment.

sharedpaths =
    Paths that should be assumed to be the same between the submit host and the worker node. When within a sharedpath, file staging is not performed. The values here are used if neither --on-shared-path nor --not-on-shared-path is specified. These might typically be AFS pathnames.

maxjobspersubmission = 50
    Large array jobs will be split up into tranches no larger than this. The 50 is the current recommendation from the WMS developers.

defaultdirective = #PBS
    Default directive. Torque and PBS use #PBS, SGE uses #$. It is best to set this to match an existing local cluster, where there is one.

gqsubdirective = #GQSUB
    Additional directive, gqsub specific. Use this to set things that should not be used for local jobs - e.g. file staging.

proxysafetymargin = 129600
    Additional time required on the proxy - if you find your jobs regularly queue for a long time, you might want to increase this.

gqsubdir = $HOME/.gqsub
    Location of the directory tree to use for storing jobs. Changing this will result in the disappearance of any existing jobs.

defaultvo = a.best.guess
    When no VO is specified, and there is no existing certificate, generate a proxy with this VO. Most users have a single VO, and those that have multiple will tend to use one significantly more than the other. This is originally filled in from an existing certificate.

defaultshell = /bin/sh
    The shell that will be used to execute the script. This can be changed per job with -S, and this is the same default that qsub has.

gridftphost = gridftp.server.host
    The hostname for a GridFTP server that shares paths with the submission machine. This can be a separate server. The command line option --no-gridftp-host disables this; otherwise it will be used in preference to delayed output collection.

credentials = PlainVoms
    How to handle proxy certificates. 'PlainVoms' means to use simple VOMS proxies; the other option is 'MyProxy', which uses an ordinary VOMS proxy for short use (e.g. job querying), and the MyProxy server for longer uses (e.g. job submission).
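The lookup order for settings (the user's file first, then /etc/gqsubrc, then compiled-in values) can be illustrated with a short sketch. This is not gqsub's own code: the function name `load_settings` and the `BUILTIN_DEFAULTS` table are hypothetical, the defaults are simply copied from the sample file, and the sketch does not generate a missing $HOME/.gqsubrc the way gqsub does.

```python
import configparser

# Assumed stand-in for the compiled-in values (copied from the sample file).
BUILTIN_DEFAULTS = {
    "maxjobspersubmission": "50",
    "defaultdirective": "#PBS",
    "gqsubdirective": "#GQSUB",
    "proxysafetymargin": "129600",
    "defaultshell": "/bin/sh",
    "credentials": "PlainVoms",
}


def load_settings(user_rc, system_rc="/etc/gqsubrc"):
    """Return the [GQSUB] settings as a dict of strings.

    Precedence, lowest to highest: built-in values, system file, user file.
    """
    # interpolation=None keeps values such as "$HOME/.gqsub" untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_dict({"GQSUB": BUILTIN_DEFAULTS})
    # read() skips files that do not exist; later files override earlier ones.
    parser.read([system_rc, user_rc])
    return dict(parser["GQSUB"])
```

Because each layer only overrides the keys it actually sets, a user file containing a single line such as `defaultvo = ...` still picks up every other value from the layers beneath it.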
Let's start with an example submission script, and show it in action. Firstly, here's the script in Example 4.1, “Trivial submission script”.
Example 4.1. Trivial submission script
    #!/bin/sh
    #PBS -l cput=0:30:00
    #PBS -l walltime=0:30:00
    #GQSUB -q UKI-SCOTGRID-GLASGOW
    echo in user script
    hostname
    pwd
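The script mixes two directive prefixes: #PBS lines, which a local qsub also understands, and #GQSUB lines, which a local batch system treats as plain comments. As a rough illustration of how the two can be told apart, here is a minimal sketch. The function `collect_directives` is hypothetical and is not taken from gqsub; stopping at the first command line mirrors the usual qsub behaviour and is an assumption here.

```python
def collect_directives(script_text, prefixes=("#PBS", "#GQSUB")):
    """Return (prefix, options) pairs for directive lines in a job script."""
    found = []
    for line in script_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            break  # assumed: the first command ends the directive block
        for prefix in prefixes:
            if stripped.startswith(prefix + " "):
                found.append((prefix, stripped[len(prefix):].strip()))
                break
    return found


# The trivial submission script from Example 4.1.
SCRIPT = """#!/bin/sh
#PBS -l cput=0:30:00
#PBS -l walltime=0:30:00
#GQSUB -q UKI-SCOTGRID-GLASGOW
echo in user script
hostname
pwd
"""

# Options a local batch system would see: only the #PBS lines.
LOCAL_OPTIONS = [opts for prefix, opts in collect_directives(SCRIPT)
                 if prefix == "#PBS"]
```

Here `LOCAL_OPTIONS` holds the two resource limits, while the full result also carries the Grid-only `-q UKI-SCOTGRID-GLASGOW` destination.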
And here's the submission and monitoring in use.
Example 4.2. Command line use of gqsub
    -bash-3.00$ ls
    gqdel gqoutput gqstat gqsub gqsubconfig.py gqsubconfig.pyc gqsubproxy.py gqsubproxy.pyc test.sh
    -bash-3.00$ gqsub test.sh
    File stagein requested: on a shared path, so no staging needed. Note that this captures files at time of execution.
    Couldn't find a valid proxy.
    Proxy renewal required - looking for 36:45:00
    Enter GRID pass phrase:
    Submitting job 320 as: https://svr023.gla.scotgrid.ac.uk:9000/SJ7Ch-mRrEn4dD9oAgHecw
    -bash-3.00$ gqstat 320
    Getting status for 320
    Job id Name User                                                    WallTime State     Queue
    ------ ---- ------------------------------------------------------- -------- --------- ----------------------------------------------------
    320         /C=UK/O=eScience/OU=Glasgow/L=Compserv/CN=stuart_purdie          Scheduled dev010.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q7d
    -bash-3.00$ ./gqstat 320
    Getting status for 320
    Job id Name User                                                    WallTime State   Queue
    ------ ---- ------------------------------------------------------- -------- ------- ----------------------------------------------------
    320         /C=UK/O=eScience/OU=Glasgow/L=Compserv/CN=stuart_purdie 00:02:14 Running dev010.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q7d
    -bash-3.00$ ./gqstat 320
    Getting status for 320
    Job id Name User                                                    WallTime State Queue
    ------ ---- ------------------------------------------------------- -------- ----- ----------------------------------------------------
    320         /C=UK/O=eScience/OU=Glasgow/L=Compserv/CN=stuart_purdie 00:26:32 Done  dev010.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q7d
    -bash-3.00$ ls
    gqdel gqoutput gqstat gqsub gqsubconfig.py gqsubconfig.pyc gqsubproxy.py gqsubproxy.pyc test.sh test.sh.e320 test.sh.o320
    -bash-3.00$ more test.sh.o320
    in user script
    node296.beowulf.cluster
    /cluster/share/gla058/gqsub
    -bash-3.00$
The output from gqstat will be covered in detail later; for the moment, note that it shows the job going through the relevant stages. In this case, the job was run on a shared path, so the output appeared automatically as it was generated. The job script executed, test.sh, is the one shown in Example 4.1, “Trivial submission script”.
The prompt to generate the proxy was due to the current proxy having expired. Note that there was no VO specified explicitly anywhere - it was assumed that the VO to use matched the last VO used. If there had been no proxy at all (e.g. voms-proxy-destroy had been run), then it would have drawn the VO name from the default VO in .gqsubrc.
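The order in which a VO is picked can be written down as a small sketch. The function `choose_vo` and its parameter names are hypothetical, chosen only to mirror the description: an explicitly given VO wins, then the VO of the last proxy, then defaultvo from .gqsubrc.

```python
def choose_vo(explicit_vo=None, last_proxy_vo=None, default_vo=None):
    """Pick the VO for a new proxy, in the order described in the text.

    explicit_vo   - VO given on the command line or in a directive (-A / --vo)
    last_proxy_vo - VO of the most recent proxy, even if it has expired
    default_vo    - the defaultvo setting from .gqsubrc
    """
    for candidate in (explicit_vo, last_proxy_vo, default_vo):
        if candidate:
            return candidate
    raise ValueError("no VO specified and no default available")
```

In the session above, only the second source was available, so the expired proxy's VO was reused without any prompt for a VO name.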
POSIX defines many command line options, and many more are present in vendor extensions. Here we have a complete survey of the command line options available to gqsub.
Figure 4.1. gqsub command line options
| Short option | Long option | Name | Origin | Status | Notes |
|---|---|---|---|---|---|
| -a | | Not before date | POSIX | Not supported | Not intended to support; cannot implement with gLite functionality, and of limited use. The best solution would be to use the local 'at' command to achieve it, and that would run the risk of jobs not being submitted when they might be expected to have been. |
| -A | --vo | Account Name | POSIX | Supported, mapped to VO | It is assumed that the Account that POSIX uses is mapped directly onto the VO. |
| -c | | Checkpointing | POSIX | Not supported | Not intended to support; cannot implement with gLite functionality, and would require significant investment to produce the net effect. |
| -C | | Directive Prefix | POSIX | Supported | |
| -e | | Stderr filename | POSIX | Supported | |
| -j | | Output stream joining | POSIX | Not supported | Intended to support; it's been lower down the list. |
| -I | | Interactive job | Torque extension | Not supported | No plans as yet; batch jobs considered much more important. |
| -k | | Files to keep | POSIX | Not supported | Not intended to support; explicitly not possible with lcg-pbs job managers, and of limited utility even if it were. |
| -m | | When to email status updates | POSIX | Not supported | Would like to support, but appears to be rather tricky. Emailing out from the job itself appears not to be reliable, so it will need some stable service somewhere to do the emailing. That is an external dependence that I want to avoid for the moment. The optimal solution would be to use the email mechanism in the underlying local cluster, which will require CE support. |
| -M | | Addresses to email | POSIX | Not supported | Will not support; exploitable in this context. In practice, the person to email should be taken from the X509 certificate. See above for other email issues. |
| -N | | Job Name | POSIX | Supported | Defaults to the name of the script if not provided. |
| -o | | Stdout filename | POSIX | Supported | Defaults to ${JobName}.o${job id} if not provided. |
| -p | | Priority | POSIX | Not supported | Not intended to support. Job priority is done at the VO level; gLite has no scope for user-specified job priority. |
| -q | | Destination | POSIX | Supported, overloaded | See discussion of destinations. |
| -r | | Rerun | POSIX | Not supported | Intended to support; can be mapped onto the gLite retry mechanism. |
| -S | | Shell interpreter | POSIX | Supported | Name of the program used to interpret the script - typically /bin/sh or similar, but it can also be, e.g., the interpreter for a Python or R script, or for any other interpreted language. |
| -t | | Array job | Common extension | Supported, see notes | This is a common extension, and many implementations have a slightly different way of handling these. There are two aspects to that: the way the jobs are specified, where gqsub takes a single range of numbers; and how this is passed to individual jobs. |
| -u | | | POSIX | Not supported | |
| -v | | | POSIX | Not supported | |
| -V | | | POSIX | Not supported | |
| -W | | | POSIX | Partially supported and extended | Certain attributes are supported, and passed through. Others are not supported. |
| -z | | | POSIX | Not supported | |
| | --on-shared-path | | gqsub extension | | Forces assumption of shared filesystem. This forces gqsub to not do any explicit file staging for data return. |
| | --not-on-shared-path | | gqsub extension | | Prohibits assumption of shared filesystem. This forces gqsub to use explicit file staging to return data. |
| | --gridftp-host | | gqsub extension | | Indicates the host name of a server running GridFTP that can access the submission directory. Presence of this indicates that gqsub should use auto stage back, unless on a shared path. |
| | --no-gridftp-host | | gqsub extension | | Indicates that gqsub should not use direct staging of output. |
| | --dry-run | | gqsub extension | | Instructs gqsub to do everything except the final submission. Shows the result of a list-match (i.e. lists valid targets for the job). This is primarily intended as a debugging tool. |
| | --verbose | | gqsub extension | | |
| | --cerequirements | | gqsub extension | | |
| | --jdl | | gqsub extension | | |
In most cases, the most important part of a job is the output: it is the whole point of the exercise. A few jobs will, as a natural consequence of the desired effect, write the output to some external system. Most of the time, however, the person who submitted the job wants to see some output back. gqsub has a few tricks available to assist with this.
Fundamentally, Grid systems are designed with the idea of disconnected operation in mind - that is, the machine where the job runs knows and cares about nothing that relates to the machine that submitted the job. This is part of what allows the Grid to scale to such a large system, and prevents problems in one machine affecting others. However, this is not the way that people want to work. In most cases, local batch systems are used with a shared filesystem - shared between the head node and the worker node. Although it introduces risks of race conditions, and other oddities, if used incorrectly, it is at its core much easier to understand what is happening - everything is written to one place. Given how prevalent familiarity with shared filesystems is, the theoretical downsides do not appear to cause problems under real use conditions.
By building the illusion of a shared filesystem, the aim is to couple the scalability and distributed nature of the Grid with an easy to understand way of working.
There are three ways of working that gqsub supports, depending on what's available at the UI. Where more than one is feasible, the first in the list is picked - although it is possible to change this with specific command line options.
In general, the post-hoc output collection is always available, whilst the others require additional support beyond the basic gLite UI. In particular, as use of a GridFTP server on the UI requires it to be present when the job completes, that output mode is not suitable for use in an 'occasionally connected' manner - i.e. laptops. On the other hand, it's a strong adjunct to an existing cluster, if installed on the head node.
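The choice between output modes can be sketched as follows. The ordering used here (shared filesystem, then GridFTP, then post-hoc collection) is inferred from the option descriptions, which say that GridFTP is used "unless on a shared path" and "in preference to delayed output collection"; treat it as an assumption rather than a statement of gqsub's internals. All function and parameter names are hypothetical.

```python
def on_shared_path(directory, sharedpaths):
    """True if directory is one of the shared paths or lies beneath one."""
    directory = directory.rstrip("/") + "/"
    return any(directory.startswith(path.rstrip("/") + "/")
               for path in sharedpaths if path)


def choose_output_mode(directory, sharedpaths, gridftphost=None,
                       force_shared=None, allow_gridftp=True):
    """Pick how job output gets back to the submission directory.

    force_shared  - True for --on-shared-path, False for
                    --not-on-shared-path, None if neither was given
    allow_gridftp - False when --no-gridftp-host was given
    """
    if force_shared is None:
        shared = on_shared_path(directory, sharedpaths)
    else:
        shared = force_shared
    if shared:
        return "shared-filesystem"  # output appears as it is written
    if gridftphost and allow_gridftp:
        return "gridftp"            # the job stages output back itself
    return "post-hoc"               # collected later, once the job is done
```

The last branch needs nothing beyond the basic UI, which is why it serves as the fallback for machines that are not always reachable.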
The standard specifies that files to be staged are given as attributes (via -W), each specified as a host and remote filename together with a local filename. This is intended to allow the job to use scp (originally, rcp or similar remote copy) for collecting the files. This is not normally needed, and therefore gqsub offers a slight extension: if a filename alone (relative to the directory in which gqsub is run) is given, then it will assume that this is a local file, and mark it for staging.
If files are specified with scp-suitable notation, then the scp is done on the machine from which the job is submitted, <italic>at the time of submission</italic>. This is technically wrong, as it is supposed to be done at the time the job starts. However, practical limitations apply here - whilst we can assume that, at submission time, the scp will either be password-less or the user will be able to enter the password, neither is a realistic assumption on the worker node at run time.
For cases where the data is available from a GridFTP server, or when the collection of data at runtime is an important consideration, gqsub also allows specification of full GridFTP URLs to data files. These will be collected at runtime.
Example 4.3. An example of using explicit file staging with gqsub
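The three kinds of staging specification can be distinguished mechanically. The sketch below is illustrative only: `classify_stage_spec` is a hypothetical helper, the `local_file@host:remote_file` layout is the form commonly used by PBS-style qsub and is assumed here, and `gsiftp://` is assumed as the URL scheme for GridFTP.

```python
GRIDFTP_SCHEME = "gsiftp://"  # assumed URL scheme for GridFTP servers

# Where and when each kind of file is fetched, as described in the text.
WHEN_COPIED = {
    "local-file": "staged from the submission directory",
    "scp": "copied on the submit host at submission time",
    "gridftp-url": "collected on the worker node at run time",
}


def classify_stage_spec(spec):
    """Classify one stagein value as local-file, scp or gridftp-url."""
    if spec.startswith(GRIDFTP_SCHEME):
        return "gridftp-url"
    _local, separator, remote = spec.partition("@")
    if separator and ":" in remote:
        return "scp"  # assumed form: local_file@host:remote_file
    return "local-file"
```

For instance, a bare name such as `si.cell` is classed as a local file, so it would be picked up from the directory in which gqsub is run.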
    #!/bin/bash
    #PBS -l cput=0:30:0
    #GQSUB -q UKI-SCOTGRID-GLASGOW
    #GQSUB -W stagein=Si_00.usp
    #GQSUB -W stagein=si.cell
    #GQSUB -W stagein=si.param
    #GQSUB -W stageout=si.castep
    #GQSUB -W stageout=si.castep_bin
    #GQSUB -W stageout=si.bands
    #GQSUB -W stageout=si.check
    #GQSUB -W stageout=si.cst_esp
    #GQSUB -W stageout=si.wvfn.1
    # Specific to Glasgow, hence the -q flag
    CASTEP=/expsoft/ssp/castep/4.4/castep/castep
    $CASTEP si
This example is pretty typical - a small number of files are staged in, and other files are staged back.
It's often the case that one wants to do something very many times, with only slight differences between each run. The traditional method of specifying this is to use the 'Array job' semantics in qsub, where each subjob is given a number, and this number is used to influence the behaviour of the job.
With gqsub, the -t flag indicates the intent to use array jobs. The subjobs are submitted in batches, and when they are started on the worker node the environment contains a number of values. It is these values that should be used to tune the behaviour of the job.
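To make the batching concrete, the sketch below splits a range of subjob numbers into tranches no larger than maxjobspersubmission (50 in the sample .gqsubrc). How gqsub actually chooses the boundaries is not described here, so the contiguous split and the name `split_into_tranches` are assumptions.

```python
def split_into_tranches(first, last, max_per_submission=50):
    """Split the inclusive range first..last into (start, end) tranches."""
    return [
        (start, min(start + max_per_submission - 1, last))
        for start in range(first, last + 1, max_per_submission)
    ]


# A range of 1-78 with the default limit gives two submissions.
TRANCHES = split_into_tranches(1, 78)
```

With these inputs `TRANCHES` is `[(1, 50), (51, 78)]`: one full tranche of 50 subjobs and a second holding the remaining 28.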
Example 4.4. An example of using array jobs with gqsub
    #!/bin/bash
    #PBS -l cput=0:30:0
    #GQSUB -W stagein=input.$PBS_ARRAY_ID
    #GQSUB -W stagein=cruncher
    #GQSUB -W stageout=output.$PBS_ARRAY_ID
    cruncher -i input.$PBS_ARRAY_ID -o output.$PBS_ARRAY_ID
This example would be run with the -t flag, as 'gqsub -t 1-78 runCruncher'. The environment variable $PBS_ARRAY_ID is set to the subjob number for that specific job - this is the Torque convention. Also supported in gqsub are the SGE conventions for array jobs. All of these are detailed in the table below.
Figure 4.2. Environment variables for array jobs in gqsub.
| Environment variable | Meaning | Origin |
|---|---|---|
| PBS_ARRAY_ID | Subjob ID for this job | Torque |
| PBS_ARRAYID | Subjob ID for this job | PBS |
| SGE_TASK_ID | Subjob ID for this job | SGE |
| SGE_TASK_FIRST | Lowest numbered task | SGE |
| SGE_TASK_LAST | Highest numbered task | SGE |
| SGE_STEP_SIZE | Step size between each task | SGE |
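As an illustration of the table, the sketch below builds the environment a single subjob would see. It assumes that PBS_ARRAYID and SGE_TASK_ID carry the same subjob number as PBS_ARRAY_ID, and that the step size is 1 because gqsub takes a single range of numbers; the function name `array_environment` is hypothetical.

```python
def array_environment(task_id, first, last, step=1):
    """Return the array-job variables for one subjob, as strings."""
    if not first <= task_id <= last:
        raise ValueError("task_id lies outside the submitted range")
    return {
        "PBS_ARRAY_ID": str(task_id),   # Torque convention
        "PBS_ARRAYID": str(task_id),    # PBS convention (assumed same value)
        "SGE_TASK_ID": str(task_id),    # SGE convention (assumed same value)
        "SGE_TASK_FIRST": str(first),
        "SGE_TASK_LAST": str(last),
        "SGE_STEP_SIZE": str(step),
    }


# Subjob 7 of 'gqsub -t 1-78 runCruncher' would read input.7, write output.7.
ENV = array_environment(7, 1, 78)
INPUT_NAME = "input." + ENV["PBS_ARRAY_ID"]
```

Setting all three ID variables means a script written for any one of the conventions can run unchanged.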
In combination with gqsub, gqstat provides a familiar display of current jobs.
Although perhaps not the most expected feature, gqstat can be involved in job output. If the job cannot return the output data automatically (either by shared filesystem, or GridFTP), then the output has to be collected after the job is finished. Indeed, this is the default mode of operation for gLite middleware.
With gqsub, the aim is to produce the illusion of a shared file system. Therefore, if the output has to be collected, once gqstat determines the job is complete (in the 'Done' state), it will collect the output automatically. This has the effect that once gqstat reports the job is done, you can work with its output, and don't have to be concerned with the details of collecting jobs or proxy certs.
Example 6.1. Couldn't find a valid proxy certificate
    bash$ gqsub -t 1-2 Script_Final_Model_10_run.sh
    No stagein specified
    No file stage out requested
    Submitting job 14 as: Error - I/O Error
    Couldn't find a valid proxy certificate
This actually results from an issue with the gLite 3.1 submission tools on a 64-bit host. Sometimes (and it's not clear when) certain combinations of parameters cause glite-wms-job-submit to break. This is a known (and ignored) problem: https://savannah.cern.ch/bugs/?46145.
As there is no known cause, there's no direct solution to this. However, the bug appears to depend on the order of the parameters supplied to glite-wms-job-submit, and to be deterministic upon that. Therefore there is an option, --glite-quirks, that will change the order without changing the functionality. If you find this problem, put that flag onto gqsub, and hopefully it will resolve it.
If that doesn't work, the --verbose flag also changes the parameters to glite-wms-job-submit, and has also been noted to resolve this issue. I'm afraid the best advice I can give is to fiddle about with those until you find some combination that works. Once it works, it appears to be deterministic, so you can bake these parameters into the submission script, under a #GQSUB tag.
This problem does not occur with the gLite 3.2 UI on a 64-bit host.