Best Practice Guide

From ScotGrid

There are many different ways of using the Grid. Ideally, all of these would work brilliantly, but the Real World decides that sometimes some ways of doing things are better than others. This page is a list of the recommended ways of doing things, and some advanced tricks that make the Grid sing and dance.

Logging in to the UI machine (svr020)

It can be a pain to keep track of port numbers and command line switches for logging in to the UI machine. Fortunately, SSH has a mechanism to keep track of such things for you. If you create a file called config in the .ssh subdirectory of your home directory (where the leading . is important - this directory will nearly always exist already), you can put standard information there, rather than having to keep specifying it.

Here's the relevant part of my ~/.ssh/config:

Host svr020 scotgrid
        User glaXXX
        HostName svr020.gla.scotgrid.ac.uk
        Port 2222
        GSSAPIDelegateCredentials no
        ForwardX11 yes

The Host line starts the definition, and is a space separated list of 'things that you type on the command line' - in effect, this builds an alias called 'svr020' that logs me into the UI machine, with the specified username, and uses the correct port. It also has another alias called 'scotgrid' that does the same thing.

This file is used by gsissh, ssh, gsiscp and scp. This means that after putting it in place you can forget about things like the -P flag vs -p for gsiscp, and just get on with doing stuff! The GSSAPIDelegateCredentials option replaces the -k flag for gsissh, and causes no problems for conventional ssh.

I log into the machine with

   gsissh svr020

and copy files to it by

   gsiscp file.name svr020:

How many jobs should I submit?

Using the Bulk job submission system, jobs should be submitted in batches of about 50.

When a batch of jobs is submitted, they all go to the same WMS. Whilst batch submission allows reuse of proxy information and authentication, this piling on of jobs does load up the WMS a bit. It also prevents distribution of the jobs across multiple WMS's (although the WMS will split them over multiple CE's as normal). This last point is rather important: every job in the batch counts against the same fixed quota of space that you have on that WMS for the input and output sandboxes.

To get the best performance, you will want to ensure that you do not exceed your quota on the WMS. Unfortunately, it's not obvious how much the quota is. For the WMS's at Glasgow, we have set the quota at 2GB.

If you have many jobs in flight, you should make sure that you don't go over quota. Normally this means retrieving completed jobs, so they don't sit on the WMS when space is needed.

For the case where you have many jobs with large outputs, this might be tricky, and we are investigating other possibilities.
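As a rough guide to how many jobs can sit on one WMS at a time, the arithmetic can be sketched as below. The 2GB figure is the Glasgow quota mentioned above; the 40MB of sandbox per job is purely a made-up number for illustration, so substitute your own estimate.

```shell
# Rough estimate of how many jobs fit within the sandbox quota on one WMS.
# Both sizes are in megabytes; the per-job size is your own estimate.
jobs_within_quota() {
    quota_mb=$1
    per_job_mb=$2
    if [ "$per_job_mb" -le 0 ]; then
        echo "per-job size must be positive" >&2
        return 1
    fi
    # Integer division: partial jobs do not count
    echo $(( quota_mb / per_job_mb ))
}

# 2GB quota (2048MB) and a hypothetical 40MB of sandbox per job
jobs_within_quota 2048 40    # prints 51
```

If the number that comes out is smaller than the number of jobs you want in flight, that is a sign to retrieve completed output more promptly, or to shrink the sandboxes.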

How should I specify a queue?

The short version is you shouldn't, really. By doing so, you prevent most of the Griddy cleverness from being used, and have to keep updating things in the event of downtime. Further, specifying a single site can trigger problems with the WMS. That said, you might have a good reason to restrict somewhat (e.g. using software installed only at specific sites).

If you need to specify a specific site, the recommendation is to use

Requirements = other.GlueSiteUniqueID == "UKI-SCOTGRID-GLASGOW"
              && other.GlueCEPolicyMaxCPUTime > 48;

where the appropriate UniqueID can be found in the BDII information system, or in the table of ScotGrid sites below. The CPU time is specified in minutes, so the 48 in this example means 48 minutes.

Site UniqueIDs

Site        UniqueID
Durham      UKI-SCOTGRID-DURHAM
Glasgow     UKI-SCOTGRID-GLASGOW
Edinburgh   UKI-SCOTGRID-ECDF

Specifying the site in this manner allows the WMS to pick one of multiple CE's, and possibly longer queues if they ought to give you your results faster. This form of specification also means that if you try to submit a job that takes longer than any queue can accept, you will get an error at submission, rather than having to wait until the job fails to discover the problem.
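Since the CPU limit is given in minutes, but most of us think about job lengths in hours, it is easy to get the number wrong. A small helper like the following (the function name is just an illustration, not part of any Grid tool) does the conversion and prints a Requirements expression in the same form as above.

```shell
# Print a JDL Requirements expression for one site and a minimum CPU limit.
# The limit is given in hours and converted to the minutes that the
# GlueCEPolicyMaxCPUTime attribute expects.
site_requirement() {
    site=$1
    hours=$2
    minutes=$(( hours * 60 ))
    printf 'Requirements = other.GlueSiteUniqueID == "%s"\n' "$site"
    printf '              && other.GlueCEPolicyMaxCPUTime > %d;\n' "$minutes"
}

# A job expected to need up to 48 hours of CPU time at Glasgow
site_requirement UKI-SCOTGRID-GLASGOW 48
```

The output can be pasted into a JDL file; for 48 hours the limit comes out as 2880.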

How do I get my jobs to run faster?

Ah, the perennial question. With computers, faster is always better. Sometimes we can give them a little help to do that.

Optimise disk access

Unless you have a particularly unusual access pattern, it will be quicker to take a copy of your data to the local worker node, and then copy back once finished. This means that whenever your job wants the data, it gets it locally, and thus spends less time waiting on the network. (It also reduces the load on the disk server, which means everyone likes this!)

This is as simple as

cp ${CLUSTER_SHARED}/data/bigfile1 .

and then referring to the file just by its name. When the job starts, it has already been placed in a good directory for this sort of thing, with plenty of space. (If you expect to use much more than about 50G or so, speak to us first, and we can give more detailed advice in that case.)

Note that the opposite also applies - if you're writing files, it's going to be faster to write them on the worker node, and then copy them to the cluster storage as the last step. In fact, this can make a bigger difference - output tends to be written gradually, in small pieces, which is far from the most efficient access pattern for shared storage.

./myjob > output
cp output ${CLUSTER_SHARED}/data/output1

This way of working also fits with the model used by Grid enabled Storage, so if you did start working with very large quantities of data, it's easier to transition if you already work in this manner.

The one exception to this might be if you know a data file is read exactly once, from start to finish, in order. However, even then, due to the vagaries of the disk scheduling algorithm, it might still be quicker to take a local copy.
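Putting the two halves together, a job wrapper might look something like the sketch below. CLUSTER_SHARED is the variable used above; the function name and the output.local file name are made up for this example, and real jobs will probably want better error handling.

```shell
# Copy the input to the local working directory, run the job there, and
# copy the result back to shared storage as the very last step.
stage_and_run() {
    input=$1     # file name under ${CLUSTER_SHARED}/data
    output=$2    # name for the result under ${CLUSTER_SHARED}/data
    shift 2      # whatever is left is the command to run

    cp "${CLUSTER_SHARED}/data/${input}" . || return 1    # stage in
    "$@" > output.local || return 1                       # work on local disk
    cp output.local "${CLUSTER_SHARED}/data/${output}"    # stage out
}

# Example, using the placeholder names from above:
#   stage_and_run bigfile1 output1 ./myjob
```

If the copy in fails, the job stops before doing any work, and if the job itself fails, nothing half-finished is copied back.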

Use the optimal queue

The details of this vary by site.

At Glasgow, the 3 day and longer queues are run only on the original nodes. The 2 day and shorter queues are split over all the nodes, including the shiny new Quad Xeon worker nodes. Therefore, if your job goes to one of the shorter queues, you are going to get the results faster - less time queueing. If you can split long computations up into shorter blocks, this is recommended. (In general, due to the dominance of the very large Particle Physics VO's for CERN, things will tend to get optimised for around 2 days.)

If you need assistance to split things down, do drop us a line.
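How to split a computation depends entirely on the job, but if yours can be driven by a first and last item number (events, frames, parameter sets and so on), the bookkeeping can be as simple as this sketch. The function name is made up, and it assumes the items are numbered from 1.

```shell
# Print "first last" pairs that cover items 1..total in blocks of at most
# chunk items, one pair per line. Each pair can then drive one shorter job.
split_range() {
    total=$1
    chunk=$2
    if [ "$chunk" -le 0 ]; then
        echo "chunk size must be positive" >&2
        return 1
    fi
    start=1
    while [ "$start" -le "$total" ]; do
        end=$(( start + chunk - 1 ))
        if [ "$end" -gt "$total" ]; then
            end=$total
        fi
        echo "$start $end"
        start=$(( end + 1 ))
    done
}

# 1000 items in blocks of 300 gives four jobs, the last one shorter
split_range 1000 300
```

Each line of output would become the arguments for one job, sized so that it fits comfortably in one of the shorter queues.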

How heavily loaded is the grid?

The command

 lcg-infosites --vo vo.scotgrid.ac.uk ce

will list all the CE's (compute elements) that the given VO has access to, and the current status of each.

-bash-3.00$ lcg-infosites --vo vo.scotgrid.ac.uk ce
#CPU    Free    Total Jobs      Running Waiting ComputingElement
----------------------------------------------------------
1900       1     179            128       51    svr021.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q7d
 664      33    1535            613      922    ce02.dur.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q3d
1501     122    1348           1347        1    ce.glite.ecdf.ed.ac.uk:2119/jobmanager-lcgsge-ecdf
1501     153    1318           1316        2    mw05.ecdf.ed.ac.uk:2119/jobmanager-lcgsge-ecdf
 664      33      24             18        6    ce01.dur.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q6h
 664      33       2              0        2    ce01.dur.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q30m
1900       0      49             36       13    svr026.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q1d
 664      33    1535            613      922    ce01.dur.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q3d
 664      33       2              0        2    ce02.dur.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q30m
 664      33      24             18        6    ce02.dur.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q6h
1900       0      81             15       66    svr026.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q2d

This is a useful diagnostic tool if you are observing unexpected behaviour. Note that you should _not_ normally use this to select a CE, and then write it into the JDL files. Instead, use the Rank stanza in a JDL to get the WMS to do this for you.
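As an illustration of the Rank approach, the sketch below writes a small JDL file that asks the WMS to prefer the CE with the shortest estimated response time. The function name, the file name and the trivial /bin/hostname job are placeholders; the Rank expression shown is a commonly used one, but do check it against the WMS documentation before relying on it.

```shell
# Write a minimal JDL file that leaves the choice of CE to the WMS.
# Rank is evaluated per CE and the highest value wins, hence the minus sign.
write_ranked_jdl() {
    jdl_file=$1
    cat > "$jdl_file" <<'EOF'
Executable    = "/bin/hostname";
StdOutput     = "std.out";
StdError      = "std.err";
OutputSandbox = {"std.out", "std.err"};
Rank          = -other.GlueCEStateEstimatedResponseTime;
EOF
}

# Example:
#   write_ranked_jdl ranked.jdl
```

This way the choice is made afresh at each submission, rather than being frozen into the JDL at whatever the load happened to be when you last looked.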

How do I use manual delegation?

Don't. Always use the -a flag, to trigger automatic delegation.

Manual delegation theoretically offers a small performance benefit. This is of the order of a few hundred milliseconds. Further, delegation is per WMS, which means to use this, you have to override the automatic round robin of WMS's, or deliberately delegate to them all with the same ID. By the time you've done this the performance benefit is eaten up, unless you're submitting a huge number of jobs. And if you are submitting a huge number of jobs, you should do it in a collection, as that has other advantages as well as reusing delegated credentials for all jobs in the collection.
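For reference, a collection submission with automatic delegation might be wrapped up as below. This is a sketch: the function name and the SUBMIT_CMD variable are made up for this example (the variable is only there so that you can do a dry run), and the -o and --collection options should be checked against the help output of the glite-wms-job-submit installed on the UI.

```shell
# Submit every JDL file in a directory as one collection, using automatic
# delegation (-a) and saving the returned job ID to a file (-o).
submit_collection() {
    jdl_dir=$1
    id_file=$2
    # Set SUBMIT_CMD=echo to see the command line without submitting anything
    ${SUBMIT_CMD:-glite-wms-job-submit} -a -o "$id_file" --collection "$jdl_dir"
}

# Example:
#   submit_collection ./jdls jobids.txt
```

Keeping each directory to about 50 JDL files fits with the batch size recommended earlier on this page.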


How many jobs am I running?

From the command line

glite-wms-job-status <wmsjobid> or gqstat (depending on how the job was submitted)
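If you have a file of job IDs, a quick tally of how many jobs are in each state can be had by counting the status lines. This sketch assumes that the status output contains one line beginning "Current Status:" per job, and that the --noint and -i options are available; both are worth checking against the version installed on the UI. The function name is made up.

```shell
# Read job status output on stdin and count the jobs in each state.
# Prints one "count state" line per state, sorted by state name.
count_statuses() {
    awk -F': *' '/^Current Status:/ { n[$2]++ }
                 END { for (s in n) print n[s], s }' | sort -k2
}

# Example, assuming the job IDs were saved to jobids.txt at submission:
#   glite-wms-job-status --noint -i jobids.txt | count_statuses
```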

From the browser

It's conventional to advise users to keep track of the returned job ID's. However, from time to time, these get lost, or are never recorded. It is possible to find out what jobs you have 'in flight' by asking the WMS. Note that this depends on the WMS, and, by default, ScotGrid is set up to submit to 2 WMS's - therefore you have to check 2 places. At some point, we'll write up an aggregator to make this better.

For the moment, then, click on each of:

   https://svr022.gla.scotgrid.ac.uk:9000/
   https://svr023.gla.scotgrid.ac.uk:9000/

That's it!

(This assumes that your web browser has your user certificate installed in it. Given that every single CA that I've seen does it that way, I feel this is a safe assumption. If you want to use this from a computer where you didn't obtain the certificate, you'll need to move it over. Drop us a line if you want a hand with that.)

From our batch monitoring system

You can monitor the status of ScotGrid by following this URL:

         https://svr031.gla.scotgrid.ac.uk/pbswebmon/