Ganga Quickstart Guide
From ScotGrid
Ganga is an interactive shell that is designed for the submission and management of jobs. It has a number of attractive features including:
- Monitoring and auto-retrieval of jobs
- Template jobs, automating many similar tasks
- Splitting of large jobs into smaller jobs automatically (and then merging the output back again)
- Group operations across a number of jobs
- Scriptability
Ganga Links
- Ganga Homepage (http://ganga.web.cern.ch/ganga/)
- Ganga Userguides (http://ganga.web.cern.ch/ganga/user/index.html)
Also useful are introductions to python, e.g., the python tutorial (http://docs.python.org/tut/tut.html). The iPython Documentation (http://ipython.scipy.org/moin/Documentation) points out where iPython syntax differs from normal python scripts.
HOWTO for Glasgow
Getting Started
You should already have arranged access to svr020, and have at least a vague understanding of JDL files and of how job submission works. If you haven't yet, you might find it useful to work through Glasgow_Job_Submission_Quickstart_Guide first.
- Login to svr020 using gsissh or ssh
- Type ganga --generate-config
- Say yes to set up the standard config files
The -g flag is a synonym for --generate-config.
Start IPython by typing ganga.
Your First Ganga Job
In [1]:j1 = Job(application=Executable(exe='/bin/echo',args=['Hello, World']))
In [2]:j1.submit()
Ganga.GPIDev.Lib.Job : INFO submitting job 0
Ganga.GPIDev.Adapters : INFO submitting job 0 to Local backend
Ganga.GPIDev.Lib.Job : INFO job 0 status changed to "submitted"
Out[2]: 1
In [3]:
Ganga.GPIDev.Lib.Job : INFO job 0 status changed to "running"
Ganga.GPIDev.Lib.Job : INFO job 0 status changed to "completed"
In [4]:print file(j1.outputdir+'stdout').read()
Hello, World
Note that this job ran locally on the UI machine, which is not too interesting.
Your First Ganga Grid Job
Prequel
You should quit ganga and edit the VirtualOrganisation stanza in .gangarc to reflect your own VO, e.g.,
VirtualOrganisation = vo.scotgrid.ac.uk
You should also ensure that ganga maintains the validity of your grid proxy, so in the [GridProxy_Properties] section uncomment the lines validityAtCreation and minValidity, putting, e.g.,
validityAtCreation = 48:00
minValidity = 24:00
And uncomment and edit the voms section, inserting the name of your VO, e.g.,
voms = vo.scotgrid.ac.uk
Once this is done, Ganga will ensure that your proxy cert is valid before use - this might result in you being prompted for your certificate passphrase when you start Ganga.
(See also the later section on certificates.)
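As a quick sanity check before restarting Ganga, the two validity settings can be read back with Python's standard configparser module. This is only a sketch: the inline sample text stands in for your real ~/.gangarc, the helper name to_minutes is hypothetical, and it assumes the file parses as ordinary INI, which may not hold for every .gangarc.

```python
import configparser

# Sample text standing in for the relevant part of ~/.gangarc
# (assumption: the real file parses as plain INI).
SAMPLE = """
[GridProxy_Properties]
validityAtCreation = 48:00
minValidity = 24:00
"""

def to_minutes(hhmm):
    """Convert an 'hh:mm' string such as '48:00' to minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

parser = configparser.ConfigParser()
parser.optionxform = str  # keep option names case-sensitive
parser.read_string(SAMPLE)

proxy = parser["GridProxy_Properties"]
created = to_minutes(proxy["validityAtCreation"])
minimum = to_minutes(proxy["minValidity"])

# A freshly created proxy should outlive the renewal threshold,
# otherwise it would count as too short the moment it was made.
print("settings consistent:", created > minimum)
```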
LCG Backend
Running jobs on the grid is easy - just change the job's backend to LCG:
In [2]:gridJob=Job(backend=LCG(), application=Executable(exe='/bin/echo',args=['Hello, World']))
EDG-WMS and gLite-WMS
Running the GANGA LCG backend handler requires an LCG-UI environment. GANGA allows users to select alternative LCG-UI environments through the configuration parameters EDG_SETUP and GLITE_SETUP, which correspond to the UI setup for accessing the EDG-WMS and the gLite-WMS, respectively.
By default, the module of the LCG backend handler is automatically loaded with only the support for the EDG-WMS. The module first checks the availability and the validity of the Grid credential. Credential renewal will be triggered if the module cannot find a valid one to access LCG.
To enable the support for the gLite-WMS, one can either set the parameter GLITE_ENABLE to True within the LCG section of the .gangarc configuration file before starting GANGA or directly enable it inside a running GANGA session by:
In [1]:config['LCG']['GLITE_ENABLE'] = True
And one can switch GANGA to submit jobs through the gLite-WMS by simply changing the attribute, backend.middleware. For example,
In [2]:j=Job(backend=LCG())
In [3]:j.backend.middleware = 'GLITE'
Targeting Glasgow
The above job can run anywhere that your VO is supported. However, if you are preparing an environment to specifically target Glasgow, then you need to tell ganga not to send the job anywhere else. Do this by adding the CE's queue name to the job:
In [5]:gridJob.backend.CE='svr021.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q1d'
or
In [5]:gridJob.backend.CE='svr021.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-atlas'
if you're a member of the ATLAS VO.
(Change the queue to the appropriate one for the job - you can check the queues (http://goc.grid.sinica.edu.tw/gstat/UKI-SCOTGRID-GLASGOW/) in the information system monitor.) More information on the current CE's and queue setup at Glasgow can be found here.
Alternatively, you can instruct the WMS to only pick a queue based at Glasgow by
In [5]:gridJob.backend.requirements.other = ['other.GlueSiteUniqueID == "UKI-SCOTGRID-GLASGOW"']
which will allow Ganga (or yourself) to make separate decisions about the length of the queue needed.
Submitting The Job
Now run the job:
In [6]: gridJob.submit()
Ganga.GPIDev.Lib.Job : INFO submitting job 2
Ganga.GPIDev.Adapters : INFO submitting job 2 to LCG backend
Ganga.GPIDev.Lib.Job : INFO job 2 status changed to "submitted"
Ganga will submit the job to our resource broker and then automatically poll it for status changes. When the job is complete, the output is retrieved and stored in the Ganga work directory.
Ganga.GPIDev.Lib.Job : INFO job 2 status changed to "running"
Ganga.GPIDev.Lib.Job : INFO job 2 status changed to "completing"
Ganga.GPIDev.Lib.Job : INFO job 2 status changed to "completed"
This is much more convenient than having to poll edg-job-status by hand.
Job Output
Each job has an outputdir, and all output from the job will be stored there. You can process this from within Ganga, using standard python, or (more likely), process the output offline with other tools.
By default, ganga will store jobs' outputs in ~/gangadir/workspace/Local/JOB_ID/output, where JOB_ID is a sequential job number.
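If you process output offline, a small helper can build the path described above and list what a job left behind. This is a sketch in plain Python, run outside Ganga; the function names job_output_dir and list_outputs are hypothetical, and the layout is assumed to be exactly the default one quoted above.

```python
import os

def job_output_dir(job_id, gangadir="~/gangadir"):
    """Return the default output directory for a job, per the layout above."""
    return os.path.join(os.path.expanduser(gangadir),
                        "workspace", "Local", str(job_id), "output")

def list_outputs(job_id, gangadir="~/gangadir"):
    """Return sorted file names in a job's output directory, or [] if absent."""
    path = job_output_dir(job_id, gangadir)
    if not os.path.isdir(path):
        return []
    return sorted(os.listdir(path))

print(job_output_dir(2))
print(list_outputs(2))
```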
Wrapper Scripts and Sandboxes
Wrapper to Start a Prepared Binary
When the job wakes up in the batch system it's probably not in the working directory you expect - it will usually be in a scratch directory for that job.
If you have prepared binaries in your $CLUSTER_SHARED area, and perhaps some input files and output directories, you might want to use a wrapper script that navigates to the correct directory, then starts up the correct code.
Here's an example, which uses some environment variables to make sure the job is running in a unique directory:
#! /bin/bash
#
# Make a structured directory to run the job in - the job's output files should go somewhere sensible
BASE_DIR=$CLUSTER_SHARED/sieve/run
cd $BASE_DIR || exit 1
JOB_DIR="$(date +'%Y-%m-%d')/$PBS_JOBID"
mkdir -p $JOB_DIR || exit 1
cd $JOB_DIR || exit 1
# Now invoke the program
BINARY=$CLUSTER_SHARED/sieve/sieve
echo "Invoking $BINARY $@"
$BINARY "$@"
# Save the exit status straight away: $? is overwritten by every later command
STATUS=$?
if [ $STATUS -eq 0 ]; then
echo "All done. Make tea..."
else
echo "$BINARY failed with status $STATUS. Oh dear..."
fi
If this wrapper is in, say, $CLUSTER_SHARED/bin/sievewrapper.sh then the ganga job can be defined as:
In [53]: import os
In [54]:sieveJob=Job(backend=LCG(CE='svr021.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q1d'),
....: application=Executable(exe=os.environ['CLUSTER_SHARED']+'/bin/sievewrapper.sh',args=['-s', '1000', '-e', '1000000000']))
A Little Python Aside
iPython (http://ipython.scipy.org/moin/) is a fully functioning python shell, so it takes all the normal python commands. In the last example we imported the os module, which allows us to access the environment variables, such as CLUSTER_SHARED within python using os.environ.
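A slightly more defensive way to build such a path is sketched below. It uses only the standard os module; the fallback path is a placeholder invented for illustration, not a real location on the cluster.

```python
import os

# Fall back to a placeholder if CLUSTER_SHARED is not set in this shell;
# the fallback path is purely illustrative.
shared = os.environ.get("CLUSTER_SHARED", "/path/to/cluster/shared")

# os.path.join avoids doubled or missing slashes when building the path.
wrapper = os.path.join(shared, "bin", "sievewrapper.sh")
print(wrapper)
```

Using os.environ.get rather than os.environ[...] means a missing variable gives a visibly wrong path instead of a KeyError in the middle of a job definition; which behaviour you prefer is a matter of taste.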
Sandboxes
If you are just running on the Glasgow cluster then you probably don't need sandboxes (sets of files copied to/from the batch system with the job) - just work in the CLUSTER_SHARED directory. However, they can be useful, so here's how to use them:
Input Sandboxes
When a job's defined as
Executable(exe='/bin/echo', ...)
then it's the binary on the remote system which is executed. If you want to send a wrapper script with the job, then tell Ganga that the "exe" is a File:
Executable(exe=File('~/wrappers/sievewrap.sh'), ...)
Then the sievewrap.sh script is parceled up with the job and sent along with it. (In standard EGEE speak the file becomes part of the job's input sandbox.)
You can add other files to the job's sandbox using, e.g.,
gridJob.inputsandbox=[File('~/inputs/myJobInputs.dat')]
Again, these files will be in the job's working directory when the job starts.
Output Sandboxes
Output sandboxes are files which will be retrieved from the batch system once a job has run. They will be passed back to you as files in the output directory of that job.
gridJob.outputsandbox=['someOutput.txt', 'jobLogs.*']
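To get a feel for which files a pattern such as 'jobLogs.*' would pick up, you can try it against some names with Python's fnmatch module. This sketch assumes the sandbox patterns behave like ordinary shell-style wildcards; the matching done by the middleware may differ in detail, and the file names are made up.

```python
import fnmatch

patterns = ['someOutput.txt', 'jobLogs.*']

# Hypothetical files left in the job's working directory.
produced = ['someOutput.txt', 'jobLogs.1', 'jobLogs.err', 'scratch.tmp']

# Keep every file that matches at least one sandbox pattern.
retrieved = [name for name in produced
             if any(fnmatch.fnmatchcase(name, p) for p in patterns)]
print(retrieved)
```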
Many jobs, little fuss
One (very) common usage pattern is that you want to run the same code several times with slightly different arguments. Ganga has a feature called 'splitters' that makes this easy. The most commonly used splitters are GenericSplitter and ArgSplitter. GenericSplitter allows you to split over any attribute of the job object. ArgSplitter is a commonly used subclass of GenericSplitter that varies only the arguments to the job. The formerly popular ExeSplitter is deprecated as of Ganga 5, since the more powerful GenericSplitter has essentially made it redundant.
To get the functionality of the ExeSplitter for example, do the following:
exe1=Executable(exe='/bin/echo',args=['Hello, World'])
exe2=Executable(exe='/usr/bin/whoami',args=[])
exe3=Executable(exe='/bin/echo',args=['Goodbye World'])
j = Job()
j.application.args = []
j.splitter = GenericSplitter()
j.splitter.attribute = 'application'
j.splitter.values = [exe1, exe2, exe3]
j.submit()
ArgSplitter Example:
In [2]:argset = list();
In [4]:for n in range(2):
...:     argset.append(["Hello %d" % n])
...:
In [9]:j = Job(splitter=ArgSplitter(args=argset))
# Other configuration of the Job here, including the executable, and the sandboxes
In [14]:j.submit()
Ganga.GPIDev.Lib.Job : INFO submitting job 7
Ganga.GPIDev.Adapters : INFO submitting job 7.0 to Local backend
Ganga.GPIDev.Lib.Job : INFO job 7.0 status changed to "submitted"
Ganga.GPIDev.Adapters : INFO submitting job 7.1 to Local backend
Ganga.GPIDev.Lib.Job : INFO job 7.1 status changed to "submitted"
...
This then runs one job for each set of arguments in argset. Although in this case a single argument was given, you can also build longer argument lists:
In [2]:argset = list();
In [4]:for n in range(2):
...:     argset.append(["--verbose", "--with-data", "datafile", "--start", "%d" % n, "--length", "200"])
...:
or similar, to produce a set of arguments. The %d format turns the number into a string in the obvious manner. One feature which may be useful is zero padding: %03d ensures that there are at least 3 digits by adding leading zeros. This is particularly helpful for filenames, because it keeps numerical order and ASCIIbetical (character-code) sort order in agreement.
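The effect of zero padding on sort order is easy to see in plain Python. The file names here are invented for illustration:

```python
plain = ["run%d.log" % n for n in (1, 2, 10)]
padded = ["run%03d.log" % n for n in (1, 2, 10)]

print(sorted(plain))   # 'run10.log' sorts before 'run2.log'
print(sorted(padded))  # numeric order is preserved
```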
Further, sometimes you only want a single, aggregated file as the result.
In [10]:j.merger=TextMerger()
In [11]:j.merger.files=['stdout']
In [12]:j.merger.ignorefailed = True
In [14]:j.submit()
The TextMerger will put the results all together at the end, giving you a single file of output to look at. For more information on splitters and mergers, see the Ganga help pages (http://ganga.web.cern.ch/ganga/user/html/GangaIntroduction/node11.html).
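Conceptually, a text merge amounts to concatenating the same file from each subjob's output directory. The sketch below illustrates that idea in plain Python; it is not Ganga's TextMerger implementation, the function name merge_text is hypothetical, and the ignore_failed behaviour (skipping subjobs with no output) is an assumption about what ignorefailed means.

```python
import os
import tempfile

def merge_text(subjob_dirs, filename, merged_path, ignore_failed=True):
    """Concatenate `filename` from each directory into `merged_path`."""
    with open(merged_path, "w") as merged:
        for directory in subjob_dirs:
            source = os.path.join(directory, filename)
            if not os.path.exists(source):
                if ignore_failed:
                    continue  # skip subjobs that produced no output
                raise IOError("missing output: %s" % source)
            with open(source) as handle:
                merged.write(handle.read())

# Demonstration with two throwaway "subjob" output directories.
work = tempfile.mkdtemp()
dirs = []
for n in range(2):
    d = os.path.join(work, str(n), "output")
    os.makedirs(d)
    with open(os.path.join(d, "stdout"), "w") as handle:
        handle.write("Hello %d\n" % n)
    dirs.append(d)

merged_file = os.path.join(work, "stdout.merged")
merge_text(dirs, "stdout", merged_file)
with open(merged_file) as handle:
    print(handle.read())
```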
More on Splitting
In [87]:import os
In [88]:jobArray = list()
In [89]:for n in range(20):
....: jobArray.append(Executable(exe=os.environ['CLUSTER_SHARED']+'/bin/myapp', args=["--verbose", "--logFile=runZ%03d.log" % n]))
....:
In [90]:jobArray[1]
Out[90]: Executable (
exe = '/cluster/share/gla012/bin/myapp' ,
env = {} ,
args = ['--verbose', '--logFile=runZ001.log']
)
Note we:
- Need to import the os module to get access to the environment list.
- Use the string % operator to zero pad the log file name. (See the python manual (http://docs.python.org/lib/typesseq-strings.html).)
Now we use a GenericSplitter to define our multi-part job, splitting over the application attribute as in the earlier GenericSplitter example (an ArgSplitter could have been used here, as only the args are changing):
In [92]:bulkGridJob=Job(splitter=GenericSplitter(attribute='application', values=jobArray), \
backend=LCG(CE='svr021.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q1d'))
In [93]:bulkGridJob.submit()
Ganga.GPIDev.Lib.Job : INFO submitting job 19
Ganga.GPIDev.Adapters : INFO submitting job 19.0 to LCG backend
Ganga.GPIDev.Lib.Job : INFO job 19.0 status changed to "submitted"
Ganga.GPIDev.Adapters : INFO submitting job 19.1 to LCG backend
Ganga.GPIDev.Lib.Job : INFO job 19.1 status changed to "submitted"
Ganga.GPIDev.Adapters : INFO submitting job 19.2 to LCG backend
...
The submission of each sub-job is done separately, which can take a little time. As usual ganga will take care of polling the status of each job and retrieving the output when it becomes available. In this way ganga can control the submission of several hundred jobs quite easily. Note that the output of each subjob will be found in a numbered subdirectory of the main controlling job (in this case, job 19):
In [97]:bulkGridJob.subjobs[1].outputdir
Out[97]: /clusterhome/home/gla012/gangadir/workspace/Local/19/1/output/
And all the other job parameters can be queried in the same way:
In [99]:bulkGridJob.subjobs[1].backend
Out[99]: LCG (
status = 'Scheduled' ,
reason = 'Job successfully submitted to Globus' ,
iocache = '' ,
CE = 'svr021.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q1d' ,
middleware = 'GLITE' ,
actualCE = 'svr021.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-q1d' ,
id = 'https://svr023.gla.scotgrid.ac.uk:9000/S_RBomCRMwFN0kG_rUp7Gg' ,
jobtype = 'Normal' ,
exitcode = None ,
requirements = LCGRequirements (
other = [] ,
nodenumber = 1 ,
memory = None ,
software = [] ,
ipconnectivity = 0 ,
cputime = None ,
walltime = None
)
)
If the standard splitters are not enough, or you need other custom behaviour, you can write your own Ganga plugins. Ganga is written in Python, so all you need to do is get a copy of the source and extend it for what you need.
Grid Certificates in Ganga
Controlling Certificate Lifetime
Grid jobs need to have a valid proxy certificate for the entire lifetime of the job - and this has to include any queuing time. You can ensure that ganga will submit certificates with suitable lifetimes by changing the parameters in .gangarc. For example, if your job takes 24 hours to run, and you want to allow for 24 hours of time in the queue, then the following settings could be employed:
[GridProxy_Properties]
# Proxy validity at creation (hh:mm)
validityAtCreation = 48:00
# Minimum proxy validity (hh:mm), below which new proxy needs to be created
minValidity = 24:00
Ganga will now refuse to submit the jobs unless a proxy of 48 hours exists. Use
gridProxy.renew()
to renew your proxy. gridProxy.info() will tell you how much time is left.
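The arithmetic behind the validityAtCreation value is simply expected queue time plus expected run time, written as hh:mm. A minimal sketch in plain Python, with a hypothetical helper name required_validity:

```python
def required_validity(run_hours, queue_hours):
    """Proxy lifetime needed to cover queueing plus running, as 'hh:mm'."""
    return "%02d:00" % (run_hours + queue_hours)

# The example from the text: 24 hours of running plus 24 hours in the queue.
print(required_validity(24, 24))  # 48:00
```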
Using a MyProxy Server
The above method is quite risky, in that it exposes long-lived proxies on sites. In addition, the VOMS server for your VO will probably refuse to issue a proxy longer than 4 days (which makes using a 7d queue impossible without proxy renewal). It is much better to upload a long-lived proxy to a MyProxy server; the Glasgow resource broker will then renew the proxies of jobs whose proxies are running short. The default MyProxy server on svr020 is hosted at the RAL Tier1, and the command to upload a proxy certificate to it is:
$ myproxy-init -d -n
The default lifetime of the proxy is 7 days. This can be increased, but it's better to just renew it as necessary. The -d flag tells MyProxy to use your Distinguished Name (rather than Unix user name) to identify you, and the -n flag results in the credential being stored without a passphrase (otherwise, the WMS will not be able to automatically renew your credentials).
To get information about your uploaded proxy, use myproxy-info -d; to delete an uploaded proxy, use myproxy-destroy -d.
If you want to submit Ganga jobs via the Glasgow gLite WMS and have them make use of the RAL MyProxy service, you need to follow these steps:
- Generate a proxy for your VO: voms-proxy-init -voms my.specific.vo
- Upload a proxy certificate to the RAL MyProxy server: myproxy-init -d -n
- Delegate your proxy to the gLite WMS: glite-wms-job-delegate-proxy -a --vo my.specific.vo
- Ensure Ganga is using the RAL MyProxy server; your ~/.gangarc file should contain MyProxyServer = lcgrbp01.gridpp.rl.ac.uk
- Submit your job via Ganga.
For more details see this document (https://edms.cern.ch/file/722398/1.2/gLite-3-UserGuide.html#SECTION00066300000000000000) (Section on Proxy Renewal).
Disconnecting and Reconnecting
Starting Up Again
Ganga keeps all state information about your jobs in ~/gangadir. When you restart Ganga it will reread the last state and take appropriate actions (querying running job statuses, downloading outputs, etc.). It will, however, have forgotten the local variable names you gave your jobs; you can reassign these using the jobs object, which contains all of your jobs.
svr020:~$ ganga
*** Welcome to Ganga ***
Version: Ganga-4-4-1
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.
This is free software (GPL), and you are welcome to redistribute it
under certain conditions; type license() for details.
Ganga.GPIDev.Lib.JobRegistry : INFO Found 3 jobs in jobs
Ganga.GPIDev.Lib.JobRegistry : INFO Found 0 jobs in templates
In [1]:jobs
Out[1]: Statistics: 3 jobs
--------------
# id status name subjobs application backend backend.actualCE
# 0 failed Executable LCG svr021.gla.scotgrid.ac.uk:2119/jobmanager-lcg
# 1 completed Executable LCG svr021.gla.scotgrid.ac.uk:2119/jobmanager-lcg
# 2 new Executable LCG
In [2]: myJob=jobs(2)
In [3]: myJob.submit()
...
Screen
It's also possible to run your ganga session in screen, which allows you to disconnect and logout, while ganga still runs. You can then reconnect when you log back in (possibly from a different machine). There's a nice screen tutorial here (http://jmcpherson.org/screen.html). N.B. to reattach to a screen running on svr020 use:
screen -r
Simple Job Management
There are various commands available to the user to manage jobs. These are generally available on the Job itself or through the jobs object. Here is an example of removing completed and failed jobs, then resubmitting a job and killing it while it is still active.
In [34]:jobs
Out[34]: Job slice: jobs (4 jobs)
--------------
# fqid status name subjobs application backend backend.actualCE
# 6 completed Executable LCG svr026.gla.scotgrid.ac.uk:2119/jobmanager-lcg
# 7 completed 3 Executable LCG
# 8 failed 3 Executable Local
# 9 completed 3 Executable Local
In [35]:jobs(8).remove()
Ganga.GPIDev.Lib.Job : INFO removing job 8
In [36]:jobs(9).remove()
Ganga.GPIDev.Lib.Job : INFO removing job 9
In [37]:jobs
Out[37]: Job slice: jobs (2 jobs)
--------------
# fqid status name subjobs application backend backend.actualCE
# 6 completed Executable LCG svr026.gla.scotgrid.ac.uk:2119/jobmanager-lcg
# 7 completed 3 Executable LCG
In [1]:jobs
Out[1]: Job slice: jobs (2 jobs)
--------------
# fqid status name subjobs application backend backend.actualCE
# 6 completed Executable LCG svr026.gla.scotgrid.ac.uk:2119/jobmanager-lcg
# 7 completed 3 Executable LCG
In [2]:jobs(6).resubmit()
Ganga.GPIDev.Lib.Job : INFO resubmitting job 6
Ganga.GPIDev.Adapters : INFO resubmitting job 6 to LCG backend
In [3]:jobs
Out[3]: Job slice: jobs (2 jobs)
--------------
# fqid status name subjobs application backend backend.actualCE
# 6 submitted Executable LCG
# 7 completed 3 Executable LCG
In [4]:jobs(6).kill()
Ganga.GPIDev.Lib.Job : INFO killing job 6
Ganga.GPIDev.Lib.Job : INFO job 6 status changed to "killed"
In [5]:jobs
Out[5]: Job slice: jobs (2 jobs)
--------------
# fqid status name subjobs application backend backend.actualCE
# 6 killed Executable LCG
# 7 completed 3 Executable LCG
