Submitting User Jobs to the grid
Introduction
When your anamipp jobs get too long to run on your local workstation,
or you just crave more statistics for your latest analysis, you may
want to process data on the grid, getting results for tens or hundreds
of runs in the same time it takes to do one run on your computer.
gridSubmitUserJob is a script that lets you do this. It is
mentioned on the grid page, but that page lacks
step-by-step instructions, so here they are.
When to run on the grid
Of course you don't have to run on the grid if the dsts contain enough
information for your analysis. Submit gridUserJobs if you need to
re-reconstruct some of the data or get to the hit information.
If you want to run e907mc jobs, you can use gridSubmitE907MC. It can
write to either volatile or tape-backed dCache. Most of the information here
applies to gridSubmitE907MC, too.
What a gridUserJob does
A gridUserJob runs anamipp with your custom job description on the runs you
specify, producing histogram/dst output in volatile dCache.
Once the files are there, you can directly use them or dccp them to local
disk space.
You can run on MC runs, but you may have to tweak the script for
non-standard file locations.
The gridUserJob runs on the grid, so it will use a grid-release of the MIPP
software. A grid-release is just a tarball of all the MIPP
executables, shared libraries, geometry files, and B-field files.
gridUserJobs can run on any site in the Open Science Grid. The
instructions here are tailored for the fngp-osg cluster, the FermiNational
GeneralPurpose - OpenScienceGrid cluster. Running on other sites is
similar.
Step by Step
These instructions have a lot of detail, but the procedure is much easier than
the length of this page suggests.
- Get ready to submit the job:
- Decide what release you want to use:
A tagged release allows you to work with dsts later.
If you use the development release and your job creates dsts,
the dsts may become unusable once the development release changes.
This is not a problem if you only generate histograms.
An older tagged release may not contain your job description or job
module.
If you need to tag a new release, feel
free to do so.
- Test that your job works:
Run anamipp with the job description you plan to use from the
release you plan to use:
setrel myrelease; anamipp -x myJob.xml -n 100 -i one-of-the-runs-you-want-to-use.root
Fix problems and repeat until there are no problems, i.e. the job
does not crash and your histograms/dsts get filled.
- Put a grid-release from the release you want to use (tagged or
development) into the release directory on the MIPP web site:
If you want to run with a tagged release, e.g. R08.03.14_Linux2.6,
it may be there already. (If a development release is there, it may
be out of date.)
Otherwise see the details on how to do this.
For this example we use release R08.03.14_Linux2.6. You may have
development_Linux2.6 instead.
- Make a runlist:
If you want to submit only one or two runs, just remember the run
numbers. Otherwise make a text file with a single run number per
line. I assume below that you call it my-runlist.txt
Copy the runlist to the system you will use to submit the job:
scp my-runlist.txt e907@fnpcsrv1.fnal.gov:
You need to have a Kerberos ticket for this. You also need to be listed in the
.k5login file.
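For a contiguous block of runs, the runlist can be generated instead of typed. The following is a minimal sketch, assuming bash with seq and grep available. The helper names make_runlist and check_runlist and the end run 16204 are illustrative only; they are not part of the MIPP software.

```shell
#!/bin/bash
# Hypothetical helper: write the run numbers FIRST..LAST, one per line.
# Usage: make_runlist FIRST LAST OUTFILE
make_runlist() {
    seq "$1" "$2" > "$3"
}

# Hypothetical helper: succeed only if the file is non-empty and every
# line is a single run number (digits only).
# Usage: check_runlist RUNLIST
check_runlist() {
    local runlist="$1"
    [ -s "$runlist" ] || return 1
    # grep -v selects lines that are NOT a bare integer; -q suppresses output.
    if grep -qvE '^[0-9]+$' "$runlist"; then
        return 1
    fi
    return 0
}

# Example range only; substitute the runs you actually want.
make_runlist 16200 16204 my-runlist.txt
check_runlist my-runlist.txt && echo "my-runlist.txt looks valid"
```

If your runs are not contiguous, edit the file by hand and run check_runlist on it before copying it over.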
- Submit the job:
You will log in to the head node of the grid cluster (fnpcsrv1) and
execute gridSubmitUserJob.
That script will submit your jobs to the grid worker nodes.
Each worker node will run anamipp on one run
and copy the results to dCache.
- Connect to fnpcsrv1.fnal.gov:
Get Kerberos credentials (kinit, kinit -nf, kinit -Af, or whatever
your OS requires).
ssh e907@fnpcsrv1.fnal.gov
You need to be listed in .k5login. Ask to be added if ssh fails.
- Check that the grid credentials have not expired:
voms-proxy-info
If 'time left' is zero, copy your own x509 certificate to
e907@fnpcsrv1.fnal.gov:.globus/, then voms-proxy-init, or ask
someone else to get a proxy certificate for you.
- Set up the environment to submit the job:
cd ~/testRel
setrel
cd BatchProc/grid
- Make a directory for your output files:
Choose a name, e.g. my-grid, and list the existing directories:
srmls srm://fndca1.fnal.gov:8443//pnfs/fnal.gov/usr/e907/fermigrid/volatile/
If my-grid already exists, srmrm the old files inside it or
choose a different name.
Make the directory:
srmmkdir srm://fndca1.fnal.gov:8443//pnfs/fnal.gov/usr/e907/fermigrid/volatile/my-grid
- Execute gridSubmitUserJob:
You first submit a single job, one run, so that the grid-release
gets copied from the web to the $OSG_APP/mipp area.
(On the fngp-osg cluster this is /grid/app/mipp.)
Then you submit all the other runs:
- gridSubmitUserJob # with no arguments it prints a help message
- Submit a single run:
gridSubmitUserJob (grid-release) (job-description) (run/runlist) (output-dir)
gridSubmitUserJob R08.03.14_Linux2.6 myJob.xml 16200 fermigrid/volatile/my-grid # substitute your arguments
- It will (most likely) complain about a missing directory. Create it:
mkdir /farm/e907_stage01/log/whatever-directory-it-complained-about
- Try again:
gridSubmitUserJob ...
The command history has it, just hit cursor up twice.
If it worked, you should see something like
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 1732774.
- Now wait a few minutes. Then check that the grid-release
got imported and the job is running:
ls /grid/app/mipp/ # should list R08.03.14_Linux2.6 or whatever grid release you specified
lj # list jobs to make sure that your job is running
ls -l /farm/e907_stage01/log/..... # the directory you created above
If the log files have size 0, the job is still processing.
If the log files have size > 0, check whether your output
file was generated. If it was not, read the logs for errors.
- Submit several runs given in a file of run numbers:
gridSubmitUserJob R08.03.14_Linux2.6 myJob.xml
my-runlist.txt fermigrid/volatile/my-grid
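With many runs submitted, checking log sizes by eye in ls -l output gets tedious. The sketch below applies the rule from the wait-and-check step above (size 0 means still processing, size > 0 means check the output or read the log) to every file in a log directory. It assumes bash; the function name check_logs and the labels it prints are illustrative, not part of the MIPP scripts.

```shell
#!/bin/bash
# Hypothetical helper: classify each log file in a directory by size.
# Usage: check_logs LOGDIR
check_logs() {
    local logdir="$1" f
    for f in "$logdir"/*; do
        [ -f "$f" ] || continue      # skip subdirectories and an unmatched glob
        if [ -s "$f" ]; then         # -s: file exists and has size > 0
            echo "check $f"          # job wrote a log: verify output or read it
        else
            echo "processing $f"     # empty log: job is still running
        fi
    done
}

# Usage (assumption: pass the log directory you created with mkdir above):
#   check_logs /farm/e907_stage01/log/whatever-directory-it-complained-about
```

A file reported as "check" is not necessarily a failure; it only means the job has written something and its output in dCache is worth verifying.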
- Use the results:
- Check what output files have been created:
srmls srm://fndca1.fnal.gov:8443//pnfs/fnal.gov/usr/e907/fermigrid/volatile/my-grid
- Directly use files in dCache on e907ana* in applications that support
dcap URIs...
(root ${DCAP_VHOME}/my-grid/<filename>)
- ...or copy files to a local disk. The
transferFromdCache.sh script makes this easy. It is part
of the MIPP software.
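The long SRM URL appears in several of the commands above (srmls, srmmkdir), so it is easy to mistype. As a small convenience sketch, assuming bash, the URL can be built from the output directory name. The variable SRM_BASE and the function srm_url are illustrative names; only the URL itself is taken from this page.

```shell
#!/bin/bash
# Base SRM URL of the volatile dCache area, copied from the commands above.
SRM_BASE="srm://fndca1.fnal.gov:8443//pnfs/fnal.gov/usr/e907/fermigrid/volatile"

# Hypothetical helper: print the SRM URL for an output directory name.
# Usage: srm_url DIRNAME
srm_url() {
    echo "${SRM_BASE}/$1"
}

# Print the listing command for the example directory.
# (It is only echoed here; run it on a node where srmls is available.)
echo "srmls $(srm_url my-grid)"
```

On the submit node you would then type, for example, srmls "$(srm_url my-grid)" instead of the full URL.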
Test that all your changes work, then run make-release.pl.
Holger Meyer