3.6.4. Batch submission

When a batch job is submitted from a work server at KEKCC, it is scheduled by LSF (Platform Load Sharing Facility, developed by IBM), which dispatches the job to a calculation server and executes it there. It is important to select an appropriate queue for your jobs.

In this lesson, we will go through some commands that are often used in analysis.

Basic commands

Display information about batch queues

It is important to know which queues are available and how heavily loaded each one is.

To display the information about all batch queues:

bqueues [-u $USER]

If no option is given, this returns the following information about all queues: queue name, queue priority, queue status, task statistics, and job state statistics.

$ bqueues -u $USER
QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
s               120  Open:Active    3200  800    -    - 28126 24927  3199     0
b_index         110  Open:Active     600  100    -    -     0     0     0     0
b_nagoya        110  Open:Active     600  100    -    -     0     0     0     0
l               100  Open:Active       - 1200    -    - 42806 35090  7716     0
h               100  Open:Active    1200  200    -    -  1233   629   604     0
p               100  Open:Active    1200  240    -    -     0     0     0     0
b_b             100  Closed:Active     - 1000    -    -     0     0     0     0
a               100  Open:Active       -    4    -    -     0     0     0     0
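
To judge how busy each queue is, it can help to reduce this table to the pending and running job counts. The following is a minimal sketch, not part of LSF: `queue_load` is a hypothetical helper name, and the sample text is a subset of the listing above so that the snippet runs without LSF. On KEKCC you would pipe the live output of bqueues into the function instead.

```shell
#!/usr/bin/env bash
# Sketch: summarise pending vs. running jobs per queue from bqueues-style text.
# queue_load is a hypothetical helper, not an LSF command.

queue_load() {
    # Skip the header line, then print: queue name, PEND, RUN.
    # Counting fields from the end of the line (NF) avoids depending
    # on the middle columns, which may contain "-" placeholders.
    awk 'NR > 1 { print $1, $(NF-2), $(NF-1) }'
}

# Sample lines copied from the listing above (assumed column layout).
sample='QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
s               120  Open:Active    3200  800    -    - 28126 24927  3199     0
l               100  Open:Active       - 1200    -    - 42806 35090  7716     0
h               100  Open:Active    1200  200    -    -  1233   629   604     0'

printf '%s\n' "$sample" | queue_load
```

With live data the equivalent call would be `bqueues | queue_load`.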

Different queues have different settings. For analysis you can use s, l, or h. For short jobs with a computing time (CPU time) of under 3 hours, the queue s is preferable. For jobs that need more than 3 hours, you might want to use the queue l, which gives jobs up to 24 hours of computing time. More information about LSF queues can be found here.
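
The rule of thumb for s and l can be written down as a small helper. This is only a sketch under stated assumptions: `choose_queue` is a hypothetical function, it takes the expected CPU time in whole hours, it sends a job of exactly 3 hours to l, and it does not cover the queue h because the text above gives no limit for it.

```shell
#!/usr/bin/env bash
# Sketch: pick a queue from the expected CPU time, following the text above.
# choose_queue is a hypothetical helper, not an LSF command.

choose_queue() {
    # $1: expected CPU time in whole hours (integer)
    local hours=$1
    if [ "$hours" -lt 3 ]; then
        echo s      # under 3 hours: short queue
    elif [ "$hours" -le 24 ]; then
        echo l      # up to 24 hours: long queue
    else
        # Beyond the limits quoted above; this sketch gives up here.
        echo "no queue chosen for ${hours} h: split the job or check the queue documentation" >&2
        return 1
    fi
}

choose_queue 1
choose_queue 10
```

The result can be passed straight to bsub, for example `bsub -q "$(choose_queue 10)" "bash example.sh"`.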

With the -l option, bqueues also displays the current “Fairshare” values. Fairshare determines the priority with which your jobs are dispatched.

bqueues -l [<queue_name>]

Here the square brackets […] indicate that the argument is optional and <…> indicates that the value should be filled in by you.

Exercise

Check your priorities on queue s.

Solution

bqueues -l s [| grep $USER]

Provide the queue name after -l, and combine the command with grep to find your entry more quickly. If you have never used the batch queue before, your value should be 0.333.

Every user starts with the default value of 0.333. The more jobs you submit, the lower your Fairshare becomes.

Submit a job

Consider the following example script, saved as example.sh:

#!/usr/bin/bash
echo "Hello world, this is script ${0}." >> batch_output.txt
sleep 20
echo "Finished!" >> batch_output.txt

To submit a job to queue s

bsub -q s "bash example.sh"

and check the output

$ cat batch_output.txt
Hello world, this is script example.sh.
Finished!

Using the same method, you can submit Python or basf2 scripts to the batch queues!

bsub -q <queue name> "basf2 <your_working_script>"
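
When several steering files need to be submitted, a loop saves typing. The sketch below is one possible way to do this, not an official recipe: `submit_all`, the `DRY_RUN` variable, and the file names are all hypothetical. By default it only prints the bsub commands, so you can check them before anything reaches the batch system.

```shell
#!/usr/bin/env bash
# Sketch: submit several basf2 steering files to one queue.
# submit_all and DRY_RUN are hypothetical names, not part of LSF.

submit_all() {
    # $1: queue name; remaining arguments: steering files
    local queue=$1
    shift
    local script
    for script in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run (default): only show what would be submitted.
            echo "bsub -q ${queue} \"basf2 ${script}\""
        else
            # Real submission; requires bsub, i.e. a KEKCC work server.
            bsub -q "${queue}" "basf2 ${script}"
        fi
    done
}

# Placeholder file names for illustration.
submit_all s steering_a.py steering_b.py
```

Setting `DRY_RUN=0` before the call switches to real submission.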

Note

Always test your script before submitting large-scale jobs to the batch system.

Display job status

To check the job status

bjobs [-q <queue name>] [<job_ID>]

Exercise

Submit a basf2 job to queue l, and then check the status of your jobs.

Hint

A simple basf2 job could be the following:

# Print all variables known to the variable manager
from variables import printVars
printVars()

Solution

Submission:

$ bsub -q l "basf2 one_of_example.py"
Job <xxxxxxxx> is submitted to queue <l>.

To check the status, use one of the following:

bjobs -q l <xxxxxxxx>, bjobs <xxxxxxxx>, or just bjobs alone.
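
In scripts it is convenient to capture the job ID from the confirmation line that bsub prints. The following sketch assumes the message has exactly the form shown in the solution above. `job_id_from_bsub` is a hypothetical helper, and the ID 12345678 is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: extract the job ID from bsub's confirmation line.
# job_id_from_bsub is a hypothetical helper, not an LSF command.

job_id_from_bsub() {
    # Reads text on stdin and prints the numeric ID from a line of the form
    #   Job <ID> is submitted to queue <NAME>.
    # Lines that do not match this form produce no output.
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted to queue <.*>\.$/\1/p'
}

# Made-up confirmation line so the snippet runs without LSF.
echo 'Job <12345678> is submitted to queue <l>.' | job_id_from_bsub
```

On a work server the same helper could be used as `id=$(bsub -q l "basf2 one_of_example.py" | job_id_from_bsub)`, followed by `bjobs "$id"`.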

Cancel a job

To cancel jobs

bkill [<job_ID>]

Note

Use 0 as the job ID to kill all of your jobs. Use this with caution.

Sometimes bjobs will still show the job after we tried to terminate it. In this case we can use the -r option to kill it by force. More information is given here.

Optional

Now that you’re familiar with the basics, let’s go over some commands/options that would be useful, but situational.

Suspend jobs

In some scenarios you might want to stop the submitted jobs and resume them later. For instance, this might be due to scheduled maintenance of the storage elements where the input data is located, or an update of the analysis global tags that are used in your jobs.

To suspend unfinished jobs

bstop <job_ID>

Note

Use -a to suspend all jobs.

Resume jobs

To resume suspended jobs

bresume <job_ID>

Large memory usage

In addition, you might have jobs that require more than 4GB of memory. In that case, use the bsub option -n X, where X is the “parallel number”, to get 4GB × X of memory.

To have 16GB of memory on the short job queue

bsub -q s -n 4 "bash example.sh"
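
The value for -n is the required memory divided by 4GB, rounded up. A minimal sketch of this arithmetic, assuming 4GB per slot as stated above (`slots_for_memory` is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Sketch: compute the -n value for a given memory requirement.
# slots_for_memory is a hypothetical helper, not an LSF command.

slots_for_memory() {
    # $1: required memory in whole GB.
    # Integer ceiling of gb / 4, assuming 4 GB per slot.
    local gb=$1
    echo $(( (gb + 3) / 4 ))
}

slots_for_memory 16   # prints 4
slots_for_memory 10   # prints 3
```

For example, `bsub -q s -n "$(slots_for_memory 10)" "bash example.sh"` would request 3 slots, i.e. 12GB.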

Saving job output

Finally, it is a good idea to write the output of your LSF jobs to a log file. The relevant bsub options are -o (standard output) and -e (standard error).

To request 16GB of memory on the short job queue and write log files

bsub -q s -n 4 -o logfile.out -e errorfile.err "bash example.sh"
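
With fixed file names such as logfile.out, all jobs submitted this way share the same log files. One way around this is LSF's %J placeholder, which bsub replaces with the job ID in the -o and -e file names. The sketch below only prints the resulting command; `log_options` and the directory name logs are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build -o/-e options that give every job its own log files.
# log_options and the directory name are hypothetical.

log_options() {
    # $1: directory for the log files.
    # %J is left untouched here; bsub substitutes the job ID at submission.
    local dir=$1
    echo "-o ${dir}/job_%J.out -e ${dir}/job_%J.err"
}

# Print the full command instead of running it, so this works without LSF.
echo "bsub -q s $(log_options logs) \"bash example.sh\""
```

It is safest to create the log directory before submitting, for example with `mkdir -p logs`.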

Key points

  • Submit a script to the short queue with bsub -q s "bash myscript.sh"

  • Check job queues with bqueues

  • Kill jobs with bkill <job id>

  • Always test your scripts before large scale submissions!

Stuck? We can help!

If you get stuck or have any questions about the online book material, the #starterkit-workshop channel in our chat is full of nice people who will provide fast help.

Refer to Collaborative Tools for other places to get help if you have specific or detailed questions about your own analysis.

Improving things!

If you know how to do it, we recommend that you report bugs and other requests with GitLab. Make sure to use the documentation-training label of the basf2 project.

If you just want to give us feedback, please open a GitLab issue and add the label online_book to it.

Please make sure to be as precise as possible to make it easier for us to fix things! So for example:

  • typos (where?)

  • missing bits of information (what?)

  • bugs (what did you do? what goes wrong?)

  • too hard exercises (which one?)

  • etc.

If you are familiar with git and want to create your first merge request for the software, take a look at How to contribute. We’d be happy to have you on the team!

Author of this lesson

Chia-Ling Hsu, Tommy Lam