Belle II Software development
Batch Class Reference
Inheritance diagram for Batch:
[Diagram: Batch derives from Backend; HTCondor, LSF, and PBS derive from Batch.]

Public Member Functions

 __init__ (self, *, backend_args=None)
 
 can_submit (self, *args, **kwargs)
 
 submit (self, job, check_can_submit=True, jobs_per_check=100)
 
 get_batch_submit_script_path (self, job)
 
 get_submit_script_path (self, job)
 

Public Attributes

int global_job_limit = self.default_global_job_limit
 The active job limit.
 
int sleep_between_submission_checks = self.default_sleep_between_submission_checks
 Seconds we wait before checking if we can submit a list of jobs.
 
dict backend_args = {**self.default_backend_args, **backend_args}
 The backend args that will be applied to jobs unless the job specifies them itself.
 

Static Public Attributes

list submission_cmds = []
 Shell command to submit a script; should be implemented in the derived class.
 
int default_global_job_limit = 1000
 Default global limit on the total number of submitted/running jobs that the user can have.
 
int default_sleep_between_submission_checks = 30
 Default time in seconds between re-checks of whether the number of active jobs is below the global job limit.
 
str submit_script = "submit.sh"
 Default submission script name.
 
str exit_code_file = "__BACKEND_CMD_EXIT_STATUS__"
 Default exit code file name.
 
dict default_backend_args = {}
 Default backend_args.
 

Protected Member Functions

 _add_batch_directives (self, job, file)
 
 _make_submit_file (self, job, submit_file_path)
 
 _submit_to_batch (cls, cmd)
 
 _create_job_result (cls, job, batch_output)
 
 _create_cmd (self, job)
 
 _add_wrapper_script_setup (self, job, batch_file)
 
 _add_wrapper_script_teardown (self, job, batch_file)
 
 _create_parent_job_result (cls, parent)
 

Static Protected Member Functions

 _add_setup (job, batch_file)
 

Detailed Description

Abstract base backend for submitting to a local batch system. Batch-system-specific commands should be implemented
in a derived class. Do not use this class directly!

Definition at line 1137 of file backends.py.
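
The "implement in a derived class" contract described above can be illustrated with a minimal sketch. This is not the basf2 code: BatchSketch is a simplified stand-in for Batch so that the example runs without basf2, and DemoBatch, its "demo_submit" command, the "#DEMO" directive, and the dictionary used as a job are all hypothetical.

```python
import io


class BatchSketch:
    """Simplified stand-in for Batch (assumption: only the hook shown here)."""

    submission_cmds = []  # derived classes are expected to fill this in

    def _add_batch_directives(self, job, file):
        # Same behaviour as the documented base method: refuse to run.
        raise NotImplementedError(
            "Need to implement a _add_batch_directives(self, job, file) "
            f"method in {self.__class__.__name__} backend.")


class DemoBatch(BatchSketch):
    """Hypothetical derived backend for an imaginary scheduler."""

    submission_cmds = ["demo_submit"]  # hypothetical shell command

    def _add_batch_directives(self, job, file):
        # A real backend would also decide where stdout/stderr go.
        print(f"#DEMO -q {job['queue']}", file=file)


def render_directives(backend, job):
    """Hypothetical helper: collect the directives a backend writes."""
    buffer = io.StringIO()
    backend._add_batch_directives(job, buffer)
    return buffer.getvalue()
```

Calling render_directives(BatchSketch(), ...) raises NotImplementedError, which is why the base class should not be used directly.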

Constructor & Destructor Documentation

◆ __init__()

__init__ ( self,
* ,
backend_args = None )
Init method for Batch Backend. Does some basic default setup.

Definition at line 1160 of file backends.py.

1160     def __init__(self, *, backend_args=None):
1161         """
1162         Init method for Batch Backend. Does some basic default setup.
1163         """
1164         super().__init__(backend_args=backend_args)
1165
1167         self.global_job_limit = self.default_global_job_limit
1168
1170         self.sleep_between_submission_checks = self.default_sleep_between_submission_checks
1171

Member Function Documentation

◆ _add_batch_directives()

_add_batch_directives ( self,
job,
file )
protected
Should be implemented in a derived class to write a batch submission script to the job.working_dir.
You should think about where the stdout/err should go, and set the queue name.

Reimplemented in HTCondor, LSF, and PBS.

Definition at line 1172 of file backends.py.

1172     def _add_batch_directives(self, job, file):
1173         """
1174         Should be implemented in a derived class to write a batch submission script to the job.working_dir.
1175         You should think about where the stdout/err should go, and set the queue name.
1176         """
1177         raise NotImplementedError("Need to implement a _add_batch_directives(self, job, file) "
1178                                   f"method in {self.__class__.__name__} backend.")
1179

◆ _add_setup()

_add_setup ( job,
batch_file )
static protected inherited
Adds setup lines to the shell script file.

Definition at line 807 of file backends.py.

807     def _add_setup(job, batch_file):
808         """
809         Adds setup lines to the shell script file.
810         """
811         for line in job.setup_cmds:
812             print(line, file=batch_file)
813

◆ _add_wrapper_script_setup()

_add_wrapper_script_setup ( self,
job,
batch_file )
protected inherited
Adds lines to the submitted script that help with job monitoring/setup. Mostly here so that we can insert
`trap` statements for Ctrl-C situations.

Definition at line 814 of file backends.py.

814     def _add_wrapper_script_setup(self, job, batch_file):
815         """
816         Adds lines to the submitted script that help with job monitoring/setup. Mostly here so that we can insert
817         `trap` statements for Ctrl-C situations.
818         """
819         start_wrapper = f"""# ---
820 # trap ctrl-c and call ctrl_c()
821 trap '(ctrl_c 130)' SIGINT
822 trap '(ctrl_c 143)' SIGTERM
823
824 function write_exit_code() {{
825   echo "Writing $1 to exit status file"
826   echo "$1" > {self.exit_code_file}
827   exit $1
828 }}
829
830 function ctrl_c() {{
831   trap '' SIGINT SIGTERM
832   echo "** Trapped Ctrl-C **"
833   echo "$1" > {self.exit_code_file}
834   exit $1
835 }}
836 # ---"""
837         print(start_wrapper, file=batch_file)
838

◆ _add_wrapper_script_teardown()

_add_wrapper_script_teardown ( self,
job,
batch_file )
protected inherited
Adds lines to the submitted script that help with job monitoring/teardown. Mostly here so that the exit code
of the job command is written out to a file. This means we can tell whether the command succeeded even if the
backend server or monitoring database purges the data about our job. For example, if PBS removes job
information too quickly, we may never know whether a job succeeded or failed without some kind of exit
file.

Definition at line 839 of file backends.py.

839     def _add_wrapper_script_teardown(self, job, batch_file):
840         """
841         Adds lines to the submitted script that help with job monitoring/teardown. Mostly here so that we can insert
842         an exit code of the job cmd being written out to a file. Which means that we can know if the command was
843         successful or not even if the backend server/monitoring database purges the data about our job i.e. If PBS
844         removes job information too quickly we may never know if a job succeeded or failed without some kind of exit
845         file.
846         """
847         end_wrapper = """# ---
848 write_exit_code $?"""
849         print(end_wrapper, file=batch_file)
850
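
Because the wrapper writes the exit status with echo "$1" > exit_code_file, the status can later be recovered from that file alone. The following is a minimal sketch of such a check; read_exit_code is a hypothetical helper, not part of the documented API, and it assumes the file ends up in the job's working directory.

```python
from pathlib import Path

# Default value of exit_code_file documented for this class.
EXIT_CODE_FILE = "__BACKEND_CMD_EXIT_STATUS__"


def read_exit_code(working_dir, exit_code_file=EXIT_CODE_FILE):
    """Hypothetical helper: return the recorded exit code, or None if absent.

    Assumption: the wrapper script ran inside working_dir, so the exit
    status file was created there.
    """
    path = Path(working_dir, exit_code_file)
    if not path.exists():
        # The job has not finished, or never reached the wrapper functions.
        return None
    text = path.read_text().strip()
    # An empty file is treated like a missing one rather than as an error.
    return int(text) if text else None
```

A return value of 0 would indicate success, while 130 or 143 would correspond to the SIGINT and SIGTERM traps installed by the setup wrapper.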

◆ _create_cmd()

_create_cmd ( self,
job )
protected
 

Reimplemented in HTCondor, LSF, and PBS.

Definition at line 1352 of file backends.py.

1352     def _create_cmd(self, job):
1353         """
1354         """
1355
1356

◆ _create_job_result()

_create_job_result ( cls,
job,
batch_output )
protected
 

Reimplemented in HTCondor, LSF, and PBS.

Definition at line 1347 of file backends.py.

1347     def _create_job_result(cls, job, batch_output):
1348         """
1349         """
1350

◆ _create_parent_job_result()

_create_parent_job_result ( cls,
parent )
protected inherited
We want to be able to call `ready()` on the top level `Job.result`. So this method needs to exist
so that a Job.result object actually exists. It will be mostly empty and simply updates subjob
statuses and allows the use of ready().

Reimplemented in HTCondor, Local, LSF, and PBS.

Definition at line 852 of file backends.py.

852     def _create_parent_job_result(cls, parent):
853         """
854         We want to be able to call `ready()` on the top level `Job.result`. So this method needs to exist
855         so that a Job.result object actually exists. It will be mostly empty and simply updates subjob
856         statuses and allows the use of ready().
857         """
858         raise NotImplementedError
859

◆ _make_submit_file()

_make_submit_file ( self,
job,
submit_file_path )
protected
Useful for the HTCondor backend, where a separate submit file is needed instead of batch
directives pasted directly into the submission script. It should be overridden
if needed.

Reimplemented in HTCondor.

Definition at line 1180 of file backends.py.

1180     def _make_submit_file(self, job, submit_file_path):
1181         """
1182         Useful for the HTCondor backend where a submit is needed instead of batch
1183         directives pasted directly into the submission script. It should be overwritten
1184         if needed.
1185         """
1186

◆ _submit_to_batch()

_submit_to_batch ( cls,
cmd )
protected
Do the actual batch submission command and collect the output to find out the job id for later monitoring.

Reimplemented in HTCondor, LSF, and PBS.

Definition at line 1189 of file backends.py.

1189     def _submit_to_batch(cls, cmd):
1190         """
1191         Do the actual batch submission command and collect the output to find out the job id for later monitoring.
1192         """
1193

◆ can_submit()

can_submit ( self,
* args,
** kwargs )
Should be implemented in a derived class to check that submitting the next job(s) shouldn't fail.
This is initially meant to make sure that we don't go over the global limits of jobs (submitted + running).

Returns:
    bool: If the job submission can continue based on the current situation.

Reimplemented in HTCondor, LSF, and PBS.

Definition at line 1194 of file backends.py.

1194     def can_submit(self, *args, **kwargs):
1195         """
1196         Should be implemented in a derived class to check that submitting the next job(s) shouldn't fail.
1197         This is initially meant to make sure that we don't go over the global limits of jobs (submitted + running).
1198
1199         Returns:
1200             bool: If the job submission can continue based on the current situation.
1201         """
1202         return True
1203
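
A derived class would typically replace the unconditional return True with a comparison against global_job_limit. The sketch below shows one possible shape of that check; LimitCheck, the njobs parameter, and the count_active_jobs callable are assumptions made for illustration and are not taken from the HTCondor, LSF, or PBS implementations.

```python
class LimitCheck:
    """Hypothetical stand-in showing one way to honour the can_submit contract."""

    default_global_job_limit = 1000  # documented default for Batch

    def __init__(self, count_active_jobs):
        self.global_job_limit = self.default_global_job_limit
        # Hypothetical callable that asks the batch system how many of the
        # user's jobs are currently submitted or running.
        self._count_active_jobs = count_active_jobs

    def can_submit(self, njobs=1):
        """Return True if submitting njobs more jobs stays within the limit."""
        return self._count_active_jobs() + njobs <= self.global_job_limit
```

With 900 active jobs, a request for 100 more is allowed under the default limit of 1000, whereas with 950 active jobs it is refused.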

◆ get_batch_submit_script_path()

get_batch_submit_script_path ( self,
job )
Construct the Path object of the script file that we will submit using the batch command.
For most batch backends this is the same as the bash script we submit,
but some require a separate submission file that describes the job.
To support that, override this function in the derived backend class.

Reimplemented in HTCondor.

Definition at line 1336 of file backends.py.

1336     def get_batch_submit_script_path(self, job):
1337         """
1338         Construct the Path object of the script file that we will submit using the batch command.
1339         For most batch backends this is the same script as the bash script we submit.
1340         But for some they require a separate submission file that describes the job.
1341         To implement that you can implement this function in the Backend class.
1342         """
1343         return Path(job.working_dir, self.submit_script)
1344
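
The difference between the two path methods only shows up when a backend overrides this one. A minimal sketch follows, assuming a hypothetical backend whose batch command takes a separate job description file; the class names, the "submit.sub" file name, and the SimpleNamespace used in place of a Job are all illustrative.

```python
from pathlib import Path
from types import SimpleNamespace


class DefaultPaths:
    """Stand-in mirroring the two documented path methods."""

    submit_script = "submit.sh"  # documented default script name

    def get_submit_script_path(self, job):
        # The bash script that contains the actual job command.
        return Path(job.working_dir, self.submit_script)

    def get_batch_submit_script_path(self, job):
        # By default the batch command is handed the bash script itself.
        return Path(job.working_dir, self.submit_script)


class SeparateSubmitFile(DefaultPaths):
    """Hypothetical backend that submits a job description file instead."""

    batch_submit_file = "submit.sub"  # hypothetical file name

    def get_batch_submit_script_path(self, job):
        return Path(job.working_dir, self.batch_submit_file)


# Minimal stand-in for a Job; only working_dir is needed here.
job = SimpleNamespace(working_dir="/tmp/example_job")
```

In this sketch the bash script path is unchanged by the override; only the file passed to the batch command differs.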

◆ get_submit_script_path()

get_submit_script_path ( self,
job )
inherited
Construct the Path object of the bash script file that we will submit. It will contain
the actual job command, wrapper commands, setup commands, and any batch directives.

Definition at line 860 of file backends.py.

860     def get_submit_script_path(self, job):
861         """
862         Construct the Path object of the bash script file that we will submit. It will contain
863         the actual job command, wrapper commands, setup commands, and any batch directives
864         """
865         return Path(job.working_dir, self.submit_script)
866
867

◆ submit()

submit ( self,
job,
check_can_submit = True,
jobs_per_check = 100 )
 

Reimplemented from Backend.

Definition at line 1205 of file backends.py.

1205     def submit(self, job, check_can_submit=True, jobs_per_check=100):
1206         """
1207         """
1208         raise NotImplementedError("This is an abstract submit(job) method that shouldn't have been called. "
1209                                   "Did you submit a (Sub)Job?")
1210

Member Data Documentation

◆ backend_args

dict backend_args = {**self.default_backend_args, **backend_args}
inherited

The backend args that will be applied to jobs unless the job specifies them itself.

Definition at line 797 of file backends.py.
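
The expression {**self.default_backend_args, **backend_args} is a dictionary merge in which later keys win, so values passed to the constructor override the class defaults. A small sketch of that behaviour follows; merge_backend_args is a hypothetical helper, the "queue" and "memory" keys are made up, and the handling of None is an assumption based on the constructor's backend_args=None default.

```python
def merge_backend_args(default_backend_args, backend_args=None):
    """Hypothetical helper reproducing the documented merge expression."""
    if backend_args is None:
        # Assumption: None means "no overrides", since {**None} would fail.
        backend_args = {}
    # Later unpacking wins, so backend_args overrides the defaults.
    return {**default_backend_args, **backend_args}


defaults = {"queue": "short", "memory": "2GB"}  # hypothetical keys and values
merged = merge_backend_args(defaults, {"queue": "long"})
```

The merge builds a new dictionary, so the class-level defaults are left unmodified.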

◆ default_backend_args

dict default_backend_args = {}
static inherited

Default backend_args.

Definition at line 789 of file backends.py.

◆ default_global_job_limit

int default_global_job_limit = 1000
static

Default global limit on the total number of submitted/running jobs that the user can have.

This limit does not affect the total number of jobs that are eventually submitted. Jobs are simply held back until the limit can be respected, i.e. until the total number of jobs in the batch system goes down. Because we submit in chunks of N jobs before checking this limit again, the value needs to be a little lower than the real batch system limit. Otherwise you could accidentally exceed the real limit during an N-job submission if other processes are checking and submitting concurrently. This is quite common for the first submission of jobs from parallel calibrations.

Note that if there are other jobs already submitted for your account, then these will count towards this limit.

Definition at line 1156 of file backends.py.
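
The chunked submission described above can be sketched as a simple loop: check the limit, wait if necessary, then submit up to N jobs before checking again. This is an illustration under stated assumptions, not the code used by the derived backends; submit_in_chunks and its can_submit, submit_one, and sleep parameters are hypothetical, while the defaults of 100 and 30 mirror jobs_per_check and default_sleep_between_submission_checks.

```python
import time


def submit_in_chunks(jobs, can_submit, submit_one,
                     jobs_per_check=100, sleep_seconds=30, sleep=time.sleep):
    """Hypothetical helper: submit all jobs, re-checking the limit per chunk.

    can_submit(n) should return True when n more jobs fit under the limit.
    submit_one(job) performs a single submission.
    """
    submitted = 0
    for start in range(0, len(jobs), jobs_per_check):
        chunk = jobs[start:start + jobs_per_check]
        # Hold the whole chunk back until the limit can be respected.
        while not can_submit(len(chunk)):
            sleep(sleep_seconds)
        # No further checks happen inside the chunk, which is why the limit
        # should sit below the real batch system limit.
        for job in chunk:
            submit_one(job)
            submitted += 1
    return submitted
```

Every job is eventually submitted; the limit only delays chunks. If two processes pass the check at the same moment, up to 2 * jobs_per_check jobs can be added before either checks again, which is the overshoot the safety margin is meant to absorb.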

◆ default_sleep_between_submission_checks

int default_sleep_between_submission_checks = 30
static

Default time in seconds between re-checks of whether the number of active jobs is below the global job limit.

Definition at line 1158 of file backends.py.

◆ exit_code_file

str exit_code_file = "__BACKEND_CMD_EXIT_STATUS__"
static inherited

Default exit code file name.

Definition at line 787 of file backends.py.

◆ global_job_limit

int global_job_limit = self.default_global_job_limit

The active job limit.

This is 'global' because we want to avoid accidentally submitting too many jobs across all current and previous submission scripts.

Definition at line 1167 of file backends.py.

◆ sleep_between_submission_checks

int sleep_between_submission_checks = self.default_sleep_between_submission_checks

Seconds we wait before checking if we can submit a list of jobs.

Only relevant once we hit the global limit of active jobs, which is usually large.

Definition at line 1170 of file backends.py.

◆ submission_cmds

list submission_cmds = []
static

Shell command to submit a script; should be implemented in the derived class.

Definition at line 1143 of file backends.py.

◆ submit_script

str submit_script = "submit.sh"
static inherited

Default submission script name.

Definition at line 785 of file backends.py.


The documentation for this class was generated from the following file: