Belle II Software development
PBS.PBSResult Class Reference
Inheritance: PBS.PBSResult inherits from Result.

Public Member Functions

 __init__ (self, job, job_id)
 
 update_status (self)
 
 ready (self)
 
 get_exit_code_from_file (self)
 

Public Attributes

 job_id = job_id
 job id given by PBS
 
 job = job
 Job object for result.
 
 time_to_wait_for_exit_code_file = timedelta(minutes=5)
 After our first attempt to view the exit code file once the job is 'finished', how long should we wait for it to exist before timing out?
 
 exit_code_file_initial_time = None
 Time we started waiting for the exit code file to appear.
 

Static Public Attributes

dict backend_code_to_status
 PBS statuses mapped to Job statuses.
 

Protected Member Functions

 _update_result_status (self, qstat_output)
 
 _get_status_from_output (self, output)
 

Protected Attributes

bool _is_ready = False
 Quicker way to know if it's ready once it has already been found.
 

Detailed Description

Simple class to help monitor status of jobs submitted by `PBS` Backend.

You pass in a `Job` object (or `SubJob`) and job id from a qsub command.
Calling the `update_status` method runs qstat to refresh the job status; the `ready` method then reports whether or not the job has finished.

Definition at line 1420 of file backends.py.
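
The monitoring pattern described above can be sketched as a simple polling loop. This is a minimal illustration only: `StubResult` and `wait_until_ready` are hypothetical names that stand in for a real result object, and no batch system is contacted.

```python
import time


class StubResult:
    """Hypothetical stand-in for a result object; finishes after a fixed number of updates."""

    def __init__(self, updates_until_done):
        self.status = "submitted"
        self._remaining = updates_until_done

    def update_status(self):
        # A real PBS result would query the batch system here (via qstat).
        self._remaining -= 1
        self.status = "completed" if self._remaining <= 0 else "running"

    def ready(self):
        # Only inspects the already known status; it never queries the batch system.
        return self.status in ("completed", "failed")


def wait_until_ready(result, poll_interval=0.0, max_polls=100):
    """Refresh the status until the result is ready and return the number of polls used."""
    for polls in range(1, max_polls + 1):
        result.update_status()
        if result.ready():
            return polls
        time.sleep(poll_interval)
    raise TimeoutError("result not ready after max_polls updates")


result = StubResult(updates_until_done=3)
polls_used = wait_until_ready(result)
```

Separating the status refresh from the readiness check keeps `ready` cheap to call repeatedly.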

Constructor & Destructor Documentation

◆ __init__()

__init__ (self, job, job_id)
Pass in the job object and the job id to allow the result to do monitoring and perform
post processing of the job.

Definition at line 1440 of file backends.py.

def __init__(self, job, job_id):
    """
    Pass in the job object and the job id to allow the result to do monitoring and perform
    post processing of the job.
    """
    super().__init__(job)

    self.job_id = job_id

Member Function Documentation

◆ _get_status_from_output()

_get_status_from_output (self, output)
protected
Get the PBS 'job_state' of this job from the qstat output. Raises KeyError if the job id is not present in the output.

Definition at line 1499 of file backends.py.

def _get_status_from_output(self, output):
    """
    Get status from output
    """
    for job_info in output["JOBS"]:
        if job_info["Job_Id"] == self.job_id:
            return job_info["job_state"]
    else:
        raise KeyError
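
As a rough illustration of the lookup, the sketch below runs the same search on a small hand-written dictionary. The only structure assumed is what the listing itself uses: a "JOBS" list whose entries carry "Job_Id" and "job_state". The job ids and the function name `status_from_qstat` are hypothetical.

```python
def status_from_qstat(qstat_output, job_id):
    """Return the 'job_state' of the entry whose 'Job_Id' matches, else raise KeyError."""
    for job_info in qstat_output["JOBS"]:
        if job_info["Job_Id"] == job_id:
            return job_info["job_state"]
    # Reached only when no entry matched, e.g. the batch system already purged the job.
    raise KeyError(job_id)


# Hand-written example data; real qstat output may carry many more fields.
example_output = {
    "JOBS": [
        {"Job_Id": "1001.pbs-server", "job_state": "R"},
        {"Job_Id": "1002.pbs-server", "job_state": "Q"},
    ]
}

state = status_from_qstat(example_output, "1002.pbs-server")

try:
    status_from_qstat(example_output, "9999.pbs-server")
    missing_raises = False
except KeyError:
    missing_raises = True
```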

◆ _update_result_status()

_update_result_status (self, qstat_output)
protected
Parameters:
        qstat_output (dict): The JSON output of a previous call to qstat which we can reuse to find the
        status of this job. Obviously you should only be passing a JSON dict that contains the 'Job_Id' and
        'job_state' information, otherwise it is useless.

Definition at line 1462 of file backends.py.

def _update_result_status(self, qstat_output):
    """
    Parameters:
        qstat_output (dict): The JSON output of a previous call to qstat which we can reuse to find the
        status of this job. Obviously you should only be passing a JSON dict that contains the 'Job_Id' and
        'job_state' information, otherwise it is useless.

    """
    try:
        backend_status = self._get_status_from_output(qstat_output)
    except KeyError:
        # If this happens then maybe the job id wasn't in the qstat_output argument because it finished.
        # Instead of failing immediately we try looking for the exit code file and then fail if it still isn't there.
        B2DEBUG(29, f"Checking of the exit code from file for {self.job}")
        try:
            exit_code = self.get_exit_code_from_file()
        except FileNotFoundError:
            waiting_time = datetime.now() - self.exit_code_file_initial_time
            if self.time_to_wait_for_exit_code_file > waiting_time:
                B2ERROR(f"Exit code file for {self.job} missing and we can't wait longer. Setting exit code to 1.")
                exit_code = 1
            else:
                B2WARNING(f"Exit code file for {self.job} missing, will wait longer.")
                return
        if exit_code:
            backend_status = "E"
        else:
            backend_status = "C"

    try:
        new_job_status = self.backend_code_to_status[backend_status]
    except KeyError as err:
        raise BackendError(f"Unidentified backend status found for {self.job}: {backend_status}") from err

    if new_job_status != self.job.status:
        self.job.status = new_job_status
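
In the listing above, the error branch is taken when time_to_wait_for_exit_code_file is greater than the elapsed waiting time, that is, while the time limit has not yet been reached. This seems at odds with the two log messages. The following is a minimal sketch of the behaviour the messages describe, assuming the intent is to give up only after the limit has elapsed. The function name and its arguments are hypothetical and are not part of backends.py.

```python
from datetime import datetime, timedelta


def exit_code_after_missing_file(started_waiting, time_limit, now):
    """Decide what to do when the exit code file is still missing.

    Returns 1 (treat the job as failed) once the time limit has elapsed,
    or None to signal that the caller should keep waiting.
    """
    waiting_time = now - started_waiting
    if waiting_time > time_limit:
        # Waited longer than allowed: give up and report a failure exit code.
        return 1
    # Still inside the allowed window: try again on the next status update.
    return None


start = datetime(2024, 1, 1, 12, 0, 0)
limit = timedelta(minutes=5)

still_waiting = exit_code_after_missing_file(start, limit, start + timedelta(minutes=1))
timed_out = exit_code_after_missing_file(start, limit, start + timedelta(minutes=6))
```

Passing `now` explicitly, rather than calling `datetime.now()` inside the function, keeps the decision easy to test.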

◆ get_exit_code_from_file()

get_exit_code_from_file ( self)
inherited
Read the exit code file to discover the exit status of the job command. Useful fallback if the job is no longer
known to the job database (batch system purged it for example). Since some backends may take time to download
the output files of the job back to the working directory we use a time limit on how long to wait.

Definition at line 908 of file backends.py.

def get_exit_code_from_file(self):
    """
    Read the exit code file to discover the exit status of the job command. Useful fallback if the job is no longer
    known to the job database (batch system purged it for example). Since some backends may take time to download
    the output files of the job back to the working directory we use a time limit on how long to wait.
    """
    if not self.exit_code_file_initial_time:
        self.exit_code_file_initial_time = datetime.now()
    exit_code_path = Path(self.job.working_dir, Backend.exit_code_file)
    with open(exit_code_path) as f:
        exit_code = int(f.read().strip())
    B2DEBUG(29, f"Exit code from file for {self.job} was {exit_code}")
    return exit_code
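
The file-reading step can be tried in isolation with a temporary directory. In this sketch the file name "exit_code.txt" is an assumption made for illustration (the value of Backend.exit_code_file is not shown here), and `read_exit_code` is a hypothetical helper rather than part of the class.

```python
import tempfile
from pathlib import Path


def read_exit_code(working_dir, file_name="exit_code.txt"):
    """Read an integer exit code from a file in working_dir.

    Raises FileNotFoundError if the file has not appeared yet.
    """
    exit_code_path = Path(working_dir, file_name)
    with open(exit_code_path) as f:
        return int(f.read().strip())


with tempfile.TemporaryDirectory() as tmp:
    # Simulate a finished job that wrote a successful exit code.
    Path(tmp, "exit_code.txt").write_text("0\n")
    exit_code = read_exit_code(tmp)

    # A file that does not exist yet surfaces as FileNotFoundError.
    try:
        read_exit_code(tmp, "missing.txt")
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True
```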

◆ ready()

ready ( self)
inherited
Returns whether or not this job result is known to be ready. Doesn't actually change the job status. Just changes
the 'readiness' based on the known job status.

Definition at line 887 of file backends.py.

def ready(self):
    """
    Returns whether or not this job result is known to be ready. Doesn't actually change the job status. Just changes
    the 'readiness' based on the known job status.
    """
    B2DEBUG(29, f"Calling {self.job}.result.ready()")
    if self._is_ready:
        return True
    elif self.job.status in self.job.exit_statuses:
        self._is_ready = True
        return True
    else:
        return False

◆ update_status()

update_status ( self)
Update the job's (or subjobs') status by calling qstat.

Reimplemented from Result.

Definition at line 1449 of file backends.py.

def update_status(self):
    """
    Update the job's (or subjobs') status by calling qstat.
    """
    B2DEBUG(29, f"Calling {self.job}.result.update_status()")
    # Get all jobs info and reuse it for each status update to minimise time spent on this updating.
    qstat_output = PBS.qstat()
    if self.job.subjobs:
        for subjob in self.job.subjobs.values():
            subjob.result._update_result_status(qstat_output)
    else:
        self._update_result_status(qstat_output)

Member Data Documentation

◆ _is_ready

bool _is_ready = False
protected inherited

Quicker way to know if it's ready once it has already been found.

Saves a lot of calls to batch system commands.

Definition at line 880 of file backends.py.

◆ backend_code_to_status

dict backend_code_to_status
static
Initial value:
= {"R": "running",
"C": "completed",
"FINISHED": "completed",
"E": "failed",
"H": "submitted",
"Q": "submitted",
"T": "submitted",
"W": "submitted",
"H": "submitted"
}

PBS statuses mapped to Job statuses.

Definition at line 1429 of file backends.py.
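
A short sketch of how such a mapping can be applied follows. The dictionary copies the entries listed above (with the repeated "H" key written once, which does not change the resulting dict). `UnknownBackendStatus` and `to_job_status` are hypothetical names used only for this illustration.

```python
BACKEND_CODE_TO_STATUS = {
    "R": "running",
    "C": "completed",
    "FINISHED": "completed",
    "E": "failed",
    "H": "submitted",
    "Q": "submitted",
    "T": "submitted",
    "W": "submitted",
}


class UnknownBackendStatus(Exception):
    """Hypothetical error type for a status code that has no mapping."""


def to_job_status(backend_status):
    """Translate a PBS status code into a job status string."""
    try:
        return BACKEND_CODE_TO_STATUS[backend_status]
    except KeyError as err:
        # Chain the original KeyError so the unknown code stays visible in the traceback.
        raise UnknownBackendStatus(f"Unidentified backend status: {backend_status}") from err


queued = to_job_status("Q")
finished = to_job_status("C")

try:
    to_job_status("X")
    unknown_raises = False
except UnknownBackendStatus:
    unknown_raises = True
```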

◆ exit_code_file_initial_time

exit_code_file_initial_time = None
inherited

Time we started waiting for the exit code file to appear.

Definition at line 885 of file backends.py.

◆ job

job = job
inherited

Job object for result.

Definition at line 878 of file backends.py.

◆ job_id

job_id = job_id

job id given by PBS

Definition at line 1447 of file backends.py.

◆ time_to_wait_for_exit_code_file

time_to_wait_for_exit_code_file = timedelta(minutes=5)
inherited

After our first attempt to view the exit code file once the job is 'finished', how long should we wait for it to exist before timing out?

Definition at line 883 of file backends.py.


The documentation for this class was generated from the following file: