10.2.1. Prompt Calibration Tools#

Overview#

Prompt calibration is the set of processes that must happen before the first mDST reconstruction on raw data can proceed. A run range must be defined, and all necessary calibration payloads must be created and uploaded to the correct conditions database global tag. This procedure can be quite complex since:

  • Local calibrations occur outside the scope of any automated system, as the data is often inaccessible. They may have scripts that are very different to each other, and which require expert knowledge.

  • Calibrations from different sub-detectors may depend on each other, but it may not be desirable to include them in a single CAF job.

  • Calibrations may require the production of cDST files from raw data before they can begin. But cDST production may also depend on earlier (tracking) calibrations.

  • Experts may create very complex CAF scripts which require different inputs and setup. The interface to them is generally not standardised, other than using the CAF.

  • These CAF scripts are often not committed to basf2 and are kept private instead. This means that there is usually no record of the correct scripts to use for a basf2 release.

Automated systems for controlling the creation, submission, and monitoring of CAF jobs for prompt calibration are being created. In order to use them effectively some tools have been created in basf2:

  • A directory for official CAF scripts for prompt calibration, to keep them up to date with basf2 releases.

  • A standardised structure for these scripts so that an automatic system can discover available calibration scripts for a basf2 release.

  • Command line tools that the automatic system will use to run these scripts. Developers can also use these tools to check that their scripts will run in the automatic system.

Note

This is to be used for prompt calibration only. You do not need to follow this guide for reprocessings of very large amounts of old data.

Getting Started#

Prompt Calibration Scripts#

First Steps:

  1. Create a CalibrationCollectorModule (if a new one is necessary), and a CalibrationAlgorithm.

  2. Create a working CAF calibration python script using the Calibration class.

At this point you are working with the CAF and should be ready to add a prompt calibration script to basf2 and test it. Now identify the requirements of your calibration:

  • Which data formats will it take as input (raw, cDST)?

  • What kinds of valid calibration data will it take as input, e.g. HLT hadron skim?

  • How many events from a run/in total will it need?

  • Does this calibration depend on the accurate payloads from another prompt calibration?

If you need to create a new script for your calibration you should create it in the calibration/scripts/prompt/calibrations directory. These scripts have a standard format (see below) which you must use or your prompt calibration won’t work!

If you decide that your calibration should run within the same overall job as another related calibration, you should identify which one in the calibration/scripts/prompt/calibrations directory. Then add it to the returned calibrations list in the get_calibrations function below.

The format for a prompt calibration script is:

"""Docstring explaining what this calibration script does. Maybe a description of payloads?
This can be as long as you want."""

##############################
# OPTIONAL DEPENDENCY #
##############################

# you may choose to signal that this calibration depends on others by importing their settings and
# adding it to the 'depends_on' list in this calibration's settings variable.

from prompt.calibrations.caf_vxd import settings as caf_vxd_dependency

##############################
# REQUIRED VARIABLE #
##############################

from prompt import CalibrationSettings

settings = CalibrationSettings(name="Example Simple",
                               expert_username="ddossett",
                               description=__doc__,
                               input_data_formats=["raw"],
                               input_data_names=["physics"],
                               input_data_filters={"physics": ["physics", "Good"]},
                               expert_config={"events_per_file": 10000, ...},
                               depends_on=[caf_vxd_dependency])

##############################
# REQUIRED FUNCTION #
##############################
def get_calibrations(input_data, **kwargs):
    # The only function that MUST exist in this module. It should return a LIST of Calibration objects
    # that have had their input files assigned and any configuration applied. The final output payload IoV(s)
    # should also be set correctly to be open-ended e.g. IoV(exp_low, run_low, -1, -1)
    #
    # The database_chain, CAF backend_args, backend, and heartbeat of these
    # calibrations will all be set by the b2caf-prompt-run tool.
    # You should set the `Calibration.max_subjobs` attribute yourself, probably to ~1500. But if you have many
    # Calibration objects running in parallel you might want to set it lower per Calibration so that the total
    # number of jobs from all parallel Calibrations is ~1000 -> 2500
    
    # The expert_config received here will be the combination of your defaults in the settings variable above,
    # with any values set in your caf_config.json taking precedence.
    expert_config = kwargs.get("expert_config")

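    # Calibration is provided by the CAF; each Calibration needs its own name.
    from caf.framework import Calibration

    mycal1 = Calibration("Example1")
    mycal2 = Calibration("Example2")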

    ...

    return [mycal1, mycal2]

As you can see, you need to define only two things at the top level of the script: a settings variable and a get_calibrations(input_data, **kwargs) function. You are free to import from other basf2 packages and create more variables and functions in this script, but these two must exist.

The settings variable#

This must be a variable of the type prompt.CalibrationSettings. It defines the input data requirements for the script and gives a human-readable name and description. It also defines the contact expert username (which should be the same as the one in GitLab) and a list of other prompt calibrations that the script depends on. This list will be used to define the task order in the automatic system; it will not affect running the script standalone.

Warning

If you encounter an ImportError when running your script, please check that you haven’t created a circular dependency by setting your depends_on calibrations to a calibration that depends on yours. This dependency may well be implicit in the chain of dependencies (A -> B -> C means C depends on A), rather than explicit (A -> C means C depends on A).

It may also be the case that you need to run scons again to make your new prompt/calibrations/ script known to basf2.
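The kind of loop described in this warning can be checked by hand with a small graph walk. The sketch below is only an illustration and is not part of basf2: the find_cycle helper is hypothetical, and plain strings stand in for the CalibrationSettings objects that would normally appear in depends_on.

```python
def find_cycle(depends_on):
    """Return one dependency cycle as a list of names, or None if there is none.

    depends_on maps a calibration name to the names it depends on
    (a hypothetical stand-in for the depends_on lists of the settings variables).
    """
    visiting, done = [], set()

    def visit(name):
        if name in done:
            return None
        if name in visiting:
            # The name is already on the current path, so report the loop.
            return visiting[visiting.index(name):] + [name]
        visiting.append(name)
        for dependency in depends_on.get(name, []):
            cycle = visit(dependency)
            if cycle:
                return cycle
        visiting.pop()
        done.add(name)
        return None

    for name in depends_on:
        cycle = visit(name)
        if cycle:
            return cycle
    return None


# C depends on B, B depends on A, and A (wrongly) depends on C.
implicit_loop = {"A": ["C"], "B": ["A"], "C": ["B"]}
no_loop = {"A": [], "B": ["A"], "C": ["B"]}
print(find_cycle(implicit_loop))
print(find_cycle(no_loop))
```

In a real script the loop shows up as an ImportError because each module imports the settings of the other before either has finished loading.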

class prompt.CalibrationSettings(name, expert_username, description, input_data_formats=None, input_data_names=None, input_data_filters=None, depends_on=None, expert_config=None)[source]#

Simple class to hold and display required information for a prompt calibration script (process).

Parameters:
  • name (str) – The unique calibration name, not longer than 64 characters.

  • expert_username (str) – The GitLab username of the expert to contact about this script. This username will be used to assign the default responsible person for submitting and checking prompt calibration jobs.

  • description (str) – Long form description of the calibration and what it does. Feel free to make this as long as you need.

  • input_data_formats (frozenset(str)) – The data formats {‘raw’, ‘cdst’, ‘mdst’, ‘udst’} of the input files that should be used as input to the process. Used to figure out if this calibration should occur before the relevant data production e.g. before cDST files are created.

  • input_data_names (frozenset(str)) – The names that you will use when accessing the input data given to the prompt calibration process i.e. Use these in the get_calibrations function to access the correct input data files. e.g. input_data_names=[“all_events”, “offres_photon_events”]

  • input_data_filters (dict) – The data selection for the data input names, used for automated calibration. The keys should correspond to one of the input_data_names, with the values being a list of the various data filters, e.g. Data Tag, Beam Energy, Run Type, Run Quality Tag and Magnet. All available filters can be found in the input_data_filters dictionary (e.g. from prompt import input_data_filters), with details about data tags and run quality tags found at: https://calibration.belle2.org/belle2/data_tags/list/. To exclude a specific filter, prefix it with NOT, e.g. {“all_events”: [“mumu_tight_or_highm_calib”, “hadron_calib”, “Good”, “On”], “offres_photon_events”: [“gamma_gamma_calib”, “Good”, “NOT On”]}. Not selecting a specific filter (e.g. Magnet) is equivalent to not having any requirement on it, i.e. (Either).

  • depends_on (list(CalibrationSettings)) – The settings variables of the other prompt calibrations that you want to depend on. This will allow the external automatic system to understand the overall ordering of scripts to run. If you encounter an import error when trying to run your prompt calibration script, it is likely that you have introduced a circular dependency.

  • expert_config (dict) – Default expert configuration for this calibration script. This is an optional dictionary (which must be JSON compliant) of configuration options for your get_calibrations(…) function. This is supposed to be used as a catch-all place to send in options for your calibration setup. For example, you may want to have an optional list of IoV boundaries so that your prompt script knows that it should split the input data between different IoV ranges. Or you might want to send in options like the maximum events per input file to process. The value in your settings object will be the default, but you can override the value via the caf_config.json sent into b2caf-prompt-run.
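The way the expert_config defaults and the caf_config.json values combine can be pictured with plain dictionaries. This is a minimal sketch under assumptions: the merge is taken to be a shallow one in which the caf_config.json value wins, and the payload_boundaries key is a made-up example option, not a documented one.

```python
import json

# Defaults as they might appear in the settings variable (values are illustrative).
default_expert_config = {"events_per_file": 10000, "payload_boundaries": None}

# The optional "expert_config" entry of a caf_config.json, given here as a string.
caf_config_text = '{"expert_config": {"events_per_file": 2000}}'
overrides = json.loads(caf_config_text).get("expert_config", {})

# Assumed behaviour: a shallow merge where the caf_config.json values take precedence.
expert_config = {**default_expert_config, **overrides}
print(expert_config)

# The defaults must be JSON compliant, so this round trip should not raise.
round_trip = json.loads(json.dumps(default_expert_config))
```

Keeping every option in the defaults, even when its value is None, means get_calibrations can read it without checking whether the key exists.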

The get_calibrations function#

This function must have the format get_calibrations(input_data, **kwargs). The input_data argument is a dictionary where the keys should be the same as the values for input_data_names in your settings variable. input_data is filled automatically by the b2caf-prompt-run tool from the input data JSON file. The format of input_data is:

input_data = {"Your_Input_Name_A":
                  {"/path/to/type/a/input/file_1_1.root": caf.utils.IoV(1, 1, 1, 1),
                   "/path/to/type/a/input/file_1_2.root": caf.utils.IoV(1, 2, 1, 2),
                   ...
                  },
              "Your_Input_Name_B":
                  {"/path/to/type/b/input/file_1_1.root": caf.utils.IoV(1, 1, 1, 1),
                   "/path/to/type/b/input/file_1_2.root": caf.utils.IoV(1, 2, 1, 2),
                   ...
                  },
             }

So input_data is a dictionary that contains input files separated into categories, and each file has an associated IoV object telling you which Experiment and Run this file comes from.
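As a rough illustration of working with this structure, the sketch below picks one category and lists the (Experiment, Run) pairs it covers. So that it runs without basf2, a namedtuple stands in for caf.utils.IoV; its field names are assumed to follow the exp_low, run_low, exp_high, run_high convention used elsewhere on this page.

```python
from collections import namedtuple

# Stand-in for caf.utils.IoV so that the sketch runs without basf2.
IoV = namedtuple("IoV", ["exp_low", "run_low", "exp_high", "run_high"])

input_data = {
    "Your_Input_Name_A": {
        "/path/to/type/a/input/file_1_1.root": IoV(1, 1, 1, 1),
        "/path/to/type/a/input/file_1_2.root": IoV(1, 2, 1, 2),
    },
    "Your_Input_Name_B": {
        "/path/to/type/b/input/file_1_1.root": IoV(1, 1, 1, 1),
    },
}

# Pick one category by the name declared in input_data_names.
files_to_iov_a = input_data["Your_Input_Name_A"]

# List the (Experiment, Run) pairs covered by that category.
exp_runs = sorted({(iov.exp_low, iov.run_low) for iov in files_to_iov_a.values()})
print(exp_runs)
```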

Warning

You are not forced to use every file or run given as input. You can always filter/reduce the number of input files to a more manageable amount depending on how much data you expect to need per run (or in total).

The **kwargs argument is currently used to send in the requested_iov and expert_config values. The requested_iov value is the overall bucket IoV, and is the run range that your output payloads should cover.

Note

Although kwargs["requested_iov"] has both a defined lower and upper bound, e.g. IoV(2, 1, 2, 100), for prompt processing you should endeavour to have your output payloads be open-ended, e.g. IoV(2, 1, -1, -1).
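Turning the bounded requested IoV into an open-ended one amounts to keeping the lower bound and replacing the upper bound with -1, -1. A minimal sketch follows; the open_ended helper is hypothetical, and a namedtuple again stands in for caf.utils.IoV so that the example runs without basf2.

```python
from collections import namedtuple

# Stand-in for caf.utils.IoV so that the sketch runs without basf2.
IoV = namedtuple("IoV", ["exp_low", "run_low", "exp_high", "run_high"])


def open_ended(requested_iov):
    """Keep the lower bound of the requested IoV and leave the upper bound open."""
    return IoV(requested_iov.exp_low, requested_iov.run_low, -1, -1)


kwargs = {"requested_iov": IoV(2, 1, 2, 100)}
output_iov = open_ended(kwargs["requested_iov"])
print(output_iov)
```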

Note

We use **kwargs so that in the future if we change what is being sent into get_calibrations we won’t have to worry about adding a new argument to the function in every script.

Command Line Tools#

b2caf-prompt-show#

Prints details about the available prompt calibration (default) or validation scripts.

usage: b2caf-prompt-show [-h] [--script-name] [--validation] [--local]
                         [--json]

options

--script-name

The prompt script to print details about. If this is not given, a list of available scripts will instead be printed.

--validation

Shows the validation scripts instead of the calibration scripts

--local

Shows the settings of the local calibrations instead of the normal calibrations

--json

Script details will be printed as a JSON string instead of nice stdout. Used for automatic collection of calibration script details.

b2caf-prompt-run#

usage: b2caf-prompt-run [-h] {Local,LSF,PBS,HTCondor} ...

Sub-commands:#

Local#

Runs the jobs using the Local backend i.e. local multiprocessing.

b2caf-prompt-run Local [-h] [--max-processes] [--log-level] [--debug-level]
                       [--heartbeat]
                       [--max-files-per-subjob  | --max-subjobs ] [--dry-run]
                       [--overwrite-output-db] [--overwrite-working-dir]
                       [--permissive]
                       caf_config input_data

Required Arguments

caf_config

Path to config JSON file, used to set up the expert configuration, CAF and calibrations used in this CAF process.

Required format of JSON file:

{
 "caf_script": (str),
 "database_chain": array[str],
 "backend_args": dict,     <- Optional as the backend default values and/or values in the prompt script will be used otherwise
 "requested_iov": array[4](int),
 "expert_config": dict,     <- Optional as the prompt script default will be used if this isn't set here.
 "testing_payloads": str    <- Optional, by default do not add testing payloads, possible only with --permissive
}

The backend_args dictionary will be used to set up the caf.backends.Backend class and overrides the defaults of that class. If you set backend options via the command line e.g. --queue l, then this will override backend_args values in this JSON file. Individual caf.framework.Collection objects can also override these options by setting them. So the final priority order is (lowest -> highest): [Backend.default_backend_args -> caf_config.json -> b2caf-prompt-run command line options -> Collection.backend_args]

Generally it is best not to set anything in the prompt script itself. Just use the caf_config.json and b2caf-prompt-run options.
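The priority order above behaves like a stack of dictionaries in which the highest layer that defines a key wins. The sketch below illustrates only that ordering and is not the CAF's own code: the resolve_backend_args helper is hypothetical, and the queue values are the short examples used on this page.

```python
from collections import ChainMap


def resolve_backend_args(defaults, caf_config, command_line, collection):
    """Merge the four layers; layers listed later in the text override earlier ones."""
    # ChainMap returns the first match, so the highest priority layer comes first.
    return dict(ChainMap(collection, command_line, caf_config, defaults))


defaults = {"queue": "s"}
from_json = {"queue": "l"}

# Only the JSON file sets a queue, so it overrides the backend default.
json_only = resolve_backend_args(defaults, from_json, {}, {})
# A command line option overrides the JSON file.
with_cli = resolve_backend_args(defaults, from_json, {"queue": "s"}, {})
print(json_only, with_cli)
```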

input_data

Path to input data json file, used to find input files for the CAF job. It is also used to create the IoV for each run so the calibrations can use it if necessary.

Note that the input data json file should have the form:

{
 "hlt_mumu": [["/path/to/run/hlt_mumu_2trk/raw", [8, 1977]], ... ],
 "hlt_hadron": [["/path/to/run/hlt_hadron/raw", [8, 1977]], ...]
}

where the key is the same as the one used by the settings variable’s input_data_names in the prompt calibration script you are running.

The values are lists of directory paths and the corresponding (Experiment, Run).
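An input data file of this form can be produced with the standard json module. The sketch below reuses the placeholder paths from the example and checks that every name declared in input_data_names has an entry; the input_data_names list here is an assumed example, not taken from a real script.

```python
import json

# Directory paths and (Experiment, Run) pairs copied from the example form.
input_data = {
    "hlt_mumu": [["/path/to/run/hlt_mumu_2trk/raw", [8, 1977]]],
    "hlt_hadron": [["/path/to/run/hlt_hadron/raw", [8, 1977]]],
}

# The keys must match input_data_names in the settings variable of the script.
input_data_names = ["hlt_mumu", "hlt_hadron"]
missing = [name for name in input_data_names if name not in input_data]

# Serialise to the text that would be written to the input data json file.
text = json.dumps(input_data, indent=1)
print(text)
```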

options

--max-processes

Set the multiprocessing Pool size (max concurrent processes). (default: 4)

Default: 4

--log-level

Possible choices: DEBUG, INFO, RESULT, WARNING, ERROR, FATAL

Set the basf2 LogLevel. (default: INFO)

Default: “INFO”

--debug-level

Set the DEBUG level value, overrides log-level to be DEBUG.

--heartbeat

Sets the sleep interval (seconds) between attempts to check the readiness of jobs. (default: 60)

Default: 60

--max-files-per-subjob

Sets the number of input files that will be used per subjob.

--max-subjobs

Sets the maximum number of subjobs that will be submitted. Input files will be split as evenly as possible between the subjobs.

--dry-run

Flags if the CAF process should be set up but not run. Good for testing if your prompt script and config files are well formed without attempting to submit any jobs.

--overwrite-output-db

Flags if the tool should overwrite an existing output database.

--overwrite-working-dir

Flags if the tool should delete the CAF working directory (‘calibration_results’) before beginning the processing. This will delete the previous results! Only use if you want a clean start from the beginning again!

--permissive

Flags if the tool may also run scripts from generic paths and use testing payloads.
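The --max-subjobs description says that input files are split as evenly as possible between the subjobs. One way to picture such a split is shown below. This is a hypothetical sketch of the idea, and the CAF's actual splitting may order or group files differently.

```python
def split_evenly(files, max_subjobs):
    """Split files into at most max_subjobs groups whose sizes differ by at most one."""
    n_groups = min(max_subjobs, len(files))
    if n_groups == 0:
        return []
    base, extra = divmod(len(files), n_groups)
    groups, start = [], 0
    for index in range(n_groups):
        # The first `extra` groups take one additional file each.
        size = base + (1 if index < extra else 0)
        groups.append(files[start:start + size])
        start += size
    return groups


files = [f"file_{i}.root" for i in range(7)]
groups = split_evenly(files, 3)
print([len(group) for group in groups])
```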

LSF#

Runs the jobs using the LSF backend.

b2caf-prompt-run LSF [-h] [--queue] [--global-job-limit]
                     [--submission-check-heartbeat] [--log-level]
                     [--debug-level] [--heartbeat]
                     [--max-files-per-subjob  | --max-subjobs ] [--dry-run]
                     [--overwrite-output-db] [--overwrite-working-dir]
                     [--permissive]
                     caf_config input_data

Required Arguments

caf_config

Path to config JSON file, used to set up the expert configuration, CAF and calibrations used in this CAF process.

Required format of JSON file:

{
 "caf_script": (str),
 "database_chain": array[str],
 "backend_args": dict,     <- Optional as the backend default values and/or values in the prompt script will be used otherwise
 "requested_iov": array[4](int),
 "expert_config": dict,     <- Optional as the prompt script default will be used if this isn't set here.
 "testing_payloads": str    <- Optional, by default do not add testing payloads, possible only with --permissive
}

The backend_args dictionary will be used to set up the caf.backends.Backend class and overrides the defaults of that class. If you set backend options via the command line e.g. --queue l, then this will override backend_args values in this JSON file. Individual caf.framework.Collection objects can also override these options by setting them. So the final priority order is (lowest -> highest): [Backend.default_backend_args -> caf_config.json -> b2caf-prompt-run command line options -> Collection.backend_args]

Generally it is best not to set anything in the prompt script itself. Just use the caf_config.json and b2caf-prompt-run options.

input_data

Path to input data json file, used to find input files for the CAF job. It is also used to create the IoV for each run so the calibrations can use it if necessary.

Note that the input data json file should have the form:

{
 "hlt_mumu": [["/path/to/run/hlt_mumu_2trk/raw", [8, 1977]], ... ],
 "hlt_hadron": [["/path/to/run/hlt_hadron/raw", [8, 1977]], ...]
}

where the key is the same as the one used by the settings variable’s input_data_names in the prompt calibration script you are running.

The values are lists of directory paths and the corresponding (Experiment, Run).

options

--queue

The batch queue to use. (e.g. s)

--global-job-limit

The number of batch jobs that can be active for the user before the backend class will stop submitting. This is not a completely hard limit, the actual max reached depends on other submitting processes and the number submitted before re-checking. (default: 1000)

Default: 1000

--submission-check-heartbeat

The time (seconds) between checking if there are fewer batch jobs than the global limit. Generally not needed to change, but it certainly shouldn’t be set lower than 30 seconds. (default: 30)

Default: 30

--log-level

Possible choices: DEBUG, INFO, RESULT, WARNING, ERROR, FATAL

Set the basf2 LogLevel. (default: INFO)

Default: “INFO”

--debug-level

Set the DEBUG level value, overrides log-level to be DEBUG.

--heartbeat

Sets the sleep interval (seconds) between attempts to check the readiness of jobs. (default: 60)

Default: 60

--max-files-per-subjob

Sets the number of input files that will be used per subjob.

--max-subjobs

Sets the maximum number of subjobs that will be submitted. Input files will be split as evenly as possible between the subjobs.

--dry-run

Flags if the CAF process should be set up but not run. Good for testing if your prompt script and config files are well formed without attempting to submit any jobs.

--overwrite-output-db

Flags if the tool should overwrite an existing output database.

--overwrite-working-dir

Flags if the tool should delete the CAF working directory (‘calibration_results’) before beginning the processing. This will delete the previous results! Only use if you want a clean start from the beginning again!

--permissive

Flags if the tool may also run scripts from generic paths and use testing payloads.

PBS#

Runs the jobs using the PBS backend.

b2caf-prompt-run PBS [-h] [--queue] [--global-job-limit]
                     [--submission-check-heartbeat] [--log-level]
                     [--debug-level] [--heartbeat]
                     [--max-files-per-subjob  | --max-subjobs ] [--dry-run]
                     [--overwrite-output-db] [--overwrite-working-dir]
                     [--permissive]
                     caf_config input_data

Required Arguments

caf_config

Path to config JSON file, used to set up the expert configuration, CAF and calibrations used in this CAF process.

Required format of JSON file:

{
 "caf_script": (str),
 "database_chain": array[str],
 "backend_args": dict,     <- Optional as the backend default values and/or values in the prompt script will be used otherwise
 "requested_iov": array[4](int),
 "expert_config": dict,     <- Optional as the prompt script default will be used if this isn't set here.
 "testing_payloads": str    <- Optional, by default do not add testing payloads, possible only with --permissive
}

The backend_args dictionary will be used to set up the caf.backends.Backend class and overrides the defaults of that class. If you set backend options via the command line e.g. --queue l, then this will override backend_args values in this JSON file. Individual caf.framework.Collection objects can also override these options by setting them. So the final priority order is (lowest -> highest): [Backend.default_backend_args -> caf_config.json -> b2caf-prompt-run command line options -> Collection.backend_args]

Generally it is best not to set anything in the prompt script itself. Just use the caf_config.json and b2caf-prompt-run options.

input_data

Path to input data json file, used to find input files for the CAF job. It is also used to create the IoV for each run so the calibrations can use it if necessary.

Note that the input data json file should have the form:

{
 "hlt_mumu": [["/path/to/run/hlt_mumu_2trk/raw", [8, 1977]], ... ],
 "hlt_hadron": [["/path/to/run/hlt_hadron/raw", [8, 1977]], ...]
}

where the key is the same as the one used by the settings variable’s input_data_names in the prompt calibration script you are running.

The values are lists of directory paths and the corresponding (Experiment, Run).

options

--queue

The batch queue to use. e.g. short

--global-job-limit

The number of batch jobs that can be active for the user before the backend class will stop submitting. This is not a completely hard limit, the actual max reached depends on other submitting processes and the number submitted before re-checking. (default: 1000)

Default: 1000

--submission-check-heartbeat

The time (seconds) between checking if there are fewer batch jobs than the global limit. Generally not needed to change, but it certainly shouldn’t be set lower than 30 seconds. (default: 30)

Default: 30

--log-level

Possible choices: DEBUG, INFO, RESULT, WARNING, ERROR, FATAL

Set the basf2 LogLevel. (default: INFO)

Default: “INFO”

--debug-level

Set the DEBUG level value, overrides log-level to be DEBUG.

--heartbeat

Sets the sleep interval (seconds) between attempts to check the readiness of jobs. (default: 60)

Default: 60

--max-files-per-subjob

Sets the number of input files that will be used per subjob.

--max-subjobs

Sets the maximum number of subjobs that will be submitted. Input files will be split as evenly as possible between the subjobs.

--dry-run

Flags if the CAF process should be set up but not run. Good for testing if your prompt script and config files are well formed without attempting to submit any jobs.

--overwrite-output-db

Flags if the tool should overwrite an existing output database.

--overwrite-working-dir

Flags if the tool should delete the CAF working directory (‘calibration_results’) before beginning the processing. This will delete the previous results! Only use if you want a clean start from the beginning again!

--permissive

Flags if the tool may also run scripts from generic paths and use testing payloads.

HTCondor#

Runs the jobs using the HTCondor backend.

b2caf-prompt-run HTCondor [-h] [--getenv] [--universe] [--path-prefix]
                          [--global-job-limit] [--submission-check-heartbeat]
                          [--log-level] [--debug-level] [--heartbeat]
                          [--max-files-per-subjob  | --max-subjobs ]
                          [--dry-run] [--overwrite-output-db]
                          [--overwrite-working-dir] [--permissive]
                          caf_config input_data

Required Arguments

caf_config

Path to config JSON file, used to set up the expert configuration, CAF and calibrations used in this CAF process.

Required format of JSON file:

{
 "caf_script": (str),
 "database_chain": array[str],
 "backend_args": dict,     <- Optional as the backend default values and/or values in the prompt script will be used otherwise
 "requested_iov": array[4](int),
 "expert_config": dict,     <- Optional as the prompt script default will be used if this isn't set here.
 "testing_payloads": str    <- Optional, by default do not add testing payloads, possible only with --permissive
}

The backend_args dictionary will be used to set up the caf.backends.Backend class and overrides the defaults of that class. If you set backend options via the command line e.g. --queue l, then this will override backend_args values in this JSON file. Individual caf.framework.Collection objects can also override these options by setting them. So the final priority order is (lowest -> highest): [Backend.default_backend_args -> caf_config.json -> b2caf-prompt-run command line options -> Collection.backend_args]

Generally it is best not to set anything in the prompt script itself. Just use the caf_config.json and b2caf-prompt-run options.

input_data

Path to input data json file, used to find input files for the CAF job. It is also used to create the IoV for each run so the calibrations can use it if necessary.

Note that the input data json file should have the form:

{
 "hlt_mumu": [["/path/to/run/hlt_mumu_2trk/raw", [8, 1977]], ... ],
 "hlt_hadron": [["/path/to/run/hlt_hadron/raw", [8, 1977]], ...]
}

where the key is the same as the one used by the settings variable’s input_data_names in the prompt calibration script you are running.

The values are lists of directory paths and the corresponding (Experiment, Run).

options

--getenv

Should jobs inherit the submitting environment (doesn’t always work as expected). e.g. false

--universe

Jobs should be submitted using this universe. e.g. vanilla

--path-prefix

The string that should be prepended to the file path given to the backend, e.g. root://dcbldoor.sdcc.bnl.gov:1096

Default: “”

--global-job-limit

The number of batch jobs that can be active for the user before the backend class will stop submitting. This is not a completely hard limit, the actual max reached depends on other submitting processes and the number submitted before re-checking. (default: 1000)

Default: 1000

--submission-check-heartbeat

The time (seconds) between checking if there are fewer batch jobs than the global limit. Generally not needed to change, but it certainly shouldn’t be set lower than 30 seconds. (default: 30)

Default: 30

--log-level

Possible choices: DEBUG, INFO, RESULT, WARNING, ERROR, FATAL

Set the basf2 LogLevel. (default: INFO)

Default: “INFO”

--debug-level

Set the DEBUG level value, overrides log-level to be DEBUG.

--heartbeat

Sets the sleep interval (seconds) between attempts to check the readiness of jobs. (default: 60)

Default: 60

--max-files-per-subjob

Sets the number of input files that will be used per subjob.

--max-subjobs

Sets the maximum number of subjobs that will be submitted. Input files will be split as evenly as possible between the subjobs.

--dry-run

Flags if the CAF process should be set up but not run. Good for testing if your prompt script and config files are well formed without attempting to submit any jobs.

--overwrite-output-db

Flags if the tool should overwrite an existing output database.

--overwrite-working-dir

Flags if the tool should delete the CAF working directory (‘calibration_results’) before beginning the processing. This will delete the previous results! Only use if you want a clean start from the beginning again!

--permissive

Flags if the tool may also run scripts from generic paths and use testing payloads.

Type b2caf-prompt-run <backend> --help to see the full options for each backend.

b2caf-prompt-check#

Checks the current scripts in calibration/scripts/prompt/calibrations for problems.

usage: b2caf-prompt-check [-h]

Utility Functions#

This module contains various utility functions for the prompt calibration CAF scripts to use.

prompt.utils.events_in_basf2_file(file_path)[source]#

Does a quick open and return of the number of entries in a basf2 file’s tree object.

Parameters:

file_path (str) – File path to ROOT file

Returns:

Number of entries in tree.

Return type:

int

prompt.utils.filter_by_max_events_per_run(files_to_iov, max_events_per_run, random_select=False, max_events_per_file=0)[source]#

This function creates a new files_to_iov dictionary by appending files in order until the maximum number of events is reached per run. Each file contributes at most the number of events specified by “max_events_per_file”.

Parameters:
  • files_to_iov (dict) – {“/path/to/file.root”: IoV(1,1,1,1)} type dictionary. Same style as used by the CAF for lookup values.

  • max_events_per_run (int) – The event threshold per run. Files stop being added for a run once it is reached.

  • random_select (bool) – If true, files are selected at random. If false, the first files are taken.

  • max_events_per_file (int) – Limits the contribution from each file to at most this number of events.

Returns:

The same style of dict as the input files_to_iov, but filtered down.

Return type:

dict
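The selection logic described for filter_by_max_events_per_run can be sketched in plain Python. This is not the basf2 implementation: it ignores random_select and max_events_per_file, takes the event counter as an argument (a hypothetical count_events parameter) instead of opening ROOT files, and uses a namedtuple in place of caf.utils.IoV.

```python
from collections import defaultdict, namedtuple

# Stand-in for caf.utils.IoV so that the sketch runs without basf2.
IoV = namedtuple("IoV", ["exp_low", "run_low", "exp_high", "run_high"])


def filter_by_max_events_sketch(files_to_iov, max_events_per_run, count_events):
    """Add files in order until each run has reached max_events_per_run."""
    events_per_run = defaultdict(int)
    selected = {}
    for path, iov in files_to_iov.items():
        if events_per_run[iov] >= max_events_per_run:
            continue  # this run already has enough events
        selected[path] = iov
        events_per_run[iov] += count_events(path)
    return selected


# Made-up event counts standing in for prompt.utils.events_in_basf2_file.
fake_counts = {"a.root": 600, "b.root": 600, "c.root": 600, "d.root": 100}
files_to_iov = {
    "a.root": IoV(1, 1, 1, 1),
    "b.root": IoV(1, 1, 1, 1),
    "c.root": IoV(1, 1, 1, 1),
    "d.root": IoV(1, 2, 1, 2),
}
selected = filter_by_max_events_sketch(files_to_iov, 1000, fake_counts.get)
print(sorted(selected))
```

Here the third file of run 1 is dropped because the first two already exceed the threshold, while the only file of run 2 is kept.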

prompt.utils.filter_by_max_files_per_run(files_to_iov, max_files_per_run=1, min_events_per_file=0, random_select=False)[source]#

This function creates a new files_to_iov dictionary by adding files until the maximum number of files per run is reached. After this no more files are added.

It makes the assumption that the IoV is a single run, and that the exp_low and run_low of the IoV object can be used to create the ExpRun for comparison of whether to add a new input file.

Parameters:
  • files_to_iov (dict) –

    The standard dictionary you might use as input to a Calibration. It is of the form

    >>> files_to_iov = {"file_path.root": IoV(1,1,1,1),}
    

  • max_files_per_run (int) – The maximum number of files that we will add to the output dictionary for each run in the input dictionary.

  • min_events_per_file (int) – The minimum number of events that is allowed to be in any included file’s tree.

  • random_select (bool) – If true, files are selected at random. If false, the first files are taken.

Returns:

The same style of dict as the input files_to_iov, but filtered down.

Return type:

dict
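The per-run file cap can be pictured with a counter keyed on (exp_low, run_low), following the single-run assumption stated above. The sketch below is a hypothetical simplification rather than the basf2 code: it leaves out min_events_per_file and random_select, and a namedtuple stands in for caf.utils.IoV.

```python
from collections import Counter, namedtuple

# Stand-in for caf.utils.IoV so that the sketch runs without basf2.
IoV = namedtuple("IoV", ["exp_low", "run_low", "exp_high", "run_high"])


def filter_by_max_files_sketch(files_to_iov, max_files_per_run=1):
    """Keep at most max_files_per_run files for each (Experiment, Run)."""
    files_per_run = Counter()
    selected = {}
    for path, iov in files_to_iov.items():
        exp_run = (iov.exp_low, iov.run_low)
        if files_per_run[exp_run] < max_files_per_run:
            selected[path] = iov
            files_per_run[exp_run] += 1
    return selected


files_to_iov = {
    "a.root": IoV(1, 1, 1, 1),
    "b.root": IoV(1, 1, 1, 1),
    "c.root": IoV(1, 1, 1, 1),
    "d.root": IoV(1, 2, 1, 2),
}
kept = filter_by_max_files_sketch(files_to_iov, max_files_per_run=2)
print(sorted(kept))
```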

prompt.utils.filter_by_select_max_events_from_files(input_file_list, select_max_events_from_files)[source]#

This function creates a new list by appending random files until the maximum number of events are reached per data set.

Parameters:
  • input_file_list (list) – [“/path/to/file1.root”, “/path/to/file2.root”]

  • select_max_events_from_files (int) – The event threshold. Files stop being added once it is reached.

Returns:

The sorted list of random files, or an empty list if not enough events were found

Return type:

list

prompt.utils.group_files_by_iov(files_to_iov)[source]#

Inverts the files_to_iov dictionary to give back a dictionary of IoV -> File list

Parameters:

files_to_iov (dict) – {“/path/to/file1.root”: IoV(1,1,1,1), “/path/to/file2.root”: IoV(1,1,1,1)}

Returns:

{IoV(1,1,1,1): [“/path/to/file1.root”, “/path/to/file2.root”]}

Return type:

dict
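The inversion performed by group_files_by_iov is a standard group-by on the dictionary values. The sketch below reproduces the documented input and output with a hypothetical group_files_by_iov_sketch function; a namedtuple stands in for caf.utils.IoV so that it runs without basf2.

```python
from collections import defaultdict, namedtuple

# Stand-in for caf.utils.IoV so that the sketch runs without basf2.
IoV = namedtuple("IoV", ["exp_low", "run_low", "exp_high", "run_high"])


def group_files_by_iov_sketch(files_to_iov):
    """Invert a file -> IoV dictionary into an IoV -> list of files dictionary."""
    iov_to_files = defaultdict(list)
    for path, iov in files_to_iov.items():
        iov_to_files[iov].append(path)
    return dict(iov_to_files)


grouped = group_files_by_iov_sketch({
    "/path/to/file1.root": IoV(1, 1, 1, 1),
    "/path/to/file2.root": IoV(1, 1, 1, 1),
})
print(grouped)
```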