5.4. Conditions Database#

The conditions database is where we store additional, time-dependent data needed to interpret and analyse the event data, for example the detector configuration or calibration constants.

In most cases it should not be necessary to change the configuration, except perhaps for adding an extra globaltag to the list via conditions.globaltags.

5.4.1. Conditions Database Overview#

The conditions database consists of binary objects (usually ROOT files) which are identified by a name and a revision number. These objects are called payloads and can have one or more intervals of validity (iov), which is the experiment and run range for which the object is deemed to be valid. A collection of payloads and iovs for a certain dataset is called a globaltag and is identified by a unique name.

Payload

An atom of conditions data identified by name and revision number

  • usually ROOT files containing custom objects

  • a payload cannot be modified after creation.

Interval of Validity (iov)

An experiment and run interval for which a payload should be valid

  • consists of four values: first_exp, first_run, final_exp, final_run

  • it is still valid for the final run (inclusive end)

  • final_exp>=0 && final_run<0: valid for all runs in final_exp

  • final_exp < 0 && final_run < 0: valid forever, starting with first_exp, first_run

Globaltag

A collection of payloads and iovs

  • has a unique name and a description

  • can have gaps and overlaps; a payload is not guaranteed to be present for all possible runs.

  • are not necessarily valid for all data

  • are not valid for all software versions

  • are immutable once “published”

  • can be “invalidated”


Fig. 5.2 Schematic display of a globaltag: Different payloads with different revisions are valid for a certain set of runs.#

Once a globaltag is prepared it is usually immutable and cannot be modified any further to ensure reproducibility of analyses: different processing iterations will use different globaltags. The only exceptions are the globaltag used online during data taking and for prompt processing of data directly after data taking: These two globaltags are flagged as running and information for new runs is constantly added to them as more data becomes available.

All globaltags used in analysis need to be in a fixed state. Globaltags which are not fixed, i.e. in the “OPEN” state, cannot be used for data processing, since otherwise it is not possible to know exactly which conditions were used. If any globaltag in the processing chain is in the “OPEN” state, processing will not be possible.

Changed in version release-04-00-00: Prior to release-04-00-00 no check was performed on the state of globaltags, so all globaltags could be used in analysis.

When processing data the software needs to know which payloads to use for which runs, so the globaltag (or multiple globaltags) to be used needs to be specified. In most cases this is done automatically: the globaltag used to create the input file is reused when processing it, the so-called globaltag replay. However, the user can specify additional globaltags, or override the globaltag selection completely and disable globaltag replay.

Changed in version release-04-00-00: Prior to release-04 the globaltag always needed to be configured manually and no automatic replay was in place.

If multiple globaltags are selected, the software will look for each necessary payload in all of them in turn and always take the payload from the first globaltag in which it is found.

Tip

A set of tools is provided with basf2 for interacting with the conditions database via the command line, performing actions such as creating globaltags, uploading payloads, etc. These tools, called the b2conditionsdb tools, are documented in b2conditionsdb.

5.4.2. Configuring the Conditions Database#

By default the software will look for conditions data in

  1. user globaltags: Look in all globaltags provided by the user by setting conditions.globaltags

  2. globaltag replay: Look in different base globaltags depending on the use case:

    • Processing existing files (i.e. mdst): the globaltags specified in the input file are used.

    • Generating events: the default globaltags for the current software version (conditions.default_globaltags) are used.

The globaltag replay can be disabled by calling conditions.override_globaltags(). If multiple files are processed, all of them need to have the same globaltags specified; otherwise processing cannot continue unless globaltag replay is disabled.
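
For example, a minimal steering-file sketch (the globaltag names here are hypothetical placeholders):

from basf2 import conditions

# Add one extra globaltag on top of globaltag replay. It is searched after
# any other user globaltags but before the replayed base globaltags.
conditions.append_globaltag("my_extra_globaltag")

# Or disable globaltag replay completely and use exactly this list:
# conditions.override_globaltags(["tag_A", "tag_B"])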

There are more advanced settings available via the conditions object which should not be needed by most users, like setting the URL to the central database or where to look for previously downloaded payloads.

Changed in version release-04-00-00: In release-04 the configuration interface was changed completely compared to earlier releases. All settings are now grouped in the conditions object.

basf2.conditions#

Global instance of a basf2.ConditionsConfiguration object containing all the settings relevant for the conditions database.

class basf2.ConditionsConfiguration#

This class contains all configurations for the conditions database service

  • which globaltags to use

  • where to look for payload information

  • where to find the actual payload files

  • which temporary testing payloads to use

For most users, however, the only thing they should need to care about is setting the list of additional globaltags to use.

append_globaltag(name)#

Append a globaltag to the end of the globaltags list. That means it will have the lowest priority of all tags in the list.

append_testing_payloads(filename)#

Append a text file containing local test payloads to the end of the list of testing_payloads. This will mean they will have lower priority than payloads in previously defined text files but still higher priority than globaltags.

Parameters

filename (str) – file containing a local definition of payloads and their intervals of validity for testing

Warning

This functionality is strictly for testing purposes. Using local payloads leads to results which cannot be reproduced by anyone else and thus cannot be published.

property default_globaltags#

A tuple containing the default globaltags to be used if events are generated without an input file.

property default_metadata_provider_url#

URL of the default central metadata provider to look for payloads in the conditions database.

disable_globaltag_replay()#

Disable globaltag replay and revert to the old behavior where the default globaltags are used if no other globaltags are specified.

This is a shortcut to just calling

>>> conditions.override_globaltags()
>>> conditions.globaltags += list(conditions.default_globaltags)

expert_settings(**kwargs)#

Set some additional settings for the conditions database.

You can supply any combination of keyword-only arguments defined below. The function will return a dictionary containing all current settings.

>>> conditions.expert_settings(connection_timeout=5, max_retries=1)
{'save_payloads': 'localdb/database.txt',
 'download_cache_location': '',
 'download_lock_timeout': 120,
 'usable_globaltag_states': {'PUBLISHED', 'RUNNING', 'TESTING', 'VALIDATED'},
 'connection_timeout': 5,
 'stalled_timeout': 60,
 'max_retries': 1,
 'backoff_factor': 5}

Warning

Modification of these parameters should not be needed. In rare circumstances they can be used to optimize access for many jobs at once, but they should only be changed by experts.

Parameters
  • save_payloads (str) – Where to store new payloads created during processing. This should be a filename to contain the payload information and the payload files will be placed in the same directory as the file.

  • download_cache_location (str) – Where to store payloads which have been downloaded from the central server. This can be a user-defined directory; an empty string defaults to $TMPDIR/basf2-conditions, where $TMPDIR is the temporary directory defined on the system. Newly downloaded payloads will be stored in this directory in a hashed structure, see payload_locations

  • download_lock_timeout (int) – How many seconds to wait for a write lock when concurrently downloading the same payload between different processes. If locking fails the payload will be downloaded to a temporary file separately for each process.

  • usable_globaltag_states (set(str)) – Names of globaltag states accepted for processing. This can be changed to make sure that only fully published globaltags are used or to enable running on an open tag (see the sketch after this parameter list). It is not possible to allow usage of ‘INVALID’ tags, those will always be rejected.

  • connection_timeout (int) – timeout in seconds before connection should be aborted. 0 sets the timeout to the default (300s)

  • stalled_timeout (int) – timeout in seconds before a download should be aborted if the speed stays below 10 KB/s, 0 disables this timeout

  • max_retries (int) – maximum amount of retries if the server responded with an HTTP response of 500 or more. 0 disables retrying

  • backoff_factor (int) – backoff factor for retries in seconds. Retries are performed using something similar to binary exponential backoff: for retry \(n\) and a backoff_factor \(f\) we wait for a random time chosen uniformly from the interval \([1, (2^{n} - 1) \times f]\) in seconds. For example, with a backoff_factor of 5 the third retry waits at most \((2^{3} - 1) \times 5 = 35\) seconds.
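
As a sketch of the open-tag setting mentioned above (strictly for testing, see the warning: with an open tag it is not possible to know later exactly which conditions were used), one could extend the default set of accepted states:

from basf2 import conditions

# Accept "OPEN" globaltags in addition to the default states
conditions.expert_settings(
    usable_globaltag_states={"OPEN", "PUBLISHED", "RUNNING", "TESTING", "VALIDATED"})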

property globaltags#

List of globaltags to be used. These globaltags will be the ones with highest priority but by default the globaltags used to create the input files or the default globaltag will also be used.

The priority of the globaltags in this list is highest first. So the first in the list will be checked first and all other globaltags will only be checked for payloads not found so far.

Warning

By default this list contains the globaltags to be used in addition to the ones from the input file or the default one if no input file is present. If this is not desirable you need to call override_globaltags() to disable any addition or modification of this list.

property metadata_providers#

List of metadata providers to use when looking for payload metadata. There are currently two supported providers:

  1. Central metadata provider to look for payloads in the central conditions database. This provider is used for any entry in this list which starts with http(s)://. The URL should point to the top level of the REST API endpoints on the server.

  2. Local metadata provider to look for payloads in a local SQLite snapshot taken from the central server. This provider will be assumed for any entry in this list not starting with a protocol specifier or if the protocol is given as file://

This list should rarely need to be changed. The only exception is for users who want to use the software without an internet connection: after downloading a snapshot of the necessary globaltags with b2conditionsdb download, this list can be pointed to the downloaded file.

property override_enabled#

Indicator whether or not the override of globaltags is enabled. If true then globaltags present in input files will be ignored and only the ones given in globaltags will be considered.

override_globaltags(list=None)#

Enable globaltag override. This disables all modification of the globaltag list at the beginning of processing:

  • the default globaltag or the input file globaltags will be ignored.

  • any callback set with set_globaltag_callback will be ignored.

  • the list of globaltags will be used exactly as it is.

Parameters

list (list(str) or None) – optional list of globaltag names to be used instead of the current globaltags list

Warning

It is still possible to modify the globaltags list after this call.
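
A minimal sketch with placeholder tag names:

>>> conditions.override_globaltags(["tag_A", "tag_B"])

which, assuming the optional list argument simply replaces the globaltags list, should be equivalent to enabling the override and then setting the list manually:

>>> conditions.override_globaltags()
>>> conditions.globaltags = ["tag_A", "tag_B"]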

property payload_locations#

List of payload locations to search for payloads which have been found by any of the configured metadata_providers. This can be a local directory or a http(s):// url pointing to the payload directory on a server.

For remote locations starting with http(s):// we assume that the layout of the payloads on the server is the same as on the main payload server: The combination of given location and the relative url in the payload metadata field payloadUrl should point to the correct payload on the server.

For local directories, two layouts are supported and will be auto detected:

flat

All payloads are in the same directory without any substructure with the name dbstore_{name}_rev_{revision}.root

hashed

All payloads are stored in subdirectories in the form AB/{name}_r{revision}.root where A and B are the first two characters of the md5 checksum of the payload file.

Example

Given payload_locations = ["payload_dir/", "http://server.com/payloads"] the framework would look for a payload with name BeamParameters in revision 45 (and checksum a34ce5...) in the following places

  1. payload_dir/a3/BeamParameters_r45.root

  2. payload_dir/dbstore_BeamParameters_rev_45.root

  3. http://server.com/payloads/dbstore/BeamParameters/dbstore_BeamParameters_rev_45.root given the usual pattern of the payloadUrl metadata. But this could be changed on the central servers so mirrors should not depend on this convention but copy the actual structure of the central server.

If the payload cannot be found in any of the given locations the framework will always attempt to download it directly from the central server and put it in a local cache directory.

prepend_globaltag(name)#

Add a globaltag to the beginning of the globaltags list. That means it will have the highest priority of all tags in the list.

prepend_testing_payloads(filename)#

Insert a text file containing local test payloads at the beginning of the list of testing_payloads. This means they will have higher priority than payloads in previously defined text files, as well as higher priority than globaltags.

Parameters

filename (str) – file containing a local definition of payloads and their intervals of validity for testing

Warning

This functionality is strictly for testing purposes. Using local payloads leads to results which cannot be reproduced by anyone else and thus cannot be published.

reset()#

Reset the conditions database configuration to its original state.

set_globaltag_callback(function)#

Set a callback function to be called just before processing.

This callback can be used to further customize the globaltags to be used during processing. It will be called after the input files have been opened and checked, with three keyword arguments:

base_tags

The globaltags determined from either the input files or, if no input files are present, the default globaltags

user_tags

The globaltags provided by the user

metadata

If there are no input files (e.g. when generating events) this argument is None. Otherwise it is a list of the FileMetaData instances from all input files. This list can be empty if there is no metadata associated with the input files.

From this information the callback function should then compose the final list of globaltags to be used for processing and return this list. If None is returned the default behavior is applied as if there were no callback function. If anything else is returned the processing is aborted.

If no callback function is specified the default behavior is equivalent to

def callback(base_tags, user_tags, metadata):
  if not base_tags:
    basf2.B2FATAL("No baseline globaltags available. Please use override")

  return user_tags + base_tags

If override_enabled is True then the callback function will not be called.

Warning

If a callback is set it is responsible for selecting the correct list of globaltags and also for making sure that all files are compatible. No further checks will be done by the framework; any list of globaltags which is returned will be used exactly as it is.

If the list of base_tags is empty this usually means that the input files had different globaltag settings; it is then the responsibility of the callback to verify whether the list of globaltags is usable or not.

If the callback function determines that no working set of globaltags can be found it should abort processing using a FATAL error or an exception.
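
As an illustration, a callback that keeps the default priority but appends one extra low-priority globaltag (the tag name is a hypothetical placeholder) could look like

import basf2
from basf2 import conditions

def my_callback(base_tags, user_tags, metadata):
    if not base_tags:
        basf2.B2FATAL("No baseline globaltags available. Please use override")
    # user tags first (highest priority), then the replayed base tags,
    # then one hypothetical fallback tag with the lowest priority
    return user_tags + base_tags + ["my_fallback_globaltag"]

conditions.set_globaltag_callback(my_callback)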

property testing_payloads#

List of text files to look for local testing payloads. Each entry should be a text file containing local payloads and their intervals of validity to be used for testing.

Payloads found in these files and valid for the current run will have a higher priority than any of the globaltags. If a valid payload is present in multiple files the first one in the list will have higher priority.

Warning

This functionality is strictly for testing purposes. Using local payloads leads to results which cannot be reproduced by anyone else and thus cannot be published.

Configuration in C++#

You can also configure all of these settings in C++ when writing standalone tools using the conditions data. This works very similarly to the configuration in Python, as all the settings are exposed by a single class.

#include <framework/database/Configuration.h>

auto& config = Conditions::Configuration::getInstance();
config.prependGlobaltag("some_tag");

Environment Variables#

These settings can be modified by setting environment variables

BELLE2_CONDB_GLOBALTAG#

if set, this overrides the default globaltags returned by conditions.default_globaltags. This can be a list of tags separated by whitespace, in which case all globaltags are searched.

export BELLE2_CONDB_GLOBALTAG="defaultTag additionalTag"

Changed in version release-04-00-00: This setting doesn’t affect globaltag replay, only the default globaltag used when replay is disabled or events are generated.

Previously, if set to an empty value, access to the central database was disabled; now it just sets an empty list of default globaltags.

BELLE2_CONDB_SERVERLIST#

this environment variable can be set to a list of URLs to look for the conditions database. The URLs will be added at the beginning of conditions.metadata_providers.

If BELLE2_CONDB_METADATA is also set, it takes precedence over this setting.

New in version release-04-00-00.

BELLE2_CONDB_METADATA#

a whitespace-separated list of URLs and SQLite files to look for payload information. Will be used to populate conditions.metadata_providers.

New in version release-04-00-00.

BELLE2_CONDB_PAYLOADS#

a whitespace-separated list of directories or URLs to look for payload files. Will be used to populate conditions.payload_locations.

New in version release-04-00-00.

BELLE2_CONDB_PROXY#

Can be set to specify a proxy server to use for all connections to the central database. The parameter should be a string holding the host name or dotted numerical IP address. A numerical IPv6 address must be written within [brackets]. To specify port number in this string, append :[port] to the end of the host name. If not specified, libcurl will default to using port 1080 for proxies. The proxy string should be prefixed with [scheme]:// to specify which kind of proxy is used.

http

HTTP Proxy. Default when no scheme or proxy type is specified.

https

HTTPS Proxy.

socks4

SOCKS4 Proxy.

socks4a

SOCKS4a Proxy. Proxy resolves URL hostname.

socks5

SOCKS5 Proxy.

socks5h

SOCKS5 Proxy. Proxy resolves URL hostname.

export BELLE2_CONDB_PROXY="http://192.168.178.1:8081"

If it is not set the default proxy configuration is used (e.g. honor $http_proxy). If it is set to an empty value direct connection is used.

5.4.3. Offline Mode#

Sometimes you might need to run the software without internet access, for example when trying to develop on a plane or somewhere else where internet is spotty and unreliable.

To do that you need to download all the information on the payloads and the payload files themselves before running the software:

  1. Find out which globaltags you need. If you just want to run generation/simulation you might only need the default globaltag. To see which one that is you can run

    basf2 --info
    

    if you want to run over some existing files you might want to check what globaltags are stored in the file information using

    b2file-metadata-show --all inputfile.root
    
  2. Download all the necessary information using the b2conditionsdb command line tool. The tool will automatically download all the information in a globaltag as well as the payload files, putting the metadata into an SQLite file and the payload files in the same directory

    b2conditionsdb download -o /path/where/to/download/metadata.sqlite globaltag1 globaltag2 ...
    
  3. Tell the software to use only this downloaded information and not to try to contact the central server. This can be done in the steering file with the conditions.metadata_providers and conditions.payload_locations settings (see the sketch at the end of this step), but the easiest way is to use environment variables

    export BELLE2_CONDB_METADATA=/path/where/to/download/metadata.sqlite
    export BELLE2_CONDB_PAYLOADS=/path/where/to/download
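
    Alternatively, as a sketch of the steering-file approach mentioned above (using the placeholder paths from step 2):

    from basf2 import conditions

    conditions.metadata_providers = ["/path/where/to/download/metadata.sqlite"]
    conditions.payload_locations = ["/path/where/to/download"]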
    

5.4.4. Creation of new payloads#

New payloads can be created by calling Belle2::Database::storeData() from either C++ or Python. It takes a TObject* or TClonesArray* pointer to the payload data and an interval of validity. Optionally, the first argument can be the name under which to store the payload, which usually defaults to the class name of the object.

#include <framework/database/Database.h>
#include <framework/database/IntervalOfValidity.h>
#include <framework/dbobjects/BeamParameters.h>

std::unique_ptr<BeamParameters> beamParams(new BeamParameters());
Database::Instance().storeData(beamParams.get(), IntervalOfValidity::always());

In Python the name of the payload cannot be inferred, so it always needs to be specified explicitly

from ROOT import Belle2
beam_params = Belle2.BeamParameters()
# valid from experiment 0, run 0 up to and including all runs of experiment 3
iov = Belle2.IntervalOfValidity(0, 0, 3, -1)
Belle2.Database.Instance().storeData("BeamParameters", beam_params, iov)

By default this will create new payload files in the subdirectory “localdb” relative to the current working directory and also create a text file containing the metadata for the payload files.

These payloads can then be tested by adding the filename of the text file to conditions.testing_payloads and, once you are satisfied, can be uploaded with b2conditionsdb-upload or b2conditionsdb-request.
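
For example, assuming the payloads were written to the default location shown in expert_settings (localdb/database.txt):

from basf2 import conditions

# Pick up the locally created payloads; they take priority over all globaltags
conditions.append_testing_payloads("localdb/database.txt")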

Changed in version release-06-00-00.

When new payloads are created via Belle2::Database::storeData the new payloads will be assigned a revision number consisting of the first few characters of the checksum of the payload file. This is done for efficient creation of payload files but also to distinguish locally created payload files from payloads downloaded from the database.

  • If a payload has an alphanumeric string similar to a git commit hash as revision number then it was created locally

  • If a payload has a numeric revision number it was either downloaded from the database or created with older release versions.

In any case the local revision number is not related to the revision number assigned when uploading a payload to the server.