3.2. Tools for file handling

3.2.1. b2file-metadata-show: Show the metadata of a basf2 output file

This tool shows the recorded metadata of a basf2 output file, such as the number of events, the lowest event number and so forth. It can either work on a root file directly or look the file up in a local xml file catalog using a logical file name (LFN):

usage: b2file-metadata-show [-h] [-a] [-s] [--json] (FILENAME|-l LFN)

Optional Arguments

-h, --help

print all available options

-l LFN, --lfn LFN

logical file name

-a, --all

print all information

--json

print machine-readable information in JSON format. Implies --all and --steering.

-s, --steering

print steering file contents
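For scripted use, the --json output can be read from Python. The following is a minimal sketch, assuming b2file-metadata-show is available on the PATH. The helper names build_command and read_metadata are illustrative and not part of basf2, and the keys of the returned dictionary are not specified here.

```python
import json
import subprocess


def build_command(filename=None, lfn=None):
    """Build the argument list for b2file-metadata-show --json."""
    if (filename is None) == (lfn is None):
        raise ValueError("give exactly one of file name or LFN")
    command = ["b2file-metadata-show", "--json"]
    # -l looks the file up in the local xml file catalog by its LFN
    command += ["-l", lfn] if lfn is not None else [filename]
    return command


def read_metadata(filename=None, lfn=None):
    """Run the tool and return the parsed JSON document."""
    result = subprocess.run(build_command(filename, lfn), check=True,
                            capture_output=True, text=True)
    return json.loads(result.stdout)
```

Since --json implies --all and --steering, no further options are needed to get the complete metadata.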

3.2.2. b2file-metadata-add: Add/Edit LFN in given file

This tool modifies the LFN and the data descriptions stored in a given basf2 output file. It also updates the xml file catalog if the file was already registered in it.

Changed in version after release-03-00-00: previously the file was always registered in the file catalog, so a new file catalog was created if none existed. Now the file catalog is only updated when the file is already registered in it.

The keys and values for the data descriptions can take any value and are not used by the offline software in any way. They can be used for bookkeeping or additional information not otherwise included in the metadata.

At the moment the only commonly used keys are

dataLevel

is set automatically by the mdst.add_mdst_output, modularAnalysis.outputMdst, udst.add_udst_output and udst.add_skimmed_udst_output functions, to either “mdst” or “udst”.

skimDecayMode

is automatically set when using udst.add_skimmed_udst_output and will contain the name of the skim.

usage: b2file-metadata-add [-h] [-l LFN] [-d KEY=VALUE] FILENAME

Optional Arguments

-h, --help

print all available options

--file arg

file name

-l LFN, --lfn LFN

logical file name

-d KEYVALUE, --description KEYVALUE

data description to set of the form key=value. If the argument does not contain an equal sign it’s interpreted as a key to delete from the dataDescriptions.
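The set-or-delete rule for -d can be sketched in a few lines of Python. This is an illustration of the rule as described above, not the tool's implementation. The function name apply_description is hypothetical, and splitting at the first equal sign is an assumption.

```python
def apply_description(descriptions, argument):
    """Apply one -d argument to a dict of data descriptions."""
    # Assumption: the argument is split at the first equal sign only,
    # so the value itself may contain further equal signs.
    key, separator, value = argument.partition("=")
    if separator:
        descriptions[key] = value      # KEY=VALUE: set or overwrite
    else:
        descriptions.pop(key, None)    # bare KEY: delete if present
    return descriptions
```

For example, applying "skimDecayMode=mySkim" adds or overwrites that key, while applying just "skimDecayMode" removes it again.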

3.2.3. b2file-catalog-add: Add a file to a local XML file catalog

This simple tool adds a file to a local XML file catalog so that it can be found as a parent file independent of the local name or folder structure:

usage: b2file-catalog-add [--help] FILENAME

3.2.4. b2file-check: Check a basf2 output file for consistency

This program checks that a given basf2 output root file satisfies the following conditions:

  • File is readable (file_readable)

  • File contains event and persistent trees (contains_tree, contains_persistent)

  • Event tree contains event meta data (eventmetadata_readable)

  • Persistent tree contains file metadata (filemetadata_readable)

  • All entries in persistent and event trees are readable (all_entries_readable)

  • Number of entries in event tree == number of events in file metadata (entries_eq_filemetadata)

  • Number of entries in event tree == expected_no_of_events, if given (entries_eq_expectation)

  • Number of MC events generated == expected_no_of_mcevents, if given (mcevents_eq_expectation)

  • If the expected size is given:

    abs(total_file_size/(#events*expected_event_size) - 1) < relative_uncertainty (eventsize_eq_expectation)

  • If branches are given: Event tree contains the given branches (branches_present)

It returns 0 if all checks succeeded, 1 otherwise. Details about check failures are printed on standard output.
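The event size criterion from the list above can be written out as a small Python function. This is only a sketch of the formula, not the tool's code. The function name is hypothetical, and raising an error for a vanishing expected size is an assumption, since the behaviour of the tool in that case is not described here.

```python
def event_size_matches(total_file_size_kib, n_events,
                       expected_event_size_kib, relative_uncertainty):
    """Evaluate the eventsize_eq_expectation criterion."""
    expected_total = n_events * expected_event_size_kib
    if expected_total <= 0:
        # Assumption: this sketch does not cover empty files
        raise ValueError("expected total size must be positive")
    return abs(total_file_size_kib / expected_total - 1) < relative_uncertainty
```

With a file of 61.8984375 KiB containing 5 events, an expectation of 12 KiB per event and a relative uncertainty of 0.1 passes, while an expectation of 10 KiB per event does not.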

usage: check_basf2_file [-h] [-n EXPECTED_NO_OF_EVENTS]
                        [--mcevents EXPECTED_NO_OF_MCEVENTS]
                        [-s EXPECTED_EVENT_SIZE_KIB RELATIVE_UNCERTAINTY]
                        [--json]
                        FILE [BRANCH [BRANCH ...]]

Required Arguments

FILE

The basf2 .root file to check. http:// and root:// URLs are also supported.

BRANCH

Branches that should exist in the event durability tree.

Optional Arguments

-n, --events

Expected number of events

--mcevents

Expected number of generated events

-s, --size

Expected size per event (KiB), with maximal relative uncertainty. Check is passed when abs(total_file_size/(#events*expected_event_size) - 1) < relative_uncertainty

--json

Provide dictionary of passed checks and file statistics in JSON format on standard output. Checks are only included when actually run, and receive a boolean value indicating success.

Examples

The --json parameter can be used to get detailed output for all tests in a machine readable format:

$ check_basf2_file --json framework/tests/root_input.root
{
  "checks_passed": {
    "all_entries_readable": true,
    "contains_persistent": true,
    "contains_tree": true,
    "entries_eq_filemetedata": true,
    "eventmetadata_readable": true,
    "file_readable": true,
    "filemetadata_readable": true
  },
  "stats": {
    "compression_algorithm": 0,
    "compression_factor": 2.591318368911743,
    "compression_level": 1,
    "events": 5,
    "filesize_kib": 61.8984375,
    "mcevents": 0,
    "size_per_event_kib": 12.3796875
  }
}
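A script can use this output to find out which checks failed. The following Python sketch relies only on the "checks_passed" dictionary shown above. The function name failed_checks and the shortened example report are illustrative.

```python
import json


def failed_checks(report_text):
    """Return the names of all checks that were run and did not pass."""
    report = json.loads(report_text)
    return sorted(name for name, passed in report["checks_passed"].items()
                  if not passed)


# Shortened, made-up report in the same layout as the output above
example = """
{
  "checks_passed": {
    "file_readable": true,
    "contains_tree": true,
    "entries_eq_expectation": false
  },
  "stats": {"events": 5}
}
"""
print(failed_checks(example))
```

Checks that were not run are absent from the dictionary, so they are neither reported as passed nor as failed.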

Changed in version release-03-00-00: files with zero events can now pass the checks

3.2.5. b2file-merge: Merge multiple basf2 output files

This program merges files created by separate basf2 jobs with the RootOutput module. It is similar to ROOT hadd but correctly updates the metadata in the file and correctly merges the objects in the persistent tree.

This tool is intended to only merge output files from identical jobs which were just split into smaller ones for convenience. As such the following restrictions apply:

  • The files have to be created with the same release and steering file

  • The persistent tree is only allowed to contain FileMetaData and objects inheriting from Mergeable and the same list of objects needs to be present in all files.

  • The event tree needs to contain the same DataStore entries in all files.
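A job script may want to verify these restrictions before attempting a merge. The following Python sketch compares per-file summaries. The dictionary keys are hypothetical and would have to be filled from each file's metadata by the caller. The sketch does not check that persistent objects inherit from Mergeable.

```python
# Hypothetical keys, one summary dictionary per input file
CHECKED_KEYS = ("release", "steering", "persistent_objects", "event_branches")


def merge_conflicts(metadata_list):
    """Return the keys whose values differ between the input files."""
    if not metadata_list:
        return []
    first = metadata_list[0]
    return [key for key in CHECKED_KEYS
            if any(meta[key] != first[key] for meta in metadata_list[1:])]
```

An empty result means the summaries agree on all checked properties.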

Hint

If you want to merge the output of VariablesToNtuple, please use hadd.

See also

For a comparison between hadd, b2file-merge and friends, take a look at this questions.belle2 thread.

usage: b2file-merge [-h] [-f] [-q] [--no-catalog] [--add-to-catalog] OUTPUTFILENAME
                    INPUTFILENAME [INPUTFILENAME ...]

Optional Arguments

-f, --force

overwrite the output file if already present

-q, --quiet

if given only warnings and errors are printed

--no-catalog

don’t register output file in file catalog. This is now the default and just kept for backwards compatibility.

--add-to-catalog

add the output file to the file catalog

Examples

  • Combine all output files in a given directory:

    $ b2file-merge full.root jobs/*.root
    
  • Merge all output files but be less verbose, overwrite the output if it exists and don’t register in file catalog:

    $ b2file-merge -f -q --no-catalog full.root jobs/*.root
    

Changed in version release-03-00-00: the tool now checks for consistency of the real/MC flag for all input files and refuses to merge mixed sets of real and MC data.

Changed in version after release-03-00-00: files are no longer registered in a file catalog by default. To get the old behavior, supply the --add-to-catalog command line option or run b2file-catalog-add on the output file.

3.2.6. b2file-mix: Create a run of mixed data from a set of input files

This program is intended to merge a set of basf2 output files and to make them look like one run of mixed data. The events from the different files are combined by picking a random input file for each event. All events are added exactly once and the relative order of events from the same input file stays the same, but events are mixed between files.

The EventMetaData of the events will be modified so that the output file has continuously numbered events starting at 1 (or at --start-event=N). The experiment and run number will be set to the same value for all events and can be chosen using the --exp/--run parameters. In contrast, with --keep-eventinfo the original experiment, run and event numbers will be kept. This can lead to duplicate event numbers and should only be used if all input files have different event numbers.

The number of events in the output file can be limited with -n. In this case events are randomly sampled from all input files, but the relative order still remains sequential: events from one file will be in the order they were in that file even if there are gaps.

The random seed can be specified to fix the order of the output events. To make the process reproducible, the LFNs of all input files are added to the DataDescription of the output file. Together with the random seed, which is also stored in the output file, this allows the mixing procedure to be repeated.
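The mixing procedure described above can be illustrated with a short Python sketch that works on plain lists instead of root files. It is not the tool's implementation. The function name mix_events and its parameters are hypothetical, and drawing the source file with a probability proportional to its remaining events (done here by shuffling one label per event) is an assumption.

```python
import random


def mix_events(files, seed=None, start_event=1, max_events=None):
    """Mix events from several files, keeping the order within each file.

    files maps a file name to the list of its events in file order.
    Returns a list of (new_event_number, file_name, event) tuples.
    """
    rng = random.Random(seed)
    # One label per event; shuffling the labels decides which file
    # provides the next event while using every event exactly once.
    slots = [name for name, events in files.items() for _ in events]
    rng.shuffle(slots)
    position = {name: 0 for name in files}
    mixed = []
    for name in slots:
        mixed.append((name, files[name][position[name]]))
        position[name] += 1
    if max_events is not None and max_events < len(mixed):
        # Random subset, sorted so the relative order is preserved
        keep = sorted(rng.sample(range(len(mixed)), max_events))
        mixed = [mixed[i] for i in keep]
    # Renumber continuously, starting at start_event
    return [(start_event + i, name, event)
            for i, (name, event) in enumerate(mixed)]
```

Calling the function twice with the same seed and the same inputs gives the same output order, which mirrors the reproducibility provided by the stored seed and LFNs.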

usage: b2file-mix [-h] [--output OUTPUT] [--exp EXP] [--run RUN] [-f]
                  [--seed SEED] [--start-event START_EVENT]
                  [--globaltags GLOBALTAGS] [-n OUTPUT_EVENTS]
                  [--exclude EXCLUDE] [--keep-eventinfo]
                  INPUTFILE [INPUTFILE ...]

Required Arguments

INPUTFILE

input filenames containing the events

Optional Arguments

--output, -o

output file containing the final events

Default: “shuffled.root”

--exp

the experiment number to set for the output file

Default: 0

--run

the run number to set for the output file

Default: 0

-f, --overwrite

overwrite output file if it exists

--seed

Random seed to use for shuffling, can be used to reproduce the event order from a previous run

--start-event

Event number of the first event in the final file. Other events will be numbered sequentially

Default: 1

--globaltags

Globaltags to put in the final file. If not given the globaltags from the first input file will be used.

-n, --output-events

Limit the number of events in the output file to at most this value

--exclude

Branch names to exclude from the output file

Default: []

--keep-eventinfo

Keep the exp,run,event numbers as they are in the original files

3.2.7. b2file-remove-branches: Create a copy of a basf2 output file removing a list of given branches in the process

This program can be used to create a copy of a basf2 root file with a list of branches removed from the event tree. This can be useful to remove objects which have been added with a different version of the software and can no longer be read correctly.

Warning:

If you remove a branch with relations please make sure to also remove all relations to and from this branch to not have dangling relations in the final file.

usage: b2file-remove-branches [-h] -o OUTPUT -i INPUT branch [branch ...]

Required Arguments

branch

Branch name to omit from the output file if present in the input file

Optional Arguments

-o, --output

Name of the output file. If the file already exists it will be overwritten

-i, --input

Name of the input file.

New in version release-04-00-00.

3.2.8. b2file-size: Show detailed size information about the content of a file

This tool prints the disk space requirements for basf2 events stored in root files and can create bar graphs of them.

In its simplest form, b2file-size FILENAME, it just prints the size of each top level branch in the file to the command line. It can also create a pdf showing detailed plots of the branch and member sizes when the -o option is used to specify an output pdf.

It can also compare the sizes in multiple files. In this case additional files can be specified using -f.

usage: b2file-size [-h] [-o FILENAME] [--width WIDTH] [--height HEIGHT]
                   [--legend LEGEND] [--bar-gap FRACTION] [--show-fraction]
                   [--show-reduction] [--skip-total] [--show-members]
                   [--skip-relations] [--skip-mcrelations] [-m MEMBER_NAME]
                   [-s OBJECT_NAME] [-f FILENAME [LABEL ...]]
                   FILENAME [LABEL]

Required Arguments

FILENAME

filename to be used to show the size. If the filename ends in ‘+’, the uncompressed data size will be used instead of the compressed.

LABEL

Label for the filename in the plot. If none is given the filename is used.

Optional Arguments

-o

the output PDF filename for the generated graphs. If none is given no pdf will be created.

-f

filename and optional label for additional file to be shown in the plot. If no label is given, filename itself will be used. If the filename ends in ‘+’, the uncompressed data size will be used instead of the compressed.
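The ‘+’ suffix convention used for FILENAME and -f can be illustrated with a small Python helper. The function name parse_file_spec is hypothetical and the sketch only mirrors the rule stated above, not the tool's own parsing code.

```python
def parse_file_spec(spec):
    """Split a file argument into (filename, use_uncompressed_size)."""
    if spec.endswith("+"):
        # A trailing '+' requests the uncompressed data size
        return spec[:-1], True
    return spec, False
```

For example, "file.root+" refers to file.root with uncompressed sizes, while "file.root" refers to the same file with compressed sizes.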

Visual Options

Options influencing the visual style of the created pdf

--width

page width in inches (default: 10.0)

--height

page height in inches (default: 7.0)

--legend

location for the legend. Could either be ‘best’ for automatic placement or (upper, lower) left, center or right e.g. ‘upper left’ or ‘center’ (default: “best”)

--bar-gap

fraction of empty space between bars (default: 0.2)

Graph Options

Options influencing what objects to show and what graphs to create

--show-fraction

create an additional chart showing the fraction of the total event size for each object

--show-reduction

create an additional plot showing the size reduction compared to the first file in the list

--skip-total

If present, the bar showing the total size will be omitted from the charts

--show-members

if given also plots for the size of the members of top level objects will be created

--skip-relations

if given, relations will be omitted from output

--skip-mcrelations

if given, relations from or to MCParticles will be omitted from output

-m, --skip-member

member to be skipped when displaying member size (e.g. fBits, fUniqueID), can be supplied more than once

-s, --skip-object

object to be skipped (e.g. EventMetaData), can be supplied more than once. If an object is omitted then it will not be included in the total size calculation

Examples

create eventsize.pdf with a bar chart showing the kB/event required for each top level object in the file filename.root:

b2file-size -o eventsize.pdf filename.root

in addition, create a chart for each top level object showing the disk space required per event by each member. Omit the EventMetaData object and all relations from the output:

b2file-size -o eventsize.pdf --show-members -s EventMetaData --skip-relations filename.root

show the top level objects for three files and give them nice labels. In addition, create a chart which shows the reduction factor compared to the first file and save to ‘compare.pdf’:

b2file-size -o compare.pdf --show-reduction file0.root "before update" \
    -f file1.root "after update" -f file2.root "after second update"

show gains by using file compression. Notice the + after the filename to indicate that uncompressed size should be shown. Put the legend always in the upper left corner:

b2file-size -o compare.pdf --legend='upper left' --show-reduction \
    file.root+ "uncompressed" -f file.root "compressed"

if you want one chart per pdf file, the program pdftk can be used to split the pdf after creation:

pdftk eventsize.pdf burst output eventsize-%02d.pdf

3.2.9. b2file-normalize: Reset non-reproducible root file metadata and optionally the file name in the metadata

Tool to reset the non-reproducible root file metadata: UUID and datimes. It can also reset the initial file name stored in the file itself, but (WARNING!) this can corrupt the root file.

usage: b2file-normalize [-h] [--output OUTPUT | --in-place] [--name NAME]
                        [--root-version ROOT_VERSION]
                        filename

Required Arguments

filename

Name of the input root file

Optional Arguments

--output, -o

Name of the output root file, default is basename_normalized.ext

--in-place, -i

Overwrite the input file

--name, -n

The file name to be stored in the file, default is to not change it

--root-version, -r

The root version number to be set, default is to not change it

New in version release-04-00-00.

3.2.10. b2file-md5sum: Calculate a md5 checksum of a root file content excluding the root metadata

Tool to calculate a md5 checksum of a root file content excluding the metadata.

usage: b2file-md5sum filename

Required Arguments

filename

Name of the root file

Optional Arguments

--ignore-names

Exclude object names and titles from md5 sum

New in version release-04-00-00.