4.2. Tools for file handling
4.2.1. b2file-metadata-show: Show the metadata of a basf2 output file
This tool shows the recorded metadata of a basf2 output file, such as the number of events, the lowest event number and so forth. It can either work on a ROOT file or look for the file in a local XML file catalog using a logical file name (LFN):
usage: b2file-metadata-show [-h] [-a] [-s] [--json] (FILENAME|-l LFN)
Optional Arguments
- -h, --help
print all available options
- -l LFN, --lfn LFN
logical file name
- -a, --all
print all information
- --json
print machine-readable information in JSON format. Implies --all and --steering.
- -s, --steering
print steering file contents
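The JSON mode is convenient for scripting. The following is a minimal Python sketch, not part of the tool itself: the helper names `metadata_command` and `read_metadata` are hypothetical, and it assumes that the JSON document is the only thing written to standard output. No particular key names in the JSON are assumed.

```python
import json
import subprocess


def metadata_command(filename=None, lfn=None):
    """Build the argument list for `b2file-metadata-show --json`.

    Exactly one of filename or lfn has to be given, mirroring the
    (FILENAME|-l LFN) part of the usage line.
    """
    if (filename is None) == (lfn is None):
        raise ValueError("give exactly one of filename or lfn")
    target = [filename] if filename is not None else ["-l", lfn]
    return ["b2file-metadata-show", "--json"] + target


def read_metadata(filename=None, lfn=None):
    """Run the tool and decode its standard output as JSON.

    Assumption: standard output contains nothing but the JSON document.
    """
    result = subprocess.run(
        metadata_command(filename, lfn),
        check=True,            # raise CalledProcessError on a non-zero exit code
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

In a basf2 environment, `read_metadata("output.root")` would return the decoded metadata, where `output.root` is a placeholder file name.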
4.2.2. b2file-metadata-add: Add/Edit LFN in given file
This tool allows modifying the LFN and the data descriptions stored in a given basf2 output file. It will also update the XML file catalog if the file was registered in it before.
Changed in version after release-03-00-00: Previously the file was always registered in the file catalog, so a new file catalog was created if none existed. Now the tool only updates the file catalog when the file is already registered.
The keys and values for the data descriptions can take any value and are not used by the offline software in any way. They can be used for bookkeeping or additional information not otherwise included in the metadata.
At the moment the only commonly used keys are
- dataLevel
is automatically set when using the mdst.add_mdst_output, modularAnalysis.outputMdst, udst.add_udst_output, or udst.add_skimmed_udst_output functions and will be set to either “mdst” or “udst”.
- skimDecayMode
is automatically set when using udst.add_skimmed_udst_output and will contain the name of the skim.
usage: b2file-metadata-add [-h] [-l LFN] [-d KEY=VALUE] FILENAME
Optional Arguments
- -h, --help
print all available options
- --file arg
file name
- -l LFN, --lfn LFN
logical file name
- -d KEYVALUE, --description KEYVALUE
data description to set of the form key=value. If the argument does not contain an equal sign it’s interpreted as a key to delete from the dataDescriptions.
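The semantics of the -d/--description argument can be illustrated with a short Python sketch. This is not the tool's implementation. The function name `apply_descriptions` and the key `campaign` are hypothetical, and splitting at the first equal sign (so that values may themselves contain "=") is an assumption.

```python
def apply_descriptions(descriptions, arguments):
    """Return a copy of `descriptions` after applying -d style arguments."""
    updated = dict(descriptions)
    for argument in arguments:
        if "=" in argument:
            # Assumption: split at the first equal sign only.
            key, value = argument.split("=", 1)
            updated[key] = value
        else:
            # No equal sign: the argument names a key to delete.
            updated.pop(argument, None)
    return updated


current = {"dataLevel": "mdst", "skimDecayMode": "example"}
print(apply_descriptions(current, ["campaign=test", "skimDecayMode"]))
```

Here `campaign=test` adds a new entry and the bare `skimDecayMode` removes the existing one, while `dataLevel` is left untouched.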
4.2.3. b2file-catalog-add: Add a file to a local XML file catalog
This simple tool adds a file to a local XML file catalog so that it can be found as a parent file independent of the local name or folder structure:
usage: b2file-catalog-add [--help] FILENAME
4.2.4. b2file-check: Check a basf2 output file for consistency
Check a given basf2 root file for problems. This program checks a given output root file for the following problems:
- File is readable (file_readable)
- File contains event and persistent trees (contains_tree, contains_persistent)
- Event tree contains event meta data (eventmetadata_readable)
- Persistent tree contains file metadata (filemetadata_readable)
- All entries in persistent and event trees are readable (all_entries_readable)
- Number of entries in event tree == number of events in file metadata (entries_eq_filemetadata)
- Number of entries in event tree == expected_no_of_events, if given (entries_eq_expectation)
- Number of MC events generated == expected_no_of_mcevents, if given (mcevents_eq_expectation)
- If the expected size is given: abs(total_file_size/(#events*expected_event_size) - 1) < relative_uncertainty (eventsize_eq_expectation)
- If branches are given: Event tree contains the given branches (branches_present)
It returns 0 if all checks succeeded, 1 otherwise. Details about check failures are printed on standard output.
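The event size criterion can be written out as a small Python function. This is only a sketch of the formula quoted in this section, not the tool's code. It assumes that the total file size and the expected event size are both in KiB and that the file contains at least one event. The expected size of 12 KiB and the uncertainties are hypothetical inputs.

```python
def eventsize_eq_expectation(total_file_size_kib, n_events,
                             expected_event_size_kib, relative_uncertainty):
    """Evaluate abs(total_file_size/(#events*expected_event_size) - 1) < relative_uncertainty."""
    # Assumption: n_events > 0, otherwise the ratio is undefined.
    ratio = total_file_size_kib / (n_events * expected_event_size_kib)
    return abs(ratio - 1) < relative_uncertainty


# A 61.8984375 KiB file with 5 events, compared to an expected 12 KiB per event:
# the ratio is about 1.032, so the deviation from 1 is about 3.2 percent.
print(eventsize_eq_expectation(61.8984375, 5, 12.0, 0.05))  # within 5 percent
print(eventsize_eq_expectation(61.8984375, 5, 12.0, 0.03))  # not within 3 percent
```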
usage: check_basf2_file [-h] [-n EXPECTED_NO_OF_EVENTS]
[--mcevents EXPECTED_NO_OF_MCEVENTS]
[-s EXPECTED_EVENT_SIZE_KIB RELATIVE_UNCERTAINTY]
[--json]
FILE [BRANCH [BRANCH ...]]
Required Arguments
- FILE
The basf2 .root file to check. http:// and root:// URLs are also supported.
- BRANCH
Branches that should exist in the event durability tree.
Optional Arguments
- -n, --events
Expected number of events
- --mcevents
Expected number of generated events
- -s, --size
Expected size per event (KiB), with maximal relative uncertainty. Check is passed when abs(total_file_size/(#events*expected_event_size) - 1) < relative_uncertainty
- --json
Provide dictionary of passed checks and file statistics in JSON format on standard output. Checks are only included when actually run, and receive a boolean value indicating success.
Examples
The --json parameter can be used to get detailed output for all tests in a machine readable format:
$ check_basf2_file --json framework/tests/root_input.root
{
"checks_passed": {
"all_entries_readable": true,
"contains_persistent": true,
"contains_tree": true,
"entries_eq_filemetadata": true,
"eventmetadata_readable": true,
"file_readable": true,
"filemetadata_readable": true
},
"stats": {
"compression_algorithm": 0,
"compression_factor": 2.591318368911743,
"compression_level": 1,
"events": 5,
"filesize_kib": 61.8984375,
"mcevents": 0,
"size_per_event_kib": 12.3796875
}
}
Changed in version release-03-00-00: files with zero events can now pass the checks
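A report in this format is easy to evaluate in a script. The sketch below is an illustration, assuming only the `checks_passed` dictionary of boolean values described in this section. The helper name `failed_checks` and the sample report with a failing check are hypothetical.

```python
import json


def failed_checks(report_text):
    """Return the sorted names of all checks that were run and did not pass."""
    report = json.loads(report_text)
    return sorted(name for name, passed in report["checks_passed"].items()
                  if not passed)


# Hypothetical report in which one of the checks failed.
sample = """
{
  "checks_passed": {
    "file_readable": true,
    "contains_tree": true,
    "entries_eq_expectation": false
  },
  "stats": {"events": 5}
}
"""
print(failed_checks(sample))
```

Checks that were not run are absent from the dictionary, so they never appear in the result.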
4.2.5. b2file-merge: Merge multiple basf2 output files
This program merges files created by separate basf2 jobs with the RootOutput module. It’s similar to ROOT hadd but correctly updates the metadata in the file and correctly merges the objects in the persistent tree.
This tool is intended to only merge output files from identical jobs which were just split into smaller ones for convenience. As such the following restrictions apply:
- The files have to be created with the same release and steering file.
- The persistent tree is only allowed to contain FileMetaData and objects inheriting from Mergeable, and the same list of objects needs to be present in all files.
- The event tree needs to contain the same DataStore entries in all files.
Hint
If you want to merge the output of VariablesToNtuple, please use hadd.
See also
For a comparison between hadd, b2file-merge and friends, take a look at this questions.belle2 thread.
usage: b2file-merge [-h] [-f] [-q] [--no-catalog] [--add-to-catalog] OUTPUTFILENAME
INPUTFILENAME [INPUTFILENAME ...]
Optional Arguments
- -f, --force
overwrite the output file if already present
- -q, --quiet
if given only warnings and errors are printed
- --no-catalog
don’t register output file in file catalog. This is now the default and just kept for backwards compatibility.
- --add-to-catalog
add the output file to the file catalog
Examples
Combine all output files in a given directory:
$ b2file-merge full.root jobs/*.root
Merge all output files but be less verbose, overwrite the output if it exists and don’t register in file catalog:
$ b2file-merge -f -q --no-catalog full.root jobs/*.root
Changed in version release-03-00-00: the tool now checks for consistency of the real/MC flag for all input files and refuses to merge mixed sets of real and MC data.
Changed in version after release-03-00-00: files will by default no longer be registered in a file catalog. To get the old behavior please supply the --add-to-catalog command line option or run b2file-catalog-add on the output file.
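When the merge is driven from a script, the command line can be assembled in Python. This is a minimal sketch that uses only the options listed in this section. The helper names `merge_command` and `run_merge` are hypothetical, and sorting the inputs is a choice made here for reproducibility, not something the tool requires.

```python
import glob
import subprocess


def merge_command(output, inputs, force=False, quiet=False, add_to_catalog=False):
    """Build the argument list for b2file-merge."""
    if not inputs:
        raise ValueError("at least one input file is required")
    command = ["b2file-merge"]
    if force:
        command.append("-f")                # overwrite an existing output file
    if quiet:
        command.append("-q")                # print only warnings and errors
    if add_to_catalog:
        command.append("--add-to-catalog")  # register the output in the file catalog
    return command + [output] + list(inputs)


def run_merge(output, pattern, **options):
    """Merge all files matching `pattern`; raises if b2file-merge fails."""
    inputs = sorted(glob.glob(pattern))
    subprocess.run(merge_command(output, inputs, **options), check=True)
```

For example, `run_merge("full.root", "jobs/*.root", force=True, quiet=True)` corresponds to the second example above without the --no-catalog option, which is the default anyway.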
4.2.6. b2file-mix: Create a run of mixed data from a set of input files
This program is intended to merge a set of basf2 output files and to make them look like one run of mixed data. The events from the different files will be added together by picking a random input file for each event. All events will be added exactly once and the relative order of events from the same input file will be the same but they will be mixed between files.
The EventMetaData of the events will be modified so that the output file will have continuously numbered events starting at 1 (or at --start-event=N). The experiment and run number will be set to the same value for all events and can be chosen using the --exp/--run parameters. In contrast, with --keep-eventinfo the original event and run numbers will be kept. This can lead to duplicate event numbers and should only be used if all input files have different event numbers.
The number of events in the output file can be limited with -n. In this case events are randomly sampled from all input files, but the relative order still remains sequential: events from one file will be in the order they were in that file even if there are gaps.
The random seed can be specified to have a fixed order for the output events. To allow reproducing this process, the LFNs of all input files are added to the DataDescription of the output file. Together with the random seed, which is stored in the output file, the mixing procedure can be repeated.
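The mixing procedure described above can be sketched in a few lines of Python. This is an illustration of the idea only and not the algorithm used by b2file-mix. The function name `mix_events` is hypothetical, input files are represented just by their event counts, and the way the random choices are drawn is an assumption.

```python
import random


def mix_events(event_counts, seed=None, start_event=1, max_events=None):
    """Return a list of (new_event_number, file_index, entry_in_file).

    Every selected event appears exactly once and entries from the same
    file keep their original relative order.
    """
    rng = random.Random(seed)
    # One (file, entry) pair per event in all input files.
    pairs = [(f, i) for f, n in enumerate(event_counts) for i in range(n)]
    if max_events is not None and max_events < len(pairs):
        # Limit the output by sampling events from all files.
        pairs = rng.sample(pairs, max_events)
    # Per-file queues with the selected entries in their original order.
    queues = {}
    for f, i in sorted(pairs):
        queues.setdefault(f, []).append(i)
    # A shuffled sequence of file labels decides which file provides
    # each output event.
    order = [f for f, _ in pairs]
    rng.shuffle(order)
    position = dict.fromkeys(queues, 0)
    mixed = []
    for number, f in enumerate(order, start=start_event):
        mixed.append((number, f, queues[f][position[f]]))
        position[f] += 1
    return mixed


for event in mix_events([3, 2], seed=42):
    print(event)
```

Calling the function twice with the same seed and the same inputs yields the same order, which mirrors how a stored seed together with the list of input LFNs makes the mixing repeatable.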
usage: b2file-mix [-h] [--output OUTPUT] [--exp EXP] [--run RUN] [-f]
[--seed SEED] [--start-event START_EVENT]
[--globaltags GLOBALTAGS] [-n OUTPUT_EVENTS]
[--exclude EXCLUDE] [--keep-eventinfo]
INPUTFILE [INPUTFILE ...]
Required Arguments
- INPUTFILE
input filenames containing the events
Optional Arguments
- --output, -o
output file containing the final events
Default: “shuffled.root”
- --exp
the experiment number to set for the output file
Default: 0
- --run
the run number to set for the output file
Default: 0
- -f, --overwrite
overwrite output file if it exists
- --seed
Random seed to use for shuffling, can be used to reproduce the event order from a previous run
- --start-event
Event number of the first event in the final file. Other events will be numbered sequentially
Default: 1
- --globaltags
Globaltags to put in the final file. If not given the globaltags from the first input file will be used.
- -n, --output-events
Limit the number of events in the output file to at most this value
- --exclude
Branch names to exclude from the output file
Default: []
- --keep-eventinfo
Keep the exp,run,event numbers as they are in the original files
4.2.7. b2file-remove-branches: Create a copy of a basf2 output file removing a list of given branches in the process
This program can be used to create a copy of a basf2 root file with a list of branches removed from the event tree. This can be useful to remove objects which have been added with a different version of the software and can no longer be read correctly.
Warning
If you remove a branch with relations, please make sure to also remove all relations to and from this branch so that no dangling relations remain in the final file.
usage: b2file-remove-branches [-h] -o OUTPUT -i INPUT branch [branch ...]
Required Arguments
- branch
Branch name to omit from the output file if present in the input file
Optional Arguments
- -o, --output
Name of the output file. If the file already exists it will be overwritten
- -i, --input
Name of the input file.
New in version release-04-00-00.
4.2.8. b2file-size: Show detailed size information about the content of a file
This tool prints, and optionally plots as bar graphs, the disk space requirements for basf2 events stored in root files.
In its simplest form, b2file-size FILENAME, it will just print the sizes of each top level branch in the file to the command line. It can, however, create a PDF showing detailed plots of the branch and member sizes when the -o option is used to specify an output PDF.
It can also compare the sizes in multiple files. In this case additional files can be specified using -f.
usage: b2file-size [-h] [-o FILENAME] [--width WIDTH] [--height HEIGHT]
[--legend LEGEND] [--bar-gap FRACTION] [--show-fraction]
[--show-reduction] [--skip-total] [--show-members]
[--skip-relations] [--skip-mcrelations] [-m MEMBER_NAME]
[-s OBJECT_NAME] [-f FILENAME [LABEL ...]]
FILENAME [LABEL]
Required Arguments
- FILENAME
filename to be used to show the size. If the filename ends in ‘+’, the uncompressed data size will be used instead of the compressed.
- LABEL
Label for the filename in the plot. If none is given the filename is used.
Optional Arguments
- -o
the output PDF filename for the generated graphs. If none is given no pdf will be created.
- -f
filename and optional label for additional file to be shown in the plot. If no label is given, filename itself will be used. If the filename ends in ‘+’, the uncompressed data size will be used instead of the compressed.
Visual Options
Options influencing the visual style of the created pdf
- --width
page width in inches, (default: 10.0)
- --height
page height in inches, (default: 7.0)
- --legend
location for the legend. Could either be ‘best’ for automatic placement or (upper, lower) left, center or right e.g. ‘upper left’ or ‘center’, (default: “best”)
- --bar-gap
fraction of empty space between bars, (default: 0.2)
Graph Options
Options influencing what objects to show and what graphs to create
- --show-fraction
create an additional chart showing the fraction of the total event size for each object
- --show-reduction
create an additional plot showing the size reduction compared to the first file in the list
- --skip-total
If present, the bar showing the total size will be omitted from the charts
- --show-members
if given also plots for the size of the members of top level objects will be created
- --skip-relations
if given, relations will be omitted from output
- --skip-mcrelations
if given, relations from or to MCParticles will be omitted from output
- -m, --skip-member
member to be skipped when displaying member size (e.g. fBits, fUniqueID), can be supplied more than once
- -s, --skip-object
object to be skipped (e.g. EventMetaData), can be supplied more than once. If an object is omitted then it will not be included in the total size calculation
Examples
create eventsize.pdf with a bar chart showing the kb/event required for each top level object in the file filename.root:
b2file-size -o eventsize.pdf filename.root
in addition, create a chart for each top level object showing the disk space required per event by each member. Omit the EventMetaData object and all relations from the output:
b2file-size -o eventsize.pdf --show-members -s EventMetaData --skip-relations filename.root
show the top level objects for three files and give them nice labels. In addition, create a chart which shows the reduction factor compared to the first file and save to ‘compare.pdf’:
b2file-size -o compare.pdf --show-reduction file0.root "before update" \
-f file1.root "after update" -f file2.root "after second update"
show gains by using file compression. Notice the + after the filename to indicate that uncompressed size should be shown. Put the legend always in the upper left corner:
b2file-size -o compare.pdf --legend='upper left' --show-reduction \
file.root+ "uncompressed" -f file.root "compressed"
if you want the charts as one page per pdf file I recommend the program pdftk to split the pdf after creation:
pdftk eventsize.pdf burst output eventsize-%02d.pdf
4.2.9. b2file-normalize: Reset non-reproducible root file metadata and optionally the file name in the metadata
Tool to reset the non-reproducible root file metadata: UUID and datimes. It can also reset the initial file name stored in the file itself, but (WARNING!) this can corrupt the root file.
usage: b2file-normalize [-h] [--output OUTPUT | --in-place] [--name NAME]
[--root-version ROOT_VERSION]
filename
Required Arguments
- filename
Name of the input root file
Optional Arguments
- --output, -o
Name of the output root file, default is basename_normalized.ext
- --in-place, -i
Overwrite the input file
- --name, -n
The file name to be stored in the file, default is to not change it
- --root-version, -r
The root version number to be set, default is to not change it
New in version release-04-00-00.
4.2.10. b2file-md5sum: Calculate a md5 checksum of a root file content excluding the root metadata
Tool to calculate a md5 checksum of a root file content excluding the metadata.
usage: b2file-md5sum filename
Required Arguments
- filename
Name of the root file
Optional Arguments
- --ignore-names
Exclude object names and titles from md5 sum
New in version release-04-00-00.