Tip

The functions and tools documented here are intended for skim liaisons and developers. If you are only interested in the selection criteria, then this section is probably not relevant for you.

17.4.1. Writing a skim

In the skim package, skims are defined via the BaseSkim class. The skim package is organised around this for the following reasons:

  • this keeps the package organised, with every skim being defined in a predictable way,

  • this allows the skims to be located by standard helper tools such as b2skim-run and b2skim-stats-print, and

  • skims must be combined with other skims to reduce the number of grid job submissions, and the CombinedSkim class is written to combine objects of type BaseSkim.

To write a new skim, please follow these steps:

  1. Start by defining a class which inherits from BaseSkim and give it the name of your skim. Put the class in an appropriate skim module for your working group. For example, the skim DarkSinglePhoton belongs in skim/scripts/skim/dark.py, and begins with the following definition:

    class DarkSinglePhoton(BaseSkim):
        # docstring here explaining reconstructed decay modes and applied cuts.
    
  2. [Mandatory] Tell us about your skim by setting the following attributes:

    • __description__: one-line summary describing the purpose of your skim.

    • __category__: a list of category keywords.

    • __authors__: list of skim authors.

    • __contact__: the name and contact email of the skim liaison responsible for this skim.

    BaseSkim requires you to set these attributes in each subclass. Once these are set, we can add a lovely auto-generated header to the documentation of the skim by using the fancy_skim_header decorator.

    @fancy_skim_header
    class DarkSinglePhoton(BaseSkim):
        # docstring here describing your skim, and explaining cuts.
    

    This header will appear as a “Note” block at the top of your skim class on Sphinx, and will also appear at the top of the help function in an interactive Python session:

    >>> from skim.foo import MySkim
    >>> help(MySkim)
    

    Tip

    If your skim does not define __description__, __category__, __authors__, __contact__, or build_lists, then you will see an error message like:

    TypeError: Can't instantiate abstract class SinglePhotonDark with abstract methods __authors__
    

    This can be fixed by defining these required attributes and methods.

  3. If you require any standard lists to be loaded for your skim, override the method load_standard_lists. This will be run before build_lists and additional_setup.

    This step is separated into its own function so that the CombinedSkim class can do special handling of these functions to avoid accidentally loading a standard list twice when combining skims.

  4. If any further setup is required, then override the additional_setup method.

  5. [Mandatory] Define all cuts by overriding build_lists. Before the end of the build_lists method, the attribute SkimLists must be set to a list of skim list names.

  6. Skims can crash on the grid if the log files are too large. If any module produces too much output, then override the attribute NoisyModules with a list of such modules, and their output will be limited to error-level messages.

  7. By default, the skim test file is a neutral B pair sample with beam background. If your skim has a retention rate of close to zero for this sample type, you may wish to override the attribute TestFiles. This should be a list of file names retrieved from skimExpertFunctions.get_test_file, such as:

    TestFiles = [get_test_file("MC13_ggBGx1")]
    
  8. [Mandatory] Add your skim to the registry, with an appropriate skim code (see Skim Registry).
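The enforcement of the mandatory attributes and methods in step 2 and step 5 relies on Python's standard abc machinery, which is what produces the TypeError quoted in the tip above. The following is a toy sketch of that mechanism, not the real BaseSkim implementation:

```python
from abc import ABC, abstractmethod


class SketchSkim(ABC):
    """Toy stand-in for BaseSkim: subclasses must supply the
    documentation attributes and a build_lists method."""

    @property
    @abstractmethod
    def __description__(self):
        ...

    @property
    @abstractmethod
    def __authors__(self):
        ...

    @abstractmethod
    def build_lists(self, path):
        ...


class IncompleteSkim(SketchSkim):
    # Only __description__ is provided here; __authors__ and
    # build_lists remain abstract, so instantiation fails.
    __description__ = "A skim missing its other required members."


try:
    IncompleteSkim()
except TypeError as err:
    # e.g. "Can't instantiate abstract class IncompleteSkim ..."
    print(err)
```

Providing the remaining attribute and method in the subclass removes them from the abstract set and makes the class instantiable.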

With all of these steps followed, you will now be able to run your skim using the skim command line tools. To make sure that your skim does what you expect, and is feasible to put into production, please also complete the following steps:

  1. Test your skim! The primary point of skims is to be run on the grid, so you want to be sure that the retention rate and processing time are low enough to make this process smooth.

    The skim package contains a set of tools to make this straightforward for you. See Testing skim performance for more details.

  2. Define validation histograms for your skim by overriding the method validation_histograms. Please see the source code of various skims for examples of how to do this.

17.4.2. Building skim lists in a steering file

Calling an instance of a skim class will run the particle list loaders, setup function, list builder function, and uDST output function. So a minimal skim steering file might consist of the following:

import basf2 as b2
import modularAnalysis as ma
from skim.foo import MySkim

path = b2.Path()
ma.inputMdstList("default", [], path=path)
skim = MySkim()
skim(path)  # __call__ method loads standard lists, creates skim lists, and saves to uDST
b2.process(path)

After skim(path) has been called, the skim list names are stored in the Python list skim.SkimLists.

Warning

There is a subtle but important technicality here: if BaseSkim.skim_event_cuts has been called, then the skim lists are not built for all events on the path, but they are built for all events on a conditional path. A side-effect of this is that no post-skim path can be safely defined for the CombinedSkim class (since a combined skim of five skims may have up to five conditional paths).

After a skim has been added to the path, the attribute BaseSkim.postskim_path contains a safe path for adding subsequent modules to (e.g. performing further reconstruction using the skim candidates). However, the final call to basf2.process must be passed the original (main) path.

skim = MySkim()
skim(path)
# Add subsequent modules to skim.postskim_path
ma.variablesToNtuple(skim.SkimLists[0], ["pt", "E"], path=skim.postskim_path)
# Process full path
b2.process(path)

The above code snippet will produce both uDST and ntuple output. To only build the skim lists without writing to uDST, pass the configuration parameter udstOutput=False during initialisation of the skim object:

skim = MySkim(udstOutput=False)
skim(path)

Disabling uDST output may be useful to you if you want to do any of the following:

  • print the statistics of the skim without producing any output files,

  • build the skim lists and perform further reconstruction or fitting on the skim candidates before writing the ROOT output,

  • go directly from unskimmed MDST to analysis ntuples in a single steering file (but please consider first using the centrally-produced skimmed uDSTs), or

  • use the skim flag to build the skim lists and write an event-level ntuple with information about which events pass the skim.

Tip

The tool b2skim-generate can be used to generate simple skim steering files like the example above. The tool b2skim-run is a standalone tool for running skims. b2skim-run is preferable for quickly testing a skim during skim development. b2skim-generate should be used as a starting point if you are doing anything more complicated than simply running a skim on an MDST file to produce a uDST file.

Skim flags

When a skim is added to the path, an entry is added to the event extra info to indicate whether an event passes the skim or not. This flag is of the form eventExtraInfo(passes_<SKIMNAME>) (aliased to passes_<SKIMNAME> for convenience), and the flag name is stored in the property BaseSkim.flag.
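The naming convention is simple enough to reproduce by hand. The property BaseSkim.flag is the authoritative source; this helper is purely illustrative:

```python
def skim_flag(skim_name):
    """Return the event-level flag name for a skim, following the
    passes_<SKIMNAME> convention described above (illustration only;
    use BaseSkim.flag in real steering files)."""
    return f"passes_{skim_name}"


print(skim_flag("SinglePhotonDark"))  # passes_SinglePhotonDark
```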

In the example below, we build the skim lists, skip the uDST output, and write an ntuple containing the skim flag and other event-level variables:

skim = MySkim(udstOutput=False)
skim(path)
ma.variablesToNtuple("", [skim.flag, "nTracks"], path=path)
b2.process(path)

Skim flags can also be used in combined skims, with the individual flags being available in the list CombinedSkim.flags. In the example below, we run three skims in a combined skim, disable the uDST output, and then save the three skim flags to an ntuple.

skim = CombinedSkim(
    SkimA(),
    SkimB(),
    SkimC(),
    udstOutput=False,
)
skim(path)
ma.variablesToNtuple("", skim.flags + ["nTracks"], path=path)
b2.process(path)

Tip

Skim flags are guaranteed to work on the main path (the variable path in the above examples). However, any other modules attempting to access the skim lists should be added to the postskim_path.

17.4.3. Running a skim

In the skim package, there are command-line tools available for running skims, documented below. These take a skim name as a command line argument, and run the code defined in the corresponding subclass of BaseSkim.

b2skim-run: Run a skim

Tip

This tool completely supplants the <SkimName>_Skim_Standalone.py steering files from previous versions of basf2. The standalone/ and combined/ directories no longer exist in the skim package from version-05-00-00 onwards.

General steering file for running skims.

usage: b2skim-run [-h] {single,combined,module} ...

Required Arguments

action

Possible choices: single, combined, module

Run just one skim, or multiple skims at once.

Sub-commands:

single

Run a single skim.

b2skim-run single [-h] [-o Output uDST location] [-n MaxInputEvents]
                  [-i InputFileList [InputFileList ...]]
                  Skim

Required Arguments

Skim

Possible choices: Random, SystematicsTracking, Resonance, SystematicsRadMuMu, SystematicsEELL, SystematicsRadEE, SystematicsLambda, SystematicsPhiGamma, SystematicsFourLeptonFromHLTFlag, SystematicsRadMuMuFromHLTFlag, SystematicsJpsi, SystematicsKshort, SystematicsBhabha, SystematicsCombinedHadronic, SystematicsCombinedLowMulti, SystematicsDstar, PRsemileptonicUntagged, LeptonicUntagged, dilepton, SLUntagged, B0toDstarl_Kpi_Kpipi0_Kpipipi, feiHadronicB0, feiHadronicBplus, feiSLB0, feiSLBplus, feiHadronic, feiSL, BtoXgamma, BtoXll, BtoXll_LFV, inclusiveBplusToKplusNuNu, TDCPV_ccs, TDCPV_qqs, BtoD0h_Kspi0, BtoD0h_Kspipipi0, B0toDpi_Kpipi, B0toDpi_Kspi, B0toDstarPi_D0pi_Kpi, B0toDstarPi_D0pi_Kpipipi_Kpipi0, B0toDrho_Kpipi, B0toDrho_Kspi, B0toDstarRho_D0pi_Kpi, B0toDstarRho_D0pi_Kpipipi_Kpipi0, BtoD0h_hh, BtoD0h_Kpi, BtoD0h_Kpipipi_Kpipi0, BtoD0h_Kshh, BtoD0rho_Kpi, BtoD0rho_Kpipipi_Kpipi0, B0toDD_Kpipi_Kspi, B0toDstarD, B0toD0Kpipi0_pi0, InclusiveLambda, BottomoniumEtabExclusive, BottomoniumUpsilon, CharmoniumPsi, XToD0_D0ToHpJm, XToD0_D0ToNeutrals, DstToD0Pi_D0ToRare, XToDp_DpToKsHp, XToDp_DpToHpHmJp, LambdacTopHpJm, DstToD0Pi_D0ToHpJm, DstToD0Pi_D0ToHpJmPi0, DstToD0Pi_D0ToHpHmPi0, DstToD0Pi_D0ToKsOmega, DstToD0Pi_D0ToHpJmEta, DstToD0Pi_D0ToNeutrals, DstToD0Pi_D0ToHpJmKs, EarlyData_DstToD0Pi_D0ToHpJmPi0, EarlyData_DstToD0Pi_D0ToHpHmPi0, DstToDpPi0_DpToHpPi0, DstToD0Pi_D0ToHpHmHpJm, SinglePhotonDark, GammaGammaControlKLMDark, ALP3Gamma, EGammaControlDark, InelasticDarkMatter, RadBhabhaV0Control, TauLFV, DimuonPlusMissingEnergy, ElectronMuonPlusMissingEnergy, DielectronPlusMissingEnergy, LFVZpVisible, TauGeneric, TauThrust, TwoTrackLeptonsForLuminosity, LowMassTwoTrack, SingleTagPseudoScalar, BtoPi0Pi0, BtoHadTracks, BtoHad1Pi0, BtoHad3Tracks1Pi0, BtoRhopRhom

Skim to run.

Optional Arguments

-o, --output-udst-name

Location of output uDST file.

-n, --max-input-events

Maximum number of input events to process.

-i, --input-file-list

Input file list

combined

Run several skims as a combined steering file.

b2skim-run combined [-h] [-n MaxInputEvents]
                    [-i InputFileList [InputFileList ...]]
                    Skim [Skim ...]

Required Arguments

Skim

Possible choices: Random, SystematicsTracking, Resonance, SystematicsRadMuMu, SystematicsEELL, SystematicsRadEE, SystematicsLambda, SystematicsPhiGamma, SystematicsFourLeptonFromHLTFlag, SystematicsRadMuMuFromHLTFlag, SystematicsJpsi, SystematicsKshort, SystematicsBhabha, SystematicsCombinedHadronic, SystematicsCombinedLowMulti, SystematicsDstar, PRsemileptonicUntagged, LeptonicUntagged, dilepton, SLUntagged, B0toDstarl_Kpi_Kpipi0_Kpipipi, feiHadronicB0, feiHadronicBplus, feiSLB0, feiSLBplus, feiHadronic, feiSL, BtoXgamma, BtoXll, BtoXll_LFV, inclusiveBplusToKplusNuNu, TDCPV_ccs, TDCPV_qqs, BtoD0h_Kspi0, BtoD0h_Kspipipi0, B0toDpi_Kpipi, B0toDpi_Kspi, B0toDstarPi_D0pi_Kpi, B0toDstarPi_D0pi_Kpipipi_Kpipi0, B0toDrho_Kpipi, B0toDrho_Kspi, B0toDstarRho_D0pi_Kpi, B0toDstarRho_D0pi_Kpipipi_Kpipi0, BtoD0h_hh, BtoD0h_Kpi, BtoD0h_Kpipipi_Kpipi0, BtoD0h_Kshh, BtoD0rho_Kpi, BtoD0rho_Kpipipi_Kpipi0, B0toDD_Kpipi_Kspi, B0toDstarD, B0toD0Kpipi0_pi0, InclusiveLambda, BottomoniumEtabExclusive, BottomoniumUpsilon, CharmoniumPsi, XToD0_D0ToHpJm, XToD0_D0ToNeutrals, DstToD0Pi_D0ToRare, XToDp_DpToKsHp, XToDp_DpToHpHmJp, LambdacTopHpJm, DstToD0Pi_D0ToHpJm, DstToD0Pi_D0ToHpJmPi0, DstToD0Pi_D0ToHpHmPi0, DstToD0Pi_D0ToKsOmega, DstToD0Pi_D0ToHpJmEta, DstToD0Pi_D0ToNeutrals, DstToD0Pi_D0ToHpJmKs, EarlyData_DstToD0Pi_D0ToHpJmPi0, EarlyData_DstToD0Pi_D0ToHpHmPi0, DstToDpPi0_DpToHpPi0, DstToD0Pi_D0ToHpHmHpJm, SinglePhotonDark, GammaGammaControlKLMDark, ALP3Gamma, EGammaControlDark, InelasticDarkMatter, RadBhabhaV0Control, TauLFV, DimuonPlusMissingEnergy, ElectronMuonPlusMissingEnergy, DielectronPlusMissingEnergy, LFVZpVisible, TauGeneric, TauThrust, TwoTrackLeptonsForLuminosity, LowMassTwoTrack, SingleTagPseudoScalar, BtoPi0Pi0, BtoHadTracks, BtoHad1Pi0, BtoHad3Tracks1Pi0, BtoRhopRhom

List of skims to run as a combined skim.

Optional Arguments

-n, --max-input-events

Maximum number of input events to process.

-i, --input-file-list

Input file list

module

Run all skims in a module.

b2skim-run module [-h] [-n MaxInputEvents]
                  [-i InputFileList [InputFileList ...]]
                  module

Required Arguments

module

Possible choices: charm, semileptonic, quarkonium, dark, lowMulti, ewp, fei, leptonic, tdcpv, systematics, taupair, btocharmless, btocharm

Skim module for which to run all skims as a combined steering file.

Optional Arguments

-n, --max-input-events

Maximum number of input events to process.

-i, --input-file-list

Input file list

b2skim-generate: Generate skim steering files

Tip

This tool is for cases where the other tools do not suffice (such as running on the grid, or adding additional modules to the path after adding a skim). If you just want to run a skim on KEKCC, consider using b2skim-run. If you want to test the performance of your skim, consider using the b2skim-stats tools.

Generate skim steering files.

This tool is for if you really need a steering file, and b2skim-run doesn’t cut it (such as if you are testing your skim on the grid).

usage: b2skim-generate [-h] [-o [OutputFilename]] [--no-stats]
                       [--skimmed-mdst-output] [--no-user-hints]
                       [--local-module LOCAL_MODULE]
                       Skim|Module [Skim|Module ...]

Required Arguments

Skim|Module

Possible choices: Random, SystematicsTracking, Resonance, SystematicsRadMuMu, SystematicsEELL, SystematicsRadEE, SystematicsLambda, SystematicsPhiGamma, SystematicsFourLeptonFromHLTFlag, SystematicsRadMuMuFromHLTFlag, SystematicsJpsi, SystematicsKshort, SystematicsBhabha, SystematicsCombinedHadronic, SystematicsCombinedLowMulti, SystematicsDstar, PRsemileptonicUntagged, LeptonicUntagged, dilepton, SLUntagged, B0toDstarl_Kpi_Kpipi0_Kpipipi, feiHadronicB0, feiHadronicBplus, feiSLB0, feiSLBplus, feiHadronic, feiSL, BtoXgamma, BtoXll, BtoXll_LFV, inclusiveBplusToKplusNuNu, TDCPV_ccs, TDCPV_qqs, BtoD0h_Kspi0, BtoD0h_Kspipipi0, B0toDpi_Kpipi, B0toDpi_Kspi, B0toDstarPi_D0pi_Kpi, B0toDstarPi_D0pi_Kpipipi_Kpipi0, B0toDrho_Kpipi, B0toDrho_Kspi, B0toDstarRho_D0pi_Kpi, B0toDstarRho_D0pi_Kpipipi_Kpipi0, BtoD0h_hh, BtoD0h_Kpi, BtoD0h_Kpipipi_Kpipi0, BtoD0h_Kshh, BtoD0rho_Kpi, BtoD0rho_Kpipipi_Kpipi0, B0toDD_Kpipi_Kspi, B0toDstarD, B0toD0Kpipi0_pi0, InclusiveLambda, BottomoniumEtabExclusive, BottomoniumUpsilon, CharmoniumPsi, XToD0_D0ToHpJm, XToD0_D0ToNeutrals, DstToD0Pi_D0ToRare, XToDp_DpToKsHp, XToDp_DpToHpHmJp, LambdacTopHpJm, DstToD0Pi_D0ToHpJm, DstToD0Pi_D0ToHpJmPi0, DstToD0Pi_D0ToHpHmPi0, DstToD0Pi_D0ToKsOmega, DstToD0Pi_D0ToHpJmEta, DstToD0Pi_D0ToNeutrals, DstToD0Pi_D0ToHpJmKs, EarlyData_DstToD0Pi_D0ToHpJmPi0, EarlyData_DstToD0Pi_D0ToHpHmPi0, DstToDpPi0_DpToHpPi0, DstToD0Pi_D0ToHpHmHpJm, SinglePhotonDark, GammaGammaControlKLMDark, ALP3Gamma, EGammaControlDark, InelasticDarkMatter, RadBhabhaV0Control, TauLFV, DimuonPlusMissingEnergy, ElectronMuonPlusMissingEnergy, DielectronPlusMissingEnergy, LFVZpVisible, TauGeneric, TauThrust, TwoTrackLeptonsForLuminosity, LowMassTwoTrack, SingleTagPseudoScalar, BtoPi0Pi0, BtoHadTracks, BtoHad1Pi0, BtoHad3Tracks1Pi0, BtoRhopRhom, charm, semileptonic, quarkonium, dark, lowMulti, ewp, fei, leptonic, tdcpv, systematics, taupair, btocharmless, btocharm

Skim(s) to produce a steering file for. If more than one skim is provided, then a combined steering file is produced. If a module name is passed, the combined steering file will contain all skims in that module.

Optional Arguments

-o, --output-script-name

Location to output steering file. If flag not given, code is printed to screen. If flag is given with no arguments, writes to a file in the current directory using a default name.

Default: “”

--no-stats

If flag passed, print(b2.statistics) will not be included at the end of the steering file.

--skimmed-mdst-output

If flag passed, save a single MDST containing events which pass at least one of the skims.

--no-user-hints

If flag passed, the steering file will not include a comment explaining how to add modules to the path after building the skim lists.

--local-module

[EXPERT FLAG] Name of local module to import skim functions from. Script will fail if skims come from more than one module.

17.4.4. Skim tutorial

A Jupyter notebook skimming tutorial can be found in skim/tutorial/Skimming_Tutorial.ipynb in basf2.

17.4.5. Skim registry

All skims must be registered and encoded by the relevant skim liaison. Registering a skim is as simple as adding it to the list in skim/scripts/skim/registry.py as an entry of the form (SkimCode, ParentModule, SkimName).

The skim numbering convention is defined on the Confluence skim page.
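A registry entry, and the lookups performed over it, can be illustrated with a plain list of tuples. This is a simplified sketch, not the real implementation in skim/scripts/skim/registry.py; the two example entries are real registered skims:

```python
# Each entry mirrors the (SkimCode, ParentModule, SkimName) form
# described above.
REGISTRY = [
    ("18020100", "dark", "SinglePhotonDark"),
    ("11180100", "fei", "feiHadronicB0"),
]


def encode_skim_name(name):
    """Look up the 8-digit skim code for a skim name."""
    return next(code for code, _, n in REGISTRY if n == name)


def decode_skim_code(code):
    """Look up the skim name for an 8-digit skim code."""
    return next(n for c, _, n in REGISTRY if c == code)


print(encode_skim_name("SinglePhotonDark"))  # 18020100
print(decode_skim_code("11180100"))          # feiHadronicB0
```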

skim.registry.Registry = <skim.registry.SkimRegistryClass object>

An instance of skim.registry.SkimRegistryClass. Use this in your script to get information from the registry.

>>> from skim.registry import Registry
>>> Registry.encode_skim_name("SinglePhotonDark")
18020100
class skim.registry.SkimRegistryClass[source]

Class containing information on all official registered skims. This class also contains helper functions for getting information from the registry. For convenience, an instance of this class is provided: skim.registry.Registry.

The table below lists all registered skims and their skim codes:

Module        Skim name                          Skim code
------------  ---------------------------------  ---------
btocharm      BtoD0h_Kspi0                       14120300
              BtoD0h_Kspipipi0                   14120400
              B0toDpi_Kpipi                      14120600
              B0toDpi_Kspi                       14120601
              B0toDstarPi_D0pi_Kpi               14120700
              B0toDstarPi_D0pi_Kpipipi_Kpipi0    14120800
              B0toDrho_Kpipi                     14121100
              B0toDrho_Kspi                      14121101
              B0toDstarRho_D0pi_Kpi              14121200
              B0toDstarRho_D0pi_Kpipipi_Kpipi0   14121201
              B0toD0Kpipi0_pi0                   14121300
              BtoD0h_hh                          14140100
              BtoD0h_Kpi                         14140101
              BtoD0h_Kpipipi_Kpipi0              14140102
              BtoD0h_Kshh                        14140200
              BtoD0rho_Kpi                       14141000
              BtoD0rho_Kpipipi_Kpipi0            14141001
              B0toDD_Kpipi_Kspi                  14141002
              B0toDstarD                         14141003
btocharmless  BtoPi0Pi0                          19120100
              BtoRhopRhom                        19120400
              BtoHadTracks                       19130201
              BtoHad1Pi0                         19130300
              BtoHad3Tracks1Pi0                  19130310
charm         XToD0_D0ToHpJm                     17230100
              XToD0_D0ToNeutrals                 17230200
              DstToD0Pi_D0ToRare                 17230300
              XToDp_DpToKsHp                     17230400
              XToDp_DpToHpHmJp                   17230500
              LambdacTopHpJm                     17230600
              DstToD0Pi_D0ToHpJm                 17240100
              DstToD0Pi_D0ToHpJmPi0              17240200
              DstToD0Pi_D0ToHpHmPi0              17240300
              DstToD0Pi_D0ToKsOmega              17240400
              DstToD0Pi_D0ToHpJmEta              17240500
              DstToD0Pi_D0ToNeutrals             17240600
              DstToD0Pi_D0ToHpJmKs               17240700
              EarlyData_DstToD0Pi_D0ToHpJmPi0    17240800
              EarlyData_DstToD0Pi_D0ToHpHmPi0    17240900
              DstToDpPi0_DpToHpPi0               17241000
              DstToD0Pi_D0ToHpHmHpJm             17241100
dark          InelasticDarkMatter                18000000
              RadBhabhaV0Control                 18000001
              SinglePhotonDark                   18020100
              GammaGammaControlKLMDark           18020200
              ALP3Gamma                          18020300
              EGammaControlDark                  18020400
              DimuonPlusMissingEnergy            18520100
              ElectronMuonPlusMissingEnergy      18520200
              DielectronPlusMissingEnergy        18520300
              LFVZpVisible                       18520400
ewp           BtoXgamma                          12160100
              BtoXll                             12160200
              BtoXll_LFV                         12160300
              inclusiveBplusToKplusNuNu          12160400
fei           feiHadronicB0                      11180100
              feiHadronicBplus                   11180200
              feiSLB0                            11180300
              feiSLBplus                         11180400
              feiHadronic                        11180500
              feiSL                              11180600
leptonic      LeptonicUntagged                   11130300
              dilepton                           11130301
lowMulti      LowMassTwoTrack                    18520500
              TwoTrackLeptonsForLuminosity       18530100
              SingleTagPseudoScalar              18530200
quarkonium    InclusiveLambda                    15410300
              BottomoniumEtabExclusive           15420100
              BottomoniumUpsilon                 15440100
              CharmoniumPsi                      16460200
semileptonic  PRsemileptonicUntagged             11110100
              SLUntagged                         11160200
              B0toDstarl_Kpi_Kpipi0_Kpipipi      11160201
systematics   Random                             10000000
              SystematicsTracking                10600300
              Resonance                          10600400
              SystematicsRadMuMu                 10600500
              SystematicsEELL                    10600600
              SystematicsRadEE                   10600700
              SystematicsFourLeptonFromHLTFlag   10600800
              SystematicsRadMuMuFromHLTFlag      10600900
              SystematicsBhabha                  10601200
              SystematicsCombinedHadronic        10601300
              SystematicsCombinedLowMulti        10601400
              SystematicsDstar                   10601500
              SystematicsJpsi                    10611000
              SystematicsKshort                  10611100
              SystematicsLambda                  10620200
              SystematicsPhiGamma                11640100
taupair       TauLFV                             18360100
              TauGeneric                         18570600
              TauThrust                          18570700
tdcpv         TDCPV_ccs                          13160200
              TDCPV_qqs                          13160300

property codes

A list of all registered skim codes.

decode_skim_code(SkimCode)[source]

Find the name of the skim which corresponds to the provided skim code.

This is useful to determine the skim script used to produce a specific uDST file, given the 8-digit code name of the file itself.

Parameters

SkimCode (str) – 8 digit skim code assigned to some skim.

Returns

Name of the corresponding skim as it appears in the skim registry.

encode_skim_name(SkimName)[source]

Find the 8 digit skim code assigned to the skim with the provided name.

Parameters

SkimName (str) – Name of the corresponding skim as it appears in the skim registry.

Returns

8 digit skim code assigned to the given skim.

get_skim_function(SkimName)[source]

Get the skim class constructor for the given skim.

This is achieved by importing the module listed alongside the skim name in the skim registry.

Parameters

SkimName (str) – Name of the skim to be found.

Returns

The class constructor for the given skim.

get_skim_module(SkimName)[source]

Retrieve the skim module name from the registry which contains the given skim.

Parameters

SkimName (str) – Name of the skim as it appears in the skim registry.

Returns

The name of the skim module which contains the skim.

get_skims_in_module(SkimModule)[source]

Retrieve a list of the skims listed in the registry as existing in the given skim module.

Parameters

SkimModule (str) – The name of the module, e.g. btocharmless (not skim.btocharmless or btocharmless.py).

Returns

The skims listed in the registry as belonging to SkimModule.

property modules

A list of all registered skim modules.

property names

A list of all registered skim names.

17.4.6. Testing skim performance

When skims are developed, it is important to test the performance of the skim on data and on a range of background MC samples. Two command-line tools are provided in the skim package to aid in this: b2skim-stats-submit and b2skim-stats-print. They are available in the PATH after setting up the basf2 environment with b2setup. The former submits a series of test jobs for a skim on data and MC samples, and the latter uses the output files of those jobs to calculate performance statistics for each sample, including retention rate, CPU time, and uDST size per event. b2skim-stats-print also provides estimates for combined MC samples, where the statistics are weighted by the cross-section of each background process.

First run b2skim-stats-submit, which will submit small skim jobs on test files of MC and data using bsub. For example,

b2skim-stats-submit -s LeptonicUntagged SLUntagged

Monitor your jobs with bjobs -u USERNAME. Once all of the submitted jobs have completed successfully, then run b2skim-stats-print.

b2skim-stats-print -s LeptonicUntagged SLUntagged

This will read the output files of the test jobs, and produce tables of statistics in a variety of outputs.

  • By default, a subset of the statistics are printed to the screen.

  • If the -M flag is provided, a Markdown table will be written to SkimStats.md. This table is in a format that can be copied into the comment fields of pull requests (where BitBucket will format the table nicely for you). Use this flag when asked to produce a table of stats in a pull request.

  • If the -C flag is provided, a text file SkimStats.txt is written, in which the statistics are formatted as Confluence wiki markup tables. These tables can be copied directly onto a Confluence page by editing the page, selecting Insert more content from the toolbar, selecting Markup from the drop-down menu, and then pasting the content of the text file into the markup editor which appears. Confluence will then format the tables and headings. The markup editor can also be accessed via ctrl-shift-D (cmd-shift-D).

  • If the -J flag is provided, then all statistics produced are printed to a JSON file SkimStats.json, indexed by skim, statistic, and sample label. This file contains extra metadata about when and how the tests were run. This file is to be used by grid production tools.
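Since the JSON file is indexed by skim, statistic, and sample label, it can be navigated as an ordinary nested mapping. The statistic and sample labels in this sketch are purely illustrative; the actual key names are defined by b2skim-stats-print:

```python
import json

# Illustrative structure only: a file indexed by skim name, then
# statistic, then sample label (key names here are made up).
stats_json = json.loads("""
{
  "LeptonicUntagged": {
    "RetentionRate": {
      "MC13_mixedBGx1": 0.05
    }
  }
}
""")

print(stats_json["LeptonicUntagged"]["RetentionRate"]["MC13_mixedBGx1"])
```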

Tip

To test your own newly-developed skim, make sure you have followed all the instructions in Writing a skim, particularly the instructions regarding the skim registry.

b2skim-stats-submit: Run skim scripts on test samples

Note

Please run these skim tests on KEKCC, so that the estimates for CPU time are directly comparable to one another.

Submits test jobs for a given set of skims, and saves the output in a format to be read by b2skim-stats-print. One or more standalone or combined skim names must be provided.

usage: b2skim-stats-submit [-h]
                           (-s skim [skim ...] | -c YAMLFile [CombinedSkim ...])
                           [-n nEventsPerSample] [--mccampaign {MC12,MC13}]
                           [--dry-run] [--mconly | --dataonly]

Optional Arguments

-s, --single

Possible choices: all, Random, SystematicsTracking, Resonance, SystematicsRadMuMu, SystematicsEELL, SystematicsRadEE, SystematicsLambda, SystematicsPhiGamma, SystematicsFourLeptonFromHLTFlag, SystematicsRadMuMuFromHLTFlag, SystematicsJpsi, SystematicsKshort, SystematicsBhabha, SystematicsCombinedHadronic, SystematicsCombinedLowMulti, SystematicsDstar, PRsemileptonicUntagged, LeptonicUntagged, dilepton, SLUntagged, B0toDstarl_Kpi_Kpipi0_Kpipipi, feiHadronicB0, feiHadronicBplus, feiSLB0, feiSLBplus, feiHadronic, feiSL, BtoXgamma, BtoXll, BtoXll_LFV, inclusiveBplusToKplusNuNu, TDCPV_ccs, TDCPV_qqs, BtoD0h_Kspi0, BtoD0h_Kspipipi0, B0toDpi_Kpipi, B0toDpi_Kspi, B0toDstarPi_D0pi_Kpi, B0toDstarPi_D0pi_Kpipipi_Kpipi0, B0toDrho_Kpipi, B0toDrho_Kspi, B0toDstarRho_D0pi_Kpi, B0toDstarRho_D0pi_Kpipipi_Kpipi0, BtoD0h_hh, BtoD0h_Kpi, BtoD0h_Kpipipi_Kpipi0, BtoD0h_Kshh, BtoD0rho_Kpi, BtoD0rho_Kpipipi_Kpipi0, B0toDD_Kpipi_Kspi, B0toDstarD, B0toD0Kpipi0_pi0, InclusiveLambda, BottomoniumEtabExclusive, BottomoniumUpsilon, CharmoniumPsi, XToD0_D0ToHpJm, XToD0_D0ToNeutrals, DstToD0Pi_D0ToRare, XToDp_DpToKsHp, XToDp_DpToHpHmJp, LambdacTopHpJm, DstToD0Pi_D0ToHpJm, DstToD0Pi_D0ToHpJmPi0, DstToD0Pi_D0ToHpHmPi0, DstToD0Pi_D0ToKsOmega, DstToD0Pi_D0ToHpJmEta, DstToD0Pi_D0ToNeutrals, DstToD0Pi_D0ToHpJmKs, EarlyData_DstToD0Pi_D0ToHpJmPi0, EarlyData_DstToD0Pi_D0ToHpHmPi0, DstToDpPi0_DpToHpPi0, DstToD0Pi_D0ToHpHmHpJm, SinglePhotonDark, GammaGammaControlKLMDark, ALP3Gamma, EGammaControlDark, InelasticDarkMatter, RadBhabhaV0Control, TauLFV, DimuonPlusMissingEnergy, ElectronMuonPlusMissingEnergy, DielectronPlusMissingEnergy, LFVZpVisible, TauGeneric, TauThrust, TwoTrackLeptonsForLuminosity, LowMassTwoTrack, SingleTagPseudoScalar, BtoPi0Pi0, BtoHadTracks, BtoHad1Pi0, BtoHad3Tracks1Pi0, BtoRhopRhom

List of individual skims to run.

Default: []

-c, --combined

List of combined skims to run. This flag expects as its first argument the path to a YAML file defining the combined skims. All remaining arguments are the combined skims to test. The YAML file is simply a mapping of combined skim names to the individual skims comprising them. For example, feiSL: [feiSLB0, feiSLBplus].

-n

Number of events to run per sample. This input can be any positive number, but the actual number of events run is limited by the size of the test files (~200,000 for MC files and ~20,000 for data files).

Default: 10000

--mccampaign

Possible choices: MC12, MC13

The MC campaign to test on.

Default: “MC13”

--dry-run, --dry

Print the submission commands, but don’t run them.

--mconly

Test on only MC samples.

--dataonly

Test on only data samples.

b2skim-stats-print: Print tables of performance statistics

Note

This tool uses the third-party package tabulate, which can be installed via pip.

This will be included in a future version of the externals.

Reads the output files of test skim jobs from b2skim-stats-submit and prints tables of performance statistics. One or more single or combined skim names must be provided.

usage: b2skim-stats-print [-h]
                          (-s skim [skim ...] | -c CombinedSkim [CombinedSkim ...])
                          [-C [OutputFilename] | -M [OutputFilename] | -J
                          [OutputFilename] | --average-over {skims,samples}
                          [{skims,samples} ...]] [--mccampaign {MC12,MC13}]
                          [--mconly | --dataonly] [-v]

Optional Arguments

-s, --single

Possible choices: all, Random, SystematicsTracking, Resonance, SystematicsRadMuMu, SystematicsEELL, SystematicsRadEE, SystematicsLambda, SystematicsPhiGamma, SystematicsFourLeptonFromHLTFlag, SystematicsRadMuMuFromHLTFlag, SystematicsJpsi, SystematicsKshort, SystematicsBhabha, SystematicsCombinedHadronic, SystematicsCombinedLowMulti, SystematicsDstar, PRsemileptonicUntagged, LeptonicUntagged, dilepton, SLUntagged, B0toDstarl_Kpi_Kpipi0_Kpipipi, feiHadronicB0, feiHadronicBplus, feiSLB0, feiSLBplus, feiHadronic, feiSL, BtoXgamma, BtoXll, BtoXll_LFV, inclusiveBplusToKplusNuNu, TDCPV_ccs, TDCPV_qqs, BtoD0h_Kspi0, BtoD0h_Kspipipi0, B0toDpi_Kpipi, B0toDpi_Kspi, B0toDstarPi_D0pi_Kpi, B0toDstarPi_D0pi_Kpipipi_Kpipi0, B0toDrho_Kpipi, B0toDrho_Kspi, B0toDstarRho_D0pi_Kpi, B0toDstarRho_D0pi_Kpipipi_Kpipi0, BtoD0h_hh, BtoD0h_Kpi, BtoD0h_Kpipipi_Kpipi0, BtoD0h_Kshh, BtoD0rho_Kpi, BtoD0rho_Kpipipi_Kpipi0, B0toDD_Kpipi_Kspi, B0toDstarD, B0toD0Kpipi0_pi0, InclusiveLambda, BottomoniumEtabExclusive, BottomoniumUpsilon, CharmoniumPsi, XToD0_D0ToHpJm, XToD0_D0ToNeutrals, DstToD0Pi_D0ToRare, XToDp_DpToKsHp, XToDp_DpToHpHmJp, LambdacTopHpJm, DstToD0Pi_D0ToHpJm, DstToD0Pi_D0ToHpJmPi0, DstToD0Pi_D0ToHpHmPi0, DstToD0Pi_D0ToKsOmega, DstToD0Pi_D0ToHpJmEta, DstToD0Pi_D0ToNeutrals, DstToD0Pi_D0ToHpJmKs, EarlyData_DstToD0Pi_D0ToHpJmPi0, EarlyData_DstToD0Pi_D0ToHpHmPi0, DstToDpPi0_DpToHpPi0, DstToD0Pi_D0ToHpHmHpJm, SinglePhotonDark, GammaGammaControlKLMDark, ALP3Gamma, EGammaControlDark, InelasticDarkMatter, RadBhabhaV0Control, TauLFV, DimuonPlusMissingEnergy, ElectronMuonPlusMissingEnergy, DielectronPlusMissingEnergy, LFVZpVisible, TauGeneric, TauThrust, TwoTrackLeptonsForLuminosity, LowMassTwoTrack, SingleTagPseudoScalar, BtoPi0Pi0, BtoHadTracks, BtoHad1Pi0, BtoHad3Tracks1Pi0, BtoRhopRhom

List of individual skims to run.

Default: []

-c, --combined

List of combined skims to run.

Default: []

-C, --confluence

Save a wiki markup table to be copied to Confluence.

-M, --markdown

Save a markdown table in a format that can be copied into pull request comments.

-J, --json

Save the tables of statistics to a JSON file.

--average-over

Possible choices: skims, samples

If this argument is given, the tool produces statistics averaged over all given skims or all given samples. These averages are printed in JSON format.

Default: []

--mccampaign

Possible choices: MC12, MC13

The MC campaign to test on.

Default: “MC13”

--mconly

Test on only MC samples.

--dataonly

Test on only data samples.

-v, --verbose

Print extra warning messages when the script cannot calculate a value but moves on anyway.

17.4.7. Utility functions for skim experts

The module skimExpertFunctions contains helper functions to perform common tasks relating to skims. Importantly, it contains the skimExpertFunctions.BaseSkim class, which is how skims are defined.

class skimExpertFunctions.BaseSkim(*, OutputFileName=None, additionalDataDescription=None, udstOutput=True, validation=False)[source]

Base class for skims. Initialises a skim name, and creates template functions required for each skim.

See Writing a skim for information on how to use this to define a new skim.

ApplyHLTHadronCut = False

If this property is set to True, then the HLT selection for hlt_hadron will be applied to the skim lists when the skim is added to the path.

MergeDataStructures = {}

Dict of str -> function pairs to determine if any special data structures should be merged when combining skims. Currently, this is only used to merge FEI config parameters when running multiple FEI skims at once, so that it can be run just once with all the necessary arguments.

NoisyModules = None

List of module types to be silenced. This may be necessary in certain skims in order to keep log file sizes small.

Tip

The elements of this list should be the module type, which is not necessarily the same as the module name. The module type can be inspected in Python via module.type().

See also

This attribute is used by BaseSkim.set_skim_logging.

TestFiles = ['/group/belle2/dataprod/MC/SkimTraining/mixed_BGx1.mdst_000001_prod00009434_task10020000001.root']

Location of an MDST file to test the skim on. Defaults to an MC13 mixed BGx1 sample. If you want to use a different test file for your skim, set it using get_test_file.

additional_setup(path)[source]

Perform any setup steps necessary before running the skim.

Warning

Standard particle lists should not be loaded in here. This should be done by overriding the method BaseSkim.load_standard_lists. This is crucial for avoiding loading lists twice when combining skims for production.

Parameters

path (basf2.Path) – Skim path to be processed.

apply_hlt_hadron_cut_if_required(path)[source]

Apply the hlt_hadron selection if the property ApplyHLTHadronCut is True.

Parameters

path (basf2.Path) – Skim path to be processed.

abstract build_lists(path)[source]

Create the skim lists to be saved in the output uDST. This function is where the main skim cuts should be applied. At the end of this method, the attribute SkimLists must be set so it can be used by output_udst.

Parameters

path (basf2.Path) – Skim path to be processed.
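As a structural sketch (plain Python, no basf2; the class and list name below are purely illustrative), an override of build_lists must end by setting the SkimLists attribute:

```python
# Structural sketch only (not real basf2 code): build_lists applies the main
# skim cuts and must finish by setting SkimLists, so that output_udst knows
# which particle lists to write to the output uDST.
class ExampleSkim:
    def build_lists(self, path):
        # ... reconstruct decays and apply the main skim cuts on `path` ...
        self.SkimLists = ["B0:ExampleSkim"]  # illustrative list name

skim = ExampleSkim()
skim.build_lists(path=None)  # stand-in path for illustration
```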

property code

Eight-digit code assigned to this skim in the registry.

property flag

Event-level variable indicating whether an event passes the skim or not. To use the skim flag without writing uDST output, use the argument udstOutput=False when instantiating the skim class.

get_skim_list_names()[source]

Get the list of skim particle list names, without creating the particle lists on the current path.

initialise_skim_flag(path)[source]

Add the module skimExpertFunctions.InitialiseSkimFlag to the path, which initialises the flag for this skim to zero.

load_standard_lists(path)[source]

Load any standard lists. This code will be run before any BaseSkim.additional_setup and BaseSkim.build_lists.

Note

This is separated into its own function so that when skims are combined, any standard lists used by two skims can be loaded just once.

Parameters

path (basf2.Path) – Skim path to be processed.

output_udst(path)[source]

Write the skim particle lists to an output uDST and print a summary of the skim list statistics.

Parameters

path (basf2.Path) – Skim path to be processed.

property postskim_path

Return the skim path.

  • If BaseSkim.skim_event_cuts has been run, then the skim lists will only be created on a conditional path, so subsequent modules should be added to the conditional path.

  • If BaseSkim.skim_event_cuts has not been run, then the main analysis path is returned.

produce_on_tau_samples = True

If this property is set to False, then b2skim-prod will not produce data production requests for this skim on taupair MC samples. This decision may be made for one of two reasons:

  • The retention rate of the skim on taupair samples is basically zero, so there is no point producing the skim for these samples.

  • The retention rate of the skim on taupair samples is too high (>20%), so the production system may struggle to handle the jobs.

set_skim_logging()[source]

Turns the log level to ERROR for selected modules to decrease the total size of the skim log files. Additional modules can be silenced by setting the attribute NoisyModules for an individual skim.

Warning

This method works by inspecting the modules added to the path, and setting the log level to ERROR. This method should be called after all skim-related modules are added to the path.
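The mechanism can be sketched in plain Python (the dict-based module representation and the module type names here are illustrative, not the real basf2 API):

```python
# Illustrative sketch: walk the modules already added to the path and raise
# the log level of any whose *type* (not name) matches the noisy-module list.
def silence_noisy_modules(modules, noisy_types):
    silenced = []
    for module in modules:
        if module["type"] in noisy_types:
            module["log_level"] = "ERROR"
            silenced.append(module["name"])
    return silenced

path_modules = [
    {"name": "loader0", "type": "ParticleLoader", "log_level": "INFO"},
    {"name": "fitter0", "type": "TreeFitter", "log_level": "INFO"},
]
silence_noisy_modules(path_modules, noisy_types={"ParticleLoader"})
```

This also illustrates why the method must run after all skim modules are added: modules appended later would never be inspected.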

skim_event_cuts(cut, *, path)[source]

Apply event-level cuts in a skim-safe way.

Parameters
  • cut (str) – Event-level cut to be applied.

  • path (basf2.Path) – Skim path to be processed.

Returns

Path on which the rest of this skim should be processed. On this path, only events which passed the event-level cut will be processed further.

Return type

ConditionalPath (basf2.Path)

Tip

If running this function in BaseSkim.additional_setup or BaseSkim.build_lists, redefine the path to the path returned by BaseSkim.skim_event_cuts, e.g.

def build_lists(self, path):
    path = self.skim_event_cuts("nTracks>4", path=path)
    # rest of skim list building...

Note

The motivation for using this function over applyEventCuts is that applyEventCuts completely removes events from processing. If we combine multiple skims in a single steering file (which is done in production), and the first has a set of event-level cuts, then all the remaining skims will never even see those events.

Internally, this function creates a new path, which is only processed for events passing the event-level cut. To avoid issues around particles not being available on the main path (leading to noisy error logs), we need to add the rest of the skim to this path. So this new path is assigned to the attribute BaseSkim._ConditionalPath, and BaseSkim.__call__ will run all remaining methods on this path.

update_skim_flag(path)[source]

Add the module skimExpertFunctions.UpdateSkimFlag to the path, which updates the flag for this skim to one if any of the skim lists contain at least one candidate.

Warning

If a conditional path has been created before this, then this function must run on the conditional path, since the skim lists are not guaranteed to exist for all events on the main path.

validation_histograms(path)[source]

Create validation histograms for the skim.

Parameters

path (basf2.Path) – Skim path to be processed.

class skimExpertFunctions.CombinedSkim(*skims, NoisyModules=None, additionalDataDescription=None, udstOutput=None, mdstOutput=False, mdst_kwargs=None, CombinedSkimName='CombinedSkim', OutputFileName=None)[source]

Class for creating combined skims which can be run using similar-looking methods to BaseSkim objects.

A steering file which combines skims can be as simple as the following:

import basf2 as b2
import modularAnalysis as ma
from skimExpertFunctions import CombinedSkim
from skim.foo import OneSkim, TwoSkim, RedSkim, BlueSkim

path = b2.Path()
ma.inputMdstList("default", [], path=path)
skims = CombinedSkim(OneSkim(), TwoSkim(), RedSkim(), BlueSkim())
skims(path)  # load standard lists, create skim lists, and save to uDST
b2.process(path)

When skims are combined using this class, the BaseSkim.NoisyModules lists of the individual skims are merged, and all of the listed modules are silenced.

The heavy-lifting functions additional_setup, build_lists and output_udst are modified to loop over the corresponding functions of each individual skim. The load_standard_lists method is also modified to load all required lists, without accidentally loading a list twice.

Calling an instance of the CombinedSkim class will load all the required particle lists, then run all the setup steps, then the list building functions, and then all the output steps.
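The call order can be sketched schematically (the plumbing below is illustrative; the real class also deduplicates standard-list loading and handles conditional paths):

```python
# Schematic of the CombinedSkim call order: every skim's standard lists are
# loaded first, then setup runs for every skim, then list building, then
# output, looping over the individual skims at each stage.
def run_combined(skims, path):
    for skim in skims:
        skim.load_standard_lists(path)
    for skim in skims:
        skim.additional_setup(path)
    for skim in skims:
        skim.build_lists(path)
    for skim in skims:
        skim.output_udst(path)

class RecordingSkim:
    """Stand-in skim that records the order in which its methods run."""
    def __init__(self, name, log):
        self.name, self.log = name, log
    def load_standard_lists(self, path):
        self.log.append((self.name, "lists"))
    def additional_setup(self, path):
        self.log.append((self.name, "setup"))
    def build_lists(self, path):
        self.log.append((self.name, "build"))
    def output_udst(self, path):
        self.log.append((self.name, "output"))

calls = []
run_combined([RecordingSkim("A", calls), RecordingSkim("B", calls)], path=None)
```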

additional_setup(path)[source]

Run the BaseSkim.additional_setup function of each skim.

Parameters

path (basf2.Path) – Skim path to be processed.

apply_hlt_hadron_cut_if_required(path)[source]

Run the BaseSkim.apply_hlt_hadron_cut_if_required function for each skim.

Parameters

path (basf2.Path) – Skim path to be processed.

build_lists(path)[source]

Run the BaseSkim.build_lists function of each skim.

Parameters

path (basf2.Path) – Skim path to be processed.

property flag

Event-level variable indicating whether an event passes the combined skim or not.

property flags

List of flags for each skim in combined skim.

initialise_skim_flag(path)[source]

Add the module skimExpertFunctions.InitialiseSkimFlag to the path, to initialise flags for each skim.

load_standard_lists(path)[source]

Add all required standard list loading to the path.

Note

To avoid loading standard lists twice, this function creates dummy paths that are passed through load_standard_lists for each skim. These dummy paths are then inspected, and a list of unique module-parameter combinations is added to the main skim path.

Parameters

path (basf2.Path) – Skim path to be processed.
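The deduplication step can be sketched as follows (the tuple-based path representation here is illustrative, not the real basf2 one):

```python
# Illustrative sketch: collect (module, parameters) combinations from the
# per-skim dummy paths, keeping only the first occurrence of each, so that
# no standard list loader is added to the main path twice.
def unique_list_loaders(dummy_paths):
    seen = set()
    unique = []
    for dummy_path in dummy_paths:
        for module, params in dummy_path:
            key = (module, tuple(sorted(params.items())))
            if key not in seen:
                seen.add(key)
                unique.append((module, params))
    return unique

# Two skims requesting overlapping standard lists (names are illustrative):
skim1 = [("ParticleLoader", {"decayString": "pi+:all"})]
skim2 = [("ParticleLoader", {"decayString": "pi+:all"}),
         ("ParticleLoader", {"decayString": "gamma:all"})]
loaders = unique_list_loaders([skim1, skim2])
```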

merge_data_structures()[source]

Read the values of BaseSkim.MergeDataStructures and merge data structures accordingly.

For example, if MergeDataStructures has the value {"FEIChannelArgs": _merge_boolean_dicts.__func__}, then _merge_boolean_dicts is run on all input skims with the attribute FEIChannelArgs, and the value of FEIChannelArgs for that skim is set to the result.

In the FEI skims, this is used to merge configs which are passed to a cached function, thus allowing us to apply the FEI once with all the required particles available.
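A boolean-dict merge of the kind described can be sketched as follows (the actual _merge_boolean_dicts implementation may differ in details):

```python
# Illustrative sketch: merge dicts of str -> bool by OR-ing values, so that
# a channel enabled in any input skim stays enabled in the merged config.
def merge_boolean_dicts(*dicts):
    merged = {}
    for d in dicts:
        for key, value in d.items():
            merged[key] = merged.get(key, False) or value
    return merged

merged = merge_boolean_dicts({"hadronic": True, "semileptonic": False},
                             {"semileptonic": True})
```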

output_mdst_if_any_flag_passes(*, path, **kwargs)[source]

Add MDST output to the path if the event passes any of the skim flags. EventExtraInfo is included in the MDST output so that the flags are available in the output.

The CombinedSkimName parameter in the CombinedSkim initialisation is used for the output filename if filename is not included in kwargs.

Parameters
  • path (basf2.Path) – Skim path to be processed.

  • kwargs – Keyword arguments passed to the MDST output function (e.g. filename).

output_udst(path)[source]

Run the BaseSkim.output_udst function of each skim.

Parameters

path (basf2.Path) – Skim path to be processed.

property produce_on_tau_samples

Corresponding value of this attribute for each individual skim.

Raises

RuntimeError – Raised if the individual skims in combined skim contain a mix of True and False for this property.
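The consistency check described above can be sketched as:

```python
# Illustrative sketch: all individual skims must agree on
# produce_on_tau_samples, otherwise the combined value is ill-defined.
def combined_produce_on_tau(skims):
    values = {skim.produce_on_tau_samples for skim in skims}
    if len(values) > 1:
        raise RuntimeError(
            "Skims in combined skim have inconsistent produce_on_tau_samples")
    return values.pop()

class FakeSkim:
    """Stand-in skim carrying only the attribute under test."""
    def __init__(self, produce):
        self.produce_on_tau_samples = produce

assert combined_produce_on_tau([FakeSkim(True), FakeSkim(True)]) is True
```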

set_skim_logging()[source]

Run BaseSkim.set_skim_logging for each skim.

update_skim_flag(path)[source]

Add the module skimExpertFunctions.UpdateSkimFlag to the conditional path of each skim.

class skimExpertFunctions.InitialiseSkimFlag(*skims)[source]

[Module for skim expert usage] Create the EventExtraInfo DataStore object, and set all required flag variables to zero.

Note

Add this module to the path before adding any skims, so that the skim flags are defined in the datastore for all events.

event()[source]

Initialise flags to zero.

initialize()[source]

Register EventExtraInfo in datastore if it has not been registered already.

class skimExpertFunctions.UpdateSkimFlag(skim)[source]

[Module for skim expert usage] Update the skim flag to be 1 if there is at least one candidate in any of the skim lists.

Note

Add this module to the post-skim path of each skim in the combined skim, as the skim lists are only guaranteed to exist on the conditional path (if a conditional path was used).

event()[source]

Check if at least one skim list is non-empty; if so, update the skim flag to 1.

initialize()[source]

Check EventExtraInfo has been registered previously. This registration should be done by InitialiseSkimFlag.

skimExpertFunctions.add_skim(label, lists, path)[source]

Create a uDST skim for the given lists, saving into $label.udst.root. Particles not necessary for the given particle lists are not saved.

Parameters
  • label (str) – the registered skim name

  • lists (list(str)) – the list of ParticleList names that have been created by a skim list builder function

  • path (basf2.Path) – modules are added to this path

skimExpertFunctions.fancy_skim_header(SkimClass)[source]

Decorator to generate a fancy header for the skim documentation and prepend it to the docstring. Add this just above the definition of a skim.

Also ensures the documentation of the template functions like BaseSkim.build_lists is not repeated in every skim documentation.

@fancy_skim_header
class MySkimName(BaseSkim):
    # docstring here describing your skim, and explaining cuts.

skimExpertFunctions.get_eventN(fileName)[source]

Returns the number of events in the given file.

Parameters

fileName – Name of the file.

skimExpertFunctions.get_events_per_file(sampleName)[source]

Returns an estimate for the average number of events in an input MDST file of the given sample type.

Parameters

sampleName (str) – Name of the sample. MC samples are named e.g. “MC12_chargedBGx1”, “MC9_ccbarBGx0”

Returns

The average number of events in file of the given sample type.

Return type

nEventsPerFile (int)

skimExpertFunctions.get_test_file(sampleName)[source]

Returns the KEKCC location of files used specifically for skim testing.

Parameters

sampleName (str) – Name of the sample. MC samples are named e.g. “MC12_chargedBGx1”, “MC9_ccbarBGx0”

Returns

The path to the test file on KEKCC.

Return type

sampleFileName (str)

skimExpertFunctions.get_total_infiles(sampleName)[source]

Returns the total number of input MDST files for a given sample. This is useful for resource estimates.

Parameters

sampleName (str) – Name of the sample. MC samples are named e.g. “MC12_chargedBGx1”, “MC9_ccbarBGx0”

Returns

Total number of input files for sample.

Return type

nInFiles (int)

skimExpertFunctions.ifEventPasses(cut, conditional_path, path)[source]

If the event passes the given cut, process everything in conditional_path, then return and continue processing with the next module on path.

Parameters
  • cut (str) – selection criteria which needs to be fulfilled in order to continue with conditional_path

  • conditional_path (basf2.Path) – path to execute if the event fulfills the criteria cut

  • path (basf2.Path) – modules are added to this path

skimExpertFunctions.resolve_skim_modules(SkimsOrModules, *, LocalModule=None)[source]

Produce an ordered list of skims, by expanding any Python skim module names into a list of skims in that module. Also produce a dict of skims grouped by Python module.

Raises
  • RuntimeError – Raised if a skim is listed twice.

  • ValueError – Raised if LocalModule is passed and skims are normally expected from more than one module.
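Its behaviour can be sketched as follows (the registry contents below are hypothetical; only the skim and module names are taken from this documentation):

```python
# Illustrative sketch: expand module names into the skims they define,
# preserve order, and raise if a skim ends up listed twice.
def resolve(names, modules):
    """`modules` maps a Python skim module name to its list of skims."""
    skims = []
    for name in names:
        for skim in modules.get(name, [name]):
            if skim in skims:
                raise RuntimeError(f"Skim {skim} listed twice.")
            skims.append(skim)
    return skims

# Hypothetical registry: the `dark` module contains two skims.
modules = {"dark": ["SinglePhotonDark", "ALP3Gamma"]}
ordered = resolve(["dark", "TauLFV"], modules)
```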

skimExpertFunctions.setSkimLogging(path, additional_modules=[])[source]

Turns the log level to ERROR for several modules to decrease the total size of the skim log files.

Parameters
  • path (basf2.Path) – modules are added to this path

  • additional_modules (list(str)) – an optional list of extra noisy module names that should be silenced

skimExpertFunctions.skimOutputMdst(skimDecayMode, path=None, skimParticleLists=[], outputParticleLists=[], includeArrays=[], *, outputFile=None, dataDescription=None)[source]

Create a new path for events that contain a non-empty particle list specified via skimParticleLists. Write the accepted events to an mdst file, saving only particles from skimParticleLists and outputParticleLists. Additional StoreArrays and Relations to be stored can be specified via the includeArrays list argument.

Parameters
  • skimDecayMode (str) – Name of the skim. If no outputFile is given this is also the name of the output filename. This name will be added to the FileMetaData as an extra data description “skimDecayMode”

  • skimParticleLists (list(str)) – Names of the particle lists to skim for. An event will be accepted if at least one of the particle lists is not empty

  • outputParticleLists (list(str)) – Names of the particle lists to store in the output in addition to the ones in skimParticleLists

  • includeArrays (list(str)) – datastore arrays/objects to write to the output file in addition to mdst and particle information

  • path (basf2.Path) – Path to add the skim output to. Defaults to the default analysis path

  • outputFile (str) – Name of the output file if different from the skim name

  • dataDescription (dict) – Additional data descriptions to add to the output file. For example {“mcEventType”:”mixed”}

17.4.8. b2skim-prod: Produce grid production requests

Note

This tool is intended for use by skim production managers, not by skim liaisons.

b2skim-prod is a tool for producing grid production requests in the format required by the production system, and also generating combined steering files.

YAML files are used by this tool to define the LPNs of datasets. Below are examples of valid YAML entries for data and MC. The tool lpns2yaml.py is provided to create these YAML files from a list of LPNs.

## Example of a YAML file for data:
proc9_exp3r1:
    sampleLabel: proc9_exp3  # This label must match a skim sample in TestFiles.yaml
    LPNPrefix: /belle/Data
    inputReleaseNumber: release-03-02-02
    prodNumber: prod00008530
    inputDBGlobalTag: DB00000654
    procNumber: proc9
    experimentNumber: e0003
    beamEnergy: 4S
    inputDataLevel: mdst
    runNumbers:
        - r02724
        - r02801
        - r02802

proc9_exp3r2:
    sampleLabel: proc9_exp3
    LPNPrefix: /belle/Data
    inputReleaseNumber: release-03-02-02
    # prodNumber, inputDBGlobalTag, experimentNumber, and runNumbers can be integers
    prodNumber: 8530
    inputDBGlobalTag: 654
    procNumber: proc9
    experimentNumber: 3
    beamEnergy: 4S
    inputDataLevel: mdst
    runNumbers:
        - 3237
        - 3238
        - 3239


## Example of a YAML file for MC:
MC12b_mixed:
    sampleLabel: MC12_mixedBGx1
    LPNPrefix: /belle/MC
    inputReleaseNumber: release-03-01-00
    inputDBGlobalTag: DB00000547
    mcCampaign: MC12b
    prodNumber: prod00007392
    experimentNumber: s00/e1003
    beamEnergy: 4S
    mcType: mixed
    mcBackground: BGx1
    inputDataLevel: mdst
    runNumber: r00000

MC12b_charged:
    sampleLabel: MC12_chargedBGx1
    LPNPrefix: /belle/MC
    inputReleaseNumber: release-03-01-00
    # inputDBGlobalTag, prodNumber, and runNumber can be integers
    inputDBGlobalTag: 547
    mcCampaign: MC12b
    prodNumber:
        - 7799  # prodNumber can be a list
        - 7802
    experimentNumber: s00/e1003
    beamEnergy: 4S
    mcType: charged
    mcBackground: BGx1
    inputDataLevel: mdst
    runNumber: 0

To produce JSON files for a list of combined skims, pass this tool the YAML file and the names of the skims. The combined skims must be defined in skim.registry.combined_skims. The other required arguments are the skim campaign, the release to be used, and the base directory of the repository in which to output the JSON files.

This tool is designed to work with the SkimStats.json output of b2skim-stats-print (see Testing skim performance). The YAML files can be used to specify which sample statistics are to be used for each dataset, with the keyword sampleLabel—this must match one of the sample labels used by the skim statistics tools. SkimStats.json must be present in the current directory when this tool is run.

Production requests cannot be produced without resource usage estimates, so the pipeline for producing a production request and combined steering file is as follows:

  1. Put together a YAML file defining which single skims comprise each combined skim. For example,

    # contents of CombinedSkims.yaml
    EWP:
      - BtoXll
      - BtoXll_LFV
      - BtoXgamma
    Tau:
      - TauLFV
      - TauGeneric
      - TauThrust
    
  2. Pass this combined skim definition to b2skim-stats-submit, and produce the JSON output of b2skim-stats-print:

    $ b2skim-stats-submit -c CombinedSkims.yaml EWP Tau
    # wait for the LSF jobs to complete...
    $ b2skim-stats-print -c EWP Tau -J

  3. The output SkimStats.json can then be used to produce production JSON files for the EWP and Tau combined skims, and to construct steering files for the specified combined skims.

usage: b2skim-prod [-h] -s CombinedSkim [CombinedSkim ...] -o
                   OUTPUT_BASE_DIRECTORY -c CAMPAIGN -r RELEASE
                   [-l LOCAL_SKIM_SCRIPT] [-N LPNS_PER_JSON]
                   [-b STARTINGBATCHNUMBER] (--mc | --data)
                   sampleRegistryYaml SkimStatsJson

Required Arguments

sampleRegistryYaml

YAML file defining the samples to produce JSON files for.

SkimStatsJson

The JSON output file of b2skim-stats-print.

Optional Arguments

-s, --skims

List of skims to produce request files for. Only accepts combined skims listed in skim.registry.combined_skims.

-o, --output-base-directory

Base directory for output. This should be the base directory of the B2P/MC or B2P/data repo.

-c, --campaign

Name of the campaign, e.g. SKIMDATAx1.

-r, --release

The basf2 release to be used, e.g. release-04-00-03.

-l, --local-skim-script

File name of the local skim script to use, if any, e.g. ewp_local.py. It should not include any path before the file name.

-N, --lpns-per-json

Restrict number of LPNs in each JSON file to given number.

-b, --starting-batch-number

Starting number to count from for batch label appended to prod names.

Default: 1

--mc, --MC

Produce JSON files for MC.

--data, --Data

Produce JSON files for data.

Example usage

  • Produce requests for EWP and feiSLCombined skims on proc9:

    $ b2skim-prod Registry_proc9.yaml SkimStats.json -s EWP feiSLCombined --data -c SKIMDATAx1 -r release-04-01-01 -o B2P/data/
    
  • Produce requests for EWP on MC13, with one LPN per JSON file:

    $ b2skim-prod Registry_MC13.yaml SkimStats.json -N 1 -s EWP --data -c SKIMDATAx1 -r release-04-01-01 -o B2P/data/
    
  • Produce requests for EWP on MC13, using local skim module script:

    $ b2skim-prod Registry_MC13.yaml SkimStats.json -N 1 -l ewp_local.py -s EWP --data -c SKIMDATAx1 -r release-04-01-01 -o B2P/data/
    

17.4.9. lpns2yaml.py: Convert lists of LPNs to format expected by b2skim-prod

lpns2yaml.py is a tool for converting a list of LPNs into the YAML format expected by b2skim-prod. The expected input is a text file of LPNs, such as can be downloaded from the dataset searcher.

The test sample labels (under the key sampleLabel) are automatically generated, so please check that they all correspond to a label in skim/scripts/TestFiles.yaml after running the script.

usage: lpns2yaml.py [-h] [-o output_filename] (--data | --mc)
                    [--bg {BGx0,BGx1}]
                    input_lpn_list_file

Required Arguments

input_lpn_list_file

Input file containing list of LPNs (such as that from the dataset searcher).

Optional Arguments

-o

Output YAML file name. If none given, prints output to screen.

--data

Flag to indicate the LPNs are for data.

--mc

Flag to indicate the LPNs are for MC.

--bg

Possible choices: BGx0, BGx1

Beam background level of MC samples. Only required for MC.

Example usage

  • Convert list of BGx1 MC LPNs into YAML format and print to screen:

    $ lpns2yaml.py my_MC_LPNs.txt --mc --bg BGx1
    
  • Convert list of data LPNs into YAML format and save to file:

    $ lpns2yaml.py my_data_LPNs.txt --data -o my_data_LPNs.yaml