3.4.2. First steering file
In this hands-on tutorial you’ll be writing your first steering file. Our
ultimate goal is to reconstruct B0 → J/ψ(→ e+e−) K_S0(→ π+π−).
Let’s get started: The very first step is always to set up the necessary environment.
Task
Set up the basf2 environment using the currently recommended software version.
Now let’s get started with your steering file!
Task
Open an empty file with an editor of your choice. Add three lines that do the following:
1. Import the basf2 Python library (it might be convenient to set an abbreviation, e.g. b2).
2. Create a basf2.Path (call it main).
3. Process the path with basf2.process.
Save the file as myanalysis.py.
Running a steering file is as easy as calling basf2 myanalysis.py on the command line.
Exercise
Run the short script that you just created. Don’t worry: even if you’ve done everything correctly, you should see some error messages. Read them carefully!
Loading input data
Of course, no events could be processed so far because no data has been loaded
yet. Let’s do it. As already described in the previous lesson, almost all
convenience functions that are needed can be found in modularAnalysis.
It is recommended to use inputMdstList or inputMdst. If you look at the
source code, you’ll notice that inputMdst actually calls the more general
inputMdstList.
Exercise
How many arguments are required for the inputMdstList function?
Which value has to be set for the environment type?
In a later lesson you’ll learn how and where to find input files for your
analysis. For the purpose of this tutorial, we have prepared some local input
files in the ${BELLE2_EXAMPLES_DATA_DIR}/starterkit/2021 directory on KEKCC, NAF, and
other servers. The files’ names start with the decfile number 1111540100.
Exercise
Check out the location of the files mentioned above. Which two settings of MC are provided?
A helpful function to get common data files from the examples directory is
basf2.find_file.
Task
Extend your steering file by loading the data of one of the local input files. It makes sense to run the steering file again.
If there is a syntax error in your script or you forgot to include a necessary argument, there will be an error message that should help you to debug and figure out what needs to be fixed.
If the script is fine, only three lines with info messages should be printed to the output and you should see a quickly finishing progress bar.
In the solution to the last task, we have added empty lines and some comments, and used shortcuts for the imports. This gives the script a better structure and helps you and others to understand what’s going on in the steering file. In the very first line, we have also added a shebang to specify that the steering file should be executed with a Python interpreter.
So far, the input file has been completely hard-coded. But as we’ve seen before, the file names only differ by the final suffix. We can gain a little more flexibility by providing this integer as a command-line argument. Then we can select a different input file when running the steering file, without having to change anything in the script itself.
Task
Adjust your steering file so that the file to be processed is selected via an integer given as a command-line argument.
Tip
Make sure that from now on you always supply a number when you run your
steering file, e.g. basf2 myanalysis.py 1.
Otherwise you will get an exception like this:
Traceback (most recent call last):
File "myanalysis.py", line 3, in <module>
filenumber = sys.argv[1]
IndexError: list index out of range
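To get a readable message instead of the bare IndexError, the steering file can check sys.argv before indexing it. The plain-Python sketch below shows one way to do this; get_file_number is a hypothetical helper name, not part of basf2, and it only relies on the standard library.

```python
import sys


def get_file_number(argv):
    """Return the file number passed as the first command-line argument.

    Hypothetical helper: exits with a short usage message instead of the
    bare IndexError (missing argument) or ValueError (not an integer).
    """
    if len(argv) < 2:
        sys.exit(f"Usage: basf2 {argv[0]} <file number>")
    try:
        return int(argv[1])
    except ValueError:
        sys.exit(f"Expected an integer file number, got {argv[1]!r}")
```

In the steering file you would then write filenumber = get_file_number(sys.argv) and build the input file name from that integer.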
Filling particle lists
The mdst data objects (Tracks, ECLClusters, KLMClusters, V0s) of the input file
have to be transferred into Particle data objects. This is done via the
ParticleLoader module and its wrapper (convenience) function fillParticleList.
Exercise
Read the documentation of fillParticleList to familiarize yourself with
the required arguments.
Which six final state particles can be created from Tracks?
Internally, the anti-particle lists are always filled as well, so it is not
necessary to call fillParticleList for both e+ and e-. In fact, you will
see a warning message for the second call telling you that the corresponding
particle list already exists.
As long as no selection criteria (cuts) are provided, the only difference between loading different charged final state particle types is the mass hypothesis used in the track fit.
Each particle used in the decayString argument of the fillParticleList
function can be extended with a label. This is useful to distinguish between
multiple lists of the same particle type with different selection criteria,
e.g. soft and hard electrons:
ma.fillParticleList("e-:soft", "E < 1", path=main) # the label of this electron list is "soft"
ma.fillParticleList("e-:hard", "E > 3", path=main) # here the label is "hard"
Warning
If the provided cut string is not empty, you cannot use the label all, i.e. having
ma.fillParticleList("e-:all", "E > 0", path=main)
in your steering file will cause a fatal error and stop the execution of your script.
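The naming rule from this warning can be mimicked in a few lines of plain Python. The helper below is hypothetical and only illustrates the documented rule; it is not how basf2 implements the check. It splits a list name such as e-:soft into particle and label and rejects the reserved label all when a non-empty cut is given.

```python
def check_list_name(list_name, cut):
    """Split 'particle:label' and enforce the rule for the reserved label 'all'.

    Hypothetical illustration of the rule stated in the warning, not basf2 code.
    """
    particle, _, label = list_name.partition(":")
    if label == "all" and cut.strip():
        raise ValueError(f"{list_name}: the label 'all' cannot be used with a cut ({cut!r})")
    return particle, label
```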
There are standard particle lists with predefined selection criteria. Those for
charged final state particles should only be used in the early stages of your
analysis and then be replaced with dedicated selections adjusted to the needs
of the decay mode you are studying. For V0s, however, it is recommended to use
the standard lists (stdV0s).
Exercise
Find the documentation of the convenience function that creates the
standard K_S0 particle list.
Task
Extend your steering file by loading electrons, positrons, and K_S0 candidates.
At the end of your script, also print the module statistics (basf2.statistics).
Task
Run your steering file and answer the following questions:
What are the mass window boundaries set for the K_S0 candidates?
Which module had the longest execution time?
In the previous task you should have learned how useful it is to carefully study the output. This is especially relevant if there are warning or error messages. Remember to never ignore them as they usually point to some serious issue, either in the way you have written your steering file or in the basf2 software itself. In the latter case you are encouraged to report the problem so that it can be fixed by some experts (maybe you yourself will become this expert one day).
In order to purify a sample, it makes sense to apply at least loose selection
criteria. These can be based on particle identification (e.g. electronID
for electrons and positrons), on requiring the tracks to originate from close to the
interaction point (using dr and dz), and on requiring a polar angle within the
acceptance of the CDC (thetaInCDCAcceptance).
Exercise
Find out the difference between dr and dz: why do we not have to explicitly
ask for the absolute value of dr? What is the angular range of the CDC
acceptance (as implemented in the software)?
Task
Apply a cut on the electron particle list, requiring an electron ID greater than 0.1, a maximal transverse distance to the IP of 0.5 cm, a maximal distance in z-direction to the IP of 2 cm, and the track to be inside the CDC acceptance.
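To make the geometry behind dr and dz concrete, here is a plain-Python sketch of the selection from the task, applied to a single toy track. It assumes the point of closest approach (POCA) and the IP are given in cm, and it takes the CDC polar-angle acceptance as 17° < θ < 150°; check the thetaInCDCAcceptance documentation for the exact bounds in your release. In the steering file itself you would only pass the cut string to fillParticleList.

```python
import math

# The same selection written as a basf2 cut string (for fillParticleList).
ELECTRON_CUT = "electronID > 0.1 and dr < 0.5 and abs(dz) < 2 and thetaInCDCAcceptance"

# Assumed CDC polar-angle acceptance in degrees (check the variable documentation).
CDC_THETA_MIN, CDC_THETA_MAX = 17.0, 150.0


def dr(poca, ip=(0.0, 0.0, 0.0)):
    """Transverse distance of the POCA to the IP: a radius, so never negative."""
    return math.hypot(poca[0] - ip[0], poca[1] - ip[1])


def dz(poca, ip=(0.0, 0.0, 0.0)):
    """Signed distance along z between POCA and IP: can be negative."""
    return poca[2] - ip[2]


def theta_in_cdc_acceptance(momentum):
    """True if the polar angle of (px, py, pz) lies inside the assumed CDC range."""
    px, py, pz = momentum
    theta = math.degrees(math.atan2(math.hypot(px, py), pz))
    return CDC_THETA_MIN < theta < CDC_THETA_MAX


def passes_selection(electron_id, poca, momentum):
    """Toy re-implementation of ELECTRON_CUT for one track."""
    return (
        electron_id > 0.1
        and dr(poca) < 0.5
        and abs(dz(poca)) < 2.0
        and theta_in_cdc_acceptance(momentum)
    )
```

Because dr is a radius it is never negative, whereas dz carries a sign, which is why only the latter needs abs() in the cut.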
Combining particles
Now we have a steering file in which final state particles are loaded from the
input mdst file into particle lists. One of the most powerful modules of the
analysis software is the ParticleCombiner. It takes those particle lists and
finds all unique combinations. Of course, the same particle cannot be used
twice: if a decay has two positive pions in the final state, they must come from
two different tracks.
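The idea of finding all unique combinations while never using the same particle twice can be sketched in plain Python. This is a conceptual toy, not the ParticleCombiner implementation: each candidate is a hypothetical (name, track IDs) pair, combinations that share a track are rejected, and permutations of the same tracks are kept only once.

```python
from itertools import product


def combine(*lists):
    """Return unique combinations with one candidate from each input list.

    Each candidate is a (name, frozenset_of_track_ids) tuple. Combinations
    that use a track more than once are dropped, and permutations of the
    same set of tracks are kept only once.
    """
    seen = set()
    result = []
    for combo in product(*lists):
        tracks = [t for _, ids in combo for t in ids]
        if len(tracks) != len(set(tracks)):
            continue  # the same track would be used twice
        key = frozenset(tracks)
        if key in seen:
            continue  # this set of tracks has already been combined
        seen.add(key)
        result.append(combo)
    return result
```

For example, combining one K− with the same π+ list twice, where that list holds two tracks, yields a single candidate: the pairs (π1, π1) and (π2, π2) reuse a track, and (π2, π1) is just a permutation of (π1, π2).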
The wrapper function for the ParticleCombiner is called reconstructDecay. Its
first argument is a DecayString, which is a combination of a mother particle
(list), an arrow, and daughter particles. The DecayString has its own grammar
with several markers, keywords, and arrow types. It is especially useful for
inclusive reconstructions (reconstructions in which only part of the decay
products are specified, e.g. only requiring charged leptons in the final state;
the opposite would be exclusive reconstructions). See the DecayString
documentation if you want to learn more. For the purpose of this tutorial, we
do not need any of those fancy extensions, as the default arrow type ->
suffices. However, it is important to know how the particles themselves need
to be written in the decay string.
Exercise
How do we have to type a
Task
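Extend the steering file by first forming J/ψ candidates from electron-positron
pairs and then combining them with the K_S0 candidates to form B0 candidates.
Include an abs(dM) < 0.11 cut for the J/ψ.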
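The cut abs(dM) < 0.11 keeps candidates whose invariant mass lies within 0.11 GeV/c² of the nominal mass, where dM is the reconstructed mass minus the nominal mass. The plain-Python sketch below works in natural units (GeV, c = 1), uses the J/ψ → e+e− candidate as the example, and takes the nominal J/ψ mass as 3.0969 GeV; the four-momenta in the check are made up.

```python
import math

JPSI_MASS = 3.0969  # nominal J/psi mass in GeV (rounded PDG value)


def invariant_mass(*four_momenta):
    """Invariant mass of the summed four-momenta (E, px, py, pz) in GeV."""
    e, px, py, pz = (sum(c) for c in zip(*four_momenta))
    return math.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))


def passes_dm_cut(mass, nominal=JPSI_MASS, window=0.11):
    """Equivalent of the cut 'abs(dM) < window' with dM = M - M_nominal."""
    return abs(mass - nominal) < window
```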
Writing out information to an ntuple
To separate signal from background events and to extract physics parameters, an offline
analysis has to be performed. The final step of the steering file is to write
out information in a so-called ntuple using variablesToNtuple. It can
contain one entry per candidate or one entry per event.
Exercise
How do you switch between the two ntuple modes?
Warning
Only variables declared as Eventbased are allowed in the event mode.
Conversely, both candidate-based and event-based variables are allowed in the
candidate mode.
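A good variable to start with is the beam-constrained mass Mbc, which is defined
as

$$M_{\mathrm{bc}} = \sqrt{E_{\mathrm{beam}}^{*2} - |\vec{p}_{B}^{\,*}|^{2}},$$

where E*_beam is the beam energy and p*_B is the momentum of the B candidate,
both in the center-of-mass system (CMS), in natural units. For correctly
reconstructed B mesons, Mbc peaks at the nominal B0 mass.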
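The definition Mbc = sqrt(E*_beam² − |p*_B|²) can be transcribed directly into Python. This is a minimal sketch in natural units (GeV, c = 1); the function name is hypothetical, and both inputs must already be expressed in the CMS frame.

```python
import math


def beam_constrained_mass(e_beam_cms, p_cms):
    """Mbc = sqrt(E_beam*^2 - |p_B*|^2) in natural units (GeV, c = 1).

    e_beam_cms: energy of one beam in the CMS frame (half the CMS energy).
    p_cms: (px, py, pz) of the B candidate in the CMS frame.
    """
    p2 = sum(c * c for c in p_cms)
    return math.sqrt(max(e_beam_cms**2 - p2, 0.0))
```

Because the measured candidate energy is replaced by the precisely known beam energy, Mbc only depends on the measured momentum of the candidate.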
Task
Add some code that saves the beam-constrained mass (Mbc) of the B0 candidates to a ROOT ntuple.
Although you are analyzing a signal MC sample, the reconstruction will find many candidates that are actually not signal, but random combinations that happen to fulfill all your selection criteria.
Task
Write a short Python script in which you load the ROOT ntuple from the previous exercise into a dataframe and then plot the distribution of the beam-constrained mass as a histogram with 100 bins in the range from 4.3 to 5.3 GeV/c².
Can you identify the signal and background components?
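What "100 bins between 4.3 and 5.3 GeV/c²" means (a bin width of 0.01 GeV/c²) can be seen with a small plain-Python histogram. In the actual task you would load the ntuple into a pandas DataFrame and let a plotting library do the binning; this sketch only shows the bookkeeping, and the input values in the check are made up.

```python
def histogram(values, n_bins=100, low=4.3, high=5.3):
    """Count values in n_bins equal-width bins on [low, high).

    Values outside the range are ignored, as with a fixed plotting range.
    """
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if low <= v < high:
            index = min(int((v - low) / width), n_bins - 1)  # guard against rounding
            counts[index] += 1
    return counts
```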
Adding MC information
For the beam-constrained mass we know pretty well what the signal distribution should look like. But what is the resolution, and how much background lies under the signal peak? With MC, we have the advantage that we know what has been generated. Therefore, we can add a flag to every candidate to classify it as signal or background. Furthermore, we can study our background sources if we know what the reconstruction has falsely identified.
There is a long chapter on MC matching in the documentation. You should definitely read it to understand at least the basics.
Exercise
Which module do you have to run to get the relations between the reconstructed and the generated particles? How often do you have to call the corresponding function?
Task
Add MC matching for all particles of the decay chain, and save the
information on whether the reconstructed B0 candidate is signal or not to the ntuple.
Task
Plot the beam-constrained mass but this time use the signal flag to visualize which component is signal and which is background.
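Splitting the candidates by the signal flag before histogramming gives one distribution per component. The plain-Python sketch below uses made-up (Mbc, flag) pairs; with a pandas DataFrame the same split is a boolean mask, e.g. df[df.isSignal == 1], assuming the column is called isSignal. Treating every value other than 1 (for example NaN) as background is a choice made for this sketch.

```python
def split_by_signal(candidates):
    """Split (mbc, is_signal) pairs into signal and background Mbc lists.

    A flag of 1 counts as signal; any other value, including NaN, is
    treated as background in this sketch.
    """
    signal, background = [], []
    for mbc, is_signal in candidates:
        (signal if is_signal == 1 else background).append(mbc)
    return signal, background
```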
As you could see, it makes sense to cut on Mbc from below. A complementary
variable that can be used to cut away background is the energy difference ΔE (deltaE).
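ΔE compares the CMS energy of the B candidate with the beam energy, ΔE = E*_B − E*_beam, and is close to zero for correctly reconstructed candidates. Unlike Mbc, it depends on the candidate energy and therefore on the mass hypotheses assigned to the daughters. The plain-Python sketch below (natural units, made-up momenta, hypothetical function names) shows how assigning the pion mass to a kaon pushes ΔE below zero.

```python
import math


def energy(mass, momentum):
    """E = sqrt(m^2 + |p|^2) in natural units (GeV, c = 1)."""
    return math.sqrt(mass**2 + sum(c * c for c in momentum))


def delta_e(daughters, e_beam_cms):
    """Delta E = E_B* - E_beam* for (mass, (px, py, pz)) daughters in the CMS frame."""
    e_b = sum(energy(m, p) for m, p in daughters)
    return e_b - e_beam_cms
```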
Exercise
When combining your
Variable collections
While the MC matching allows us to separate signal from background and study their shapes, we need to use other variables to achieve the same on collision data. Initially, it makes sense to look at many different variables and try to find those with discriminating power between signal and background. The most basic information is given by the kinematic properties, like the energy and the momentum (and its components). In basf2, predefined collections of variables are available for several topics. You can find them in the Collections and Lists section of the documentation.
Exercise
Find out which variable collections contain the variables we have added to the ntuple so far.
Task
Save all the kinematic information, both the truth and the reconstructed
values, of the B0 candidates to the ntuple.
Hint
If you have trouble understanding what we are doing with the b_vars
list, simply add a couple of print(b_vars) calls between its definition and
the operations on it. You might also want to refresh your understanding of
Python lists.
Variable aliases
Apart from variables for the mother B0, we also want to save information about
its daughters. This is done with the daughter meta variable, which takes an
integer and a variable name as input arguments. The integer (0-based) counts
through the daughter particles, e.g. daughter(0, p) would be the momentum of
the first daughter, in our case of the J/ψ.
Exercise
What does daughter(0, daughter(0, E)) denote?
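The nesting can be mimicked on a toy decay tree in plain Python. The dictionaries below are a hypothetical stand-in for basf2 Particle objects (with made-up energies), and daughter() here is an ordinary Python function rather than the VariableManager meta variable; the point is only how the two 0-based indices walk down the tree.

```python
def daughter(index, variable):
    """Return a function that applies `variable` to daughter number `index`."""
    return lambda particle: variable(particle["daughters"][index])


def E(particle):
    """Energy of a toy particle (named like the basf2 variable E)."""
    return particle["E"]


# Toy B0 -> J/psi(-> e+ e-) K_S0(-> pi+ pi-) tree with made-up energies in GeV.
b0 = {
    "name": "B0", "E": 5.29,
    "daughters": [
        {"name": "J/psi", "E": 3.2, "daughters": [
            {"name": "e+", "E": 1.7, "daughters": []},
            {"name": "e-", "E": 1.5, "daughters": []},
        ]},
        {"name": "K_S0", "E": 2.09, "daughters": [
            {"name": "pi+", "E": 1.1, "daughters": []},
            {"name": "pi-", "E": 0.99, "daughters": []},
        ]},
    ],
}

# First daughter (J/psi) of the B0, then its first daughter (e+), then E.
ep_E = daughter(0, daughter(0, E))
```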
In principle, one can add these nested variables directly to the ntuple,
but the special characters (brackets, commas, spaces) have to be escaped
(i.e. replaced with “normal” characters), and the resulting variable name in
the ntuple is neither user-friendly nor intuitive. For example,
daughter(0, daughter(0, E)) becomes
daughter__bo0__cm__spdaughter__bo0__cm__spE__bc__bc. Not exactly pretty,
right?
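The escaping scheme visible in this example can be reproduced with a few string replacements. The helper below is hypothetical and only covers the four characters that appear here (opening and closing bracket, comma, space); the conversion basf2 itself performs handles more special characters, so treat this as an illustration.

```python
# Replacement tokens as seen in the escaped name in the text.
ESCAPES = {"(": "__bo", ")": "__bc", ",": "__cm", " ": "__sp"}


def escape_variable_name(name):
    """Replace brackets, commas and spaces with the ntuple-safe tokens."""
    return "".join(ESCAPES.get(char, char) for char in name)
```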
So instead, let’s define aliases to translate the variable names!
This can be done with addAlias.
Exercise
How can you replace daughter(0, daughter(0, E)) with ep_E?
However, this can quickly fill up many, many lines. Therefore, there are
utility functions to create aliases easily. The most useful is probably
create_aliases_for_selected. It lets you select the particles of a decay
string for which you want to define aliases via the ^ operator, and also
set a prefix. Another utility is create_aliases, which is particularly
useful to wrap a list of variables in another meta variable like useCMSFrame
or matchedMC.
Task
Add PID and track variables for all charged final state particles and the invariant mass of the intermediate resonances to the ntuple. Also, add the standard variables from before for all particles in the decay chain, and include the kinematics both in the lab and the CMS frame.
Hint
To get more information about the aliases we are creating, simply call
VariableManager.printAliases (vm.printAliases()) just before
processing your path.
See also
Some more example steering files that center around the VariableManager
can be found on GitLab.
Exercise
Run your steering file one last time to check that it works!
Key points
- The modularAnalysis module contains most of what you’ll need for now.
- inputMdstList is used to load data.
- fillParticleList adds particles into a list.
- reconstructDecay combines FSPs from different lists to “reconstruct” particles.
- matchMCTruth matches reconstructed particles to the generated MC particles.
- variablesToNtuple saves an output file.
- Don’t forget process(path) or nothing happens.
Authors of this lesson
Frank Meier, Kilian Lieret (minor improvements)