3.4.2. First steering file#

In this hands-on tutorial you’ll be writing your first steering file. Our ultimate goal is to reconstruct \(B^0 \to J/\Psi(\to e^+e^-)K_S^0(\to \pi^+\pi^-)\). You’ll learn step by step what is necessary to achieve this, and in the end you will produce a plot of the \(B\) meson candidates. As you have already learned in the previous sections, basf2 provides a large variety of functionality. While the final steering file of this lesson will work and produce reasonable output, there are many possible extensions that you will learn about in the succeeding lessons.

Let’s get started: The very first step is always to set up the necessary environment.

Task

Set up the basf2 environment using the currently recommended software version.

Now let’s get started with your steering file!

Task

Open an empty file with an editor of your choice. Add three lines that do the following:

  • Import the basf2 python library (it might be convenient to set an abbreviation, e.g. b2)

  • Create a basf2.Path (call it main)

  • Process the path with basf2.process

Save the file as myanalysis.py.
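A minimal sketch of the three steps above. The lines are wrapped in a function here (the name run_empty_path is purely illustrative) so that basf2 is only imported when the function is called; in myanalysis.py itself the three statements would simply stand at the top level of the file.

```python
def run_empty_path():
    """Create an empty path and process it (needs a basf2 environment)."""
    import basf2 as b2  # 1. import the library under the abbreviation b2

    main = b2.Path()  # 2. create the (still empty) path
    b2.process(main)  # 3. process it; no data is loaded yet
    return main
```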

Running steering files is as easy as calling basf2 myanalysis.py on the command-line.

Exercise

Run the short script that you just created. Don’t worry: if you’ve done everything correctly, you should see some error messages. Read them carefully!

Loading input data#

Of course, no events could be processed so far because no data has been loaded yet. Let’s do it. As already described in the previous lesson, almost all convenience functions that are needed can be found in modularAnalysis.

It is recommended to use inputMdstList or inputMdst. If you look at the source code, you’ll notice that the latter actually calls the more general former.

Exercise

How many arguments are required for the inputMdstList function? Which value has to be set for the environment type?

In a later lesson you’ll learn how and where to find input files for your analysis. For the purpose of this tutorial we have prepared some local input files of \(B^0 \to J/\Psi K_S^0\). They should be available in the ${BELLE2_EXAMPLES_DATA_DIR}/starterkit/2021 directory on KEKCC, NAF, and other servers. The files’ names start with the decfile number 1111540100.

Exercise

Check out the location of the files mentioned above. Which two settings of MC are provided?

A helpful function to get common data files from the examples directory is basf2.find_file.

Task

Extend your steering file by loading the data of one of the local input files. It makes sense to run the steering file again.

  • If there is a syntax error in your script or you forgot to include a necessary argument, there will be an error message that should help you to debug and figure out what needs to be fixed.

  • If the script is fine, only three lines with info messages should be printed to the output and you should see a quickly finishing progress bar.
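One possible way to load a single input file is sketched below. It is written as a helper function (add_input is a hypothetical name) so that basf2 is only imported when the function is called; the value "default" for the environment type is an assumption that you should check against the documentation of inputMdstList.

```python
def add_input(path, relative_filename):
    """Append the input module for one file from the examples data directory."""
    import basf2 as b2
    import modularAnalysis as ma

    # "examples" makes find_file search below ${BELLE2_EXAMPLES_DATA_DIR}
    filename = b2.find_file(relative_filename, "examples")
    ma.inputMdstList(
        environmentType="default",  # assumed value, see lead-in
        filelist=[filename],
        path=path,
    )
    return filename
```

In a steering file this could be called as add_input(main, "starterkit/2021/<file name>.root") before the path is processed, with the placeholder replaced by one of the file names you found in the directory.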

In the solution to the last task we have added empty lines, some comments, and used shortcuts for the imports. This gives the script a better structure and allows you and others to better understand what’s going on in the steering file. In the very first line we have also added a shebang to define that the steering file should be executed with a python interpreter.

So far, the input file has been completely hard-coded. But as we’ve seen before the file names only differ by the final suffix. We can add a little bit more flexibility by providing this integer as a command-line argument. Then, we can select a different input file when running the steering file without having to change anything in the script itself.

Task

Adjust your steering file so that you can select via an integer as a command-line argument which file is going to be processed.

Tip

Make sure that from now on you always supply a number every time you run your steering file, e.g. basf2 myanalysis.py 1.

Otherwise you will get an exception like this

Traceback (most recent call last):
  File "myanalysis.py", line 3, in <module>
    filenumber = sys.argv[1]
IndexError: list index out of range
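If you prefer a readable message over the bare IndexError, the argument can be checked before it is used. This is only a sketch; the helper name parse_filenumber and the wording of the messages are made up for illustration.

```python
def parse_filenumber(argv):
    """Return the file number given as the first command-line argument."""
    if len(argv) < 2:
        raise SystemExit("usage: basf2 myanalysis.py <file number>")
    try:
        return int(argv[1])
    except ValueError:
        raise SystemExit(f"the file number must be an integer, got {argv[1]!r}")


# In the steering file:
#   import sys
#   filenumber = parse_filenumber(sys.argv)
```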

Filling particle lists#

The mdst data objects (Tracks, ECLCluster, KLMCluster, V0s) of the input file have to be transferred into Particle data objects. This is done via the ParticleLoader module and its wrapper function (convenience function) fillParticleList.

Exercise

Read the documentation of fillParticleList to familiarize yourself with the required arguments.

Which six final state particles can be created from Tracks?

Internally, the anti-particle lists are always filled as well, so it is not necessary to call fillParticleList for both e+ and e-. In fact, you will see a warning message for the second call telling you that the corresponding particle list already exists.

As long as no selection criteria (cuts) are provided, the only difference between loading different charged final state particle types is the mass hypothesis used in the track fit.

Each particle used in the decayString argument of the fillParticleList function can be extended with a label. This is useful to distinguish between multiple lists of the same particle type with different selection criteria, e.g. soft and hard electrons.

ma.fillParticleList("e-:soft", "E < 1", path=main) # the label of this electron list is "soft"
ma.fillParticleList("e-:hard", "E > 3", path=main) # here the label is "hard"

Warning

If the provided cut string is not empty, you cannot use the label all, i.e. having

ma.fillParticleList("e-:all", "E > 0", path=main)

in your steering file will cause a fatal error and stop the execution of your script.

There are standard particle lists with predefined selection criteria. While those for charged final state particles should only be used in the early stages of your analysis and be replaced with dedicated selections adjusted to the needs of the decay mode you are studying, it is recommended to use them for V0s (\(K_S^0\), \(\Lambda^0\)). They are part of the library stdV0s.

Exercise

Find the documentation of the convenience function that creates the standard \(K_S^0\) particle list. What is the name of the particle list generated by this function?

Task

Extend your steering file by loading electrons, positrons, and \(K_S^0\) candidates.
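A sketch of how this could look, written as a helper function that receives the path. The label uncorrected and the function name are arbitrary choices for this illustration; stdKshorts is assumed to be the convenience function you found in the previous exercise.

```python
def add_final_state_particles(path):
    """Fill an electron list (no cuts yet) and the standard K_S0 list."""
    import modularAnalysis as ma
    import stdV0s

    # One call is enough: the charge-conjugated list e-:uncorrected is filled too.
    ma.fillParticleList("e+:uncorrected", "", path=path)
    # Standard K_S0 list with predefined selection criteria
    stdV0s.stdKshorts(path=path)
```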

Task

Run your steering file and answer the following questions:

  • Which are the mass window boundaries set for the \(K_S^0\)?

In the previous task you should have learned how useful it is to carefully study the output. This is especially relevant if there are warning or error messages. Remember to never ignore them as they usually point to some serious issue, either in the way you have written your steering file or in the basf2 software itself. In the latter case you are encouraged to report the problem so that it can be fixed by some experts (maybe you yourself will become this expert one day).

In order to purify a sample it makes sense to apply at least loose selection criteria. This can be based on the particle identification (e.g. electronID for electrons and positrons), requiring the tracks to originate from close to the interaction point (by using dr and dz), and having a polar angle in the acceptance of the CDC (thetaInCDCAcceptance).

Exercise

Find out what the difference between dr and dz is, e.g. why do we not have to explicitly ask for the absolute value of dr? What is the angular range of the CDC acceptance (as implemented in the software)?

Task

Apply a cut on the electron particle list, requiring an electron ID greater than 0.1, a maximal transverse distance to the IP of 0.5 cm, a maximal distance in z-direction to the IP of 2 cm, and the track to be inside the CDC acceptance.
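Long cut strings are easier to read and maintain when they are assembled from their individual requirements. The following sketch only builds the string for the four criteria named in this task; the variable names electron_cuts and electron_cut_string are illustrative.

```python
electron_cuts = [
    "electronID > 0.1",      # particle identification
    "dr < 0.5",              # transverse distance to the IP in cm
    "abs(dz) < 2",           # distance along z to the IP in cm
    "thetaInCDCAcceptance",  # polar angle inside the CDC acceptance
]
electron_cut_string = " and ".join(electron_cuts)
print(electron_cut_string)
```

The resulting string can then be passed as the second argument of fillParticleList in place of an empty cut.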

Combining particles#

Now we have a steering file in which final state particles are loaded from the input mdst file to particle lists. One of the most powerful modules of the analysis software is the ParticleCombiner. It takes those particle lists and finds all unique combinations. The same particle can of course not be used twice, e.g. the two positive pions in \(D^0 \to K^- \pi^+ \pi^+ \pi^-\) have to be different mdst track objects. However, all of this is taken care of internally. For multi-body decays like the one described above, there can easily be many candidates that share some particles but differ by at least one final state particle.

The wrapper function for the ParticleCombiner is called reconstructDecay. Its first argument is a DecayString, which is a combination of a mother particle (list), an arrow, and daughter particles. The DecayString has its own grammar with several markers, keywords, and arrow types. It is especially useful for inclusive reconstructions (reconstructions in which only part of the decay products are specified, e.g. only requiring charged leptons in the final state; the opposite would be exclusive reconstructions). See the DecayString documentation if you want to learn more. For the purpose of this tutorial, we do not need any of those fancy extensions as the default arrow type -> suffices. However, it is important to know how the particles themselves need to be written in the decay string.

Exercise

How do we have to type a \(J/\Psi\), and what is its nominal mass?

Task

Extend the steering file by first forming \(J/\Psi\) candidates from electron-positron combinations, and then combining them with a \(K_S^0\) to form \(B^0\) candidates.

Include an abs(dM) < 0.11 cut for the \(J/\Psi\).
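The two reconstruction steps could be sketched as follows. The list labels ee, uncorrected, and merged are assumptions: they have to match the labels of the lists that were actually filled earlier in your steering file. The helper name and the optional b_cut argument are hypothetical.

```python
def add_b0_reconstruction(path, b_cut=""):
    """Combine e+ e- into J/psi candidates, then J/psi and K_S0 into B0 candidates."""
    import modularAnalysis as ma

    # J/psi candidates within 0.11 GeV/c^2 of the nominal mass
    ma.reconstructDecay(
        "J/psi:ee -> e+:uncorrected e-:uncorrected",
        cut="abs(dM) < 0.11",
        path=path,
    )
    # B0 candidates from one J/psi and one K_S0 candidate
    ma.reconstructDecay("B0 -> J/psi:ee K_S0:merged", cut=b_cut, path=path)
```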

Writing out information to an ntuple#

To separate signal from background events, and to extract physics parameters, an offline analysis has to be performed. The final step of the steering file is to write out information in a so-called ntuple using variablesToNtuple. It can contain one entry per candidate or one entry per event.

Exercise

How do you switch between the two ntuple modes?

Warning

Only variables declared as Eventbased are allowed in the event mode. Conversely, both candidate and event-based variables are allowed in the candidate mode.

A good variable to start with is the beam-constrained mass Mbc, which is defined as

\[\text{M}_{\rm bc} = \sqrt{E_{\rm beam}^2 - \mathbf{p}_{B}^2}\]

For correctly reconstructed \(B\) mesons this variable should peak at the \(B\) meson mass.
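To get a feeling for the formula, here is a small stand-alone calculation in plain Python. It assumes that all quantities are given in GeV in natural units and in the centre-of-mass frame, with \(E_{\rm beam}\) being half of the centre-of-mass energy; the numbers are purely illustrative and not taken from any data file.

```python
import math


def beam_constrained_mass(e_beam, p_b):
    """M_bc = sqrt(E_beam^2 - |p_B|^2), all inputs in GeV (natural units)."""
    p_squared = sum(component**2 for component in p_b)
    return math.sqrt(e_beam**2 - p_squared)


# Illustrative numbers: E_beam = 5.29 GeV, B momentum of 0.33 GeV along z
print(round(beam_constrained_mass(5.29, (0.0, 0.0, 0.33)), 4))
```

With these inputs the result lies close to 5.28 GeV/\(c^2\), i.e. near the \(B\) meson mass. Note that math.sqrt raises a ValueError if the momentum exceeds the beam energy.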

Task

Add some code that saves the beam-constrained \(B\) mass of each \(B\) candidate in an output ntuple. Then, run your steering file.
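A possible sketch of this step, again as a helper function that receives the path. The output file name Bd2JpsiKS.root and the tree name tree are arbitrary choices for this illustration.

```python
def add_ntuple(path, variables=("Mbc",), filename="Bd2JpsiKS.root"):
    """Write the given variables for every B0 candidate to an ntuple."""
    import modularAnalysis as ma

    ma.variablesToNtuple(
        "B0",  # particle list: one entry per B0 candidate
        variables=list(variables),
        filename=filename,
        treename="tree",
        path=path,
    )
```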

Although you are analyzing a signal MC sample, the reconstruction will find many candidates that are actually not signal, but random combinations that happen to fulfill all your selection criteria.

Task

Write a short python script in which you load the root ntuple from the previous exercise into a dataframe and then plot the distribution of the beam-constrained mass using a histogram with 100 bins in the range 4.3 to 5.3 GeV/\(c^2\).

Can you identify the signal and background components?
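One way to write such a script is sketched below, assuming that uproot, pandas, and matplotlib are available in your environment. The file name, tree name, and output name are placeholders that have to match whatever you used in your steering file.

```python
MBC_RANGE = (4.3, 5.3)  # histogram range in GeV/c^2
N_BINS = 100


def plot_mbc(ntuple="Bd2JpsiKS.root", treename="tree", output="Mbc.png"):
    """Read Mbc from the ntuple into a dataframe and save a histogram."""
    import matplotlib

    matplotlib.use("Agg")  # write to a file, no display needed
    import matplotlib.pyplot as plt
    import uproot

    df = uproot.open(f"{ntuple}:{treename}").arrays(["Mbc"], library="pd")
    fig, ax = plt.subplots()
    ax.hist(df["Mbc"], bins=N_BINS, range=MBC_RANGE)
    ax.set_xlabel(r"$M_{\mathrm{bc}}$ [GeV/$c^2$]")
    ax.set_ylabel("candidates per bin")
    fig.savefig(output)
    return df
```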

Adding MC information#

For the beam-constrained mass we know pretty well what the signal distribution should look like. But what’s the resolution and how much background actually extends under the signal peak? With MC, we have the advantage that we know what has been generated. Therefore, we can add a flag to every candidate to classify it as signal or background. Furthermore, we can study our background sources if we know what the reconstruction has falsely identified.

There is a long chapter on MC matching in the documentation. You should definitely read it to understand at least the basics.

Exercise

Which module do you have to run to get the relations between the reconstructed and the generated particles? How often do you have to call the corresponding function?

Task

Add MC matching for all particles of the decay chain, and save information on whether the reconstructed \(B\) meson is a signal candidate to the ntuple. Run the steering file again.
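A minimal sketch of the matching step, under the assumption that the \(B^0\) candidates are stored in a list called B0. The helper name is hypothetical.

```python
def add_mc_matching(path):
    """Relate the reconstructed B0 candidates to generated particles."""
    import modularAnalysis as ma

    # Matching the head of the decay chain also matches its (grand)daughters.
    ma.matchMCTruth("B0", path=path)
```

To store the result, the variable isSignal can be added to the list of variables that is passed to variablesToNtuple.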

Task

Plot the beam-constrained mass but this time use the signal flag to visualize which component is signal and which is background.
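The split itself does not need anything beyond plain Python. The sketch below assumes that the Mbc and isSignal columns have already been read into two sequences of equal length; candidates whose flag is not equal to 1 (for example 0 or NaN) are treated as background. With a dataframe you would typically use a selection on the isSignal column instead.

```python
def split_by_signal(mbc_values, is_signal_flags):
    """Return (signal, background) lists of Mbc values."""
    signal, background = [], []
    for mbc, flag in zip(mbc_values, is_signal_flags):
        # flag == 1 is False for 0 and for NaN, so both end up as background
        (signal if flag == 1 else background).append(mbc)
    return signal, background
```

The two lists can then be drawn into the same histogram, for example by passing them together to the hist function of matplotlib with stacked=True.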

As you could see, it makes sense to cut on Mbc from below. A complementary variable that can be used to cut away background is \(\Delta E\) (deltaE).

Exercise

When combining your \(J/\Psi\) with your \(K_S^0\), introduce a cut \(\text{M}_{\rm bc} > 5.2\) and \(|\Delta E|<0.15\).

Variable collections#

While the MC matching allows us to separate signal from background and study their shapes, we need to use other variables to achieve the same on collision data. Initially, it makes sense to look at many different variables and try to find those with discriminating power between signal and background. The most basic pieces of information are the kinematic properties such as the energy and the momentum (and its components). In basf2, collections of variables for several topics are pre-prepared. You can find the information in the Collections and Lists section of the documentation.

Exercise

Find out which variable collections contain the variables we have added to the ntuple so far.

Task

Save all the kinematic information, both the truth and the reconstructed values, of the \(B\) meson to the ntuple. Also, use the variable collections from the last exercise to replace the individual list.
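A sketch of how the variable list could be assembled from collections. The collection names kinematics, mc_kinematics, mc_truth, and deltae_mbc are assumed to be the ones you identified in the last exercise; the function name is hypothetical.

```python
def build_b_vars():
    """Collect the variables to be stored for the B meson."""
    import variables.collections as vc

    b_vars = []
    standard_vars = vc.kinematics + vc.mc_kinematics + vc.mc_truth
    b_vars += vc.deltae_mbc  # replaces the individually listed Mbc
    b_vars += standard_vars
    return b_vars
```

Starting from an empty list and extending it with += leaves the collections themselves unchanged.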

Hint

If you have trouble understanding what we are doing with the b_vars list, simply add a couple of print(b_vars) between the definition and the operations on it. You might also want to take another look at your understanding of lists.

Variable aliases#

Apart from variables for the mother \(B\) meson, we are also interested in the information of the daughter and granddaughter particles. You can access them via the daughter metavariable, which takes an integer and a variable name as input arguments. The integer (0-based) counts through the daughter particles, e.g. daughter(0, p) would be the momentum of the first daughter, in our case of the \(J/\Psi\). This function can also be used recursively.

Exercise

What does daughter(0, daughter(0, E)) denote?

In principle, one can add these nested variables directly to the ntuple, but the brackets have to be escaped (i.e. replaced with “normal” characters), and the resulting variable name in the ntuple is not very user-friendly or intuitive. For example daughter(0, daughter(0, E)) becomes daughter__bo0__cm__spdaughter__bo0__cm__spE__bc__bc. Not exactly pretty, right?
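The escaping in the example above can be reproduced with a few lines of plain Python. This sketch only implements the four substitutions that are visible in that example; the actual conversion in basf2 may cover more characters.

```python
# Substitutions read off from the example: ( ) , and the space character
REPLACEMENTS = {"(": "__bo", ")": "__bc", ",": "__cm", " ": "__sp"}


def escape_variable_name(name):
    """Replace the special characters of a variable name one by one."""
    return "".join(REPLACEMENTS.get(char, char) for char in name)


print(escape_variable_name("daughter(0, daughter(0, E))"))
```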

So instead, let’s define aliases to translate the variable names! This can be done with addAlias.

Exercise

How can you replace daughter(0, daughter(0, E)) with ep_E?
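A possible answer is sketched below, wrapped in a hypothetical helper function so that the basf2 import only happens when it is called. In a steering file the import and the addAlias call would simply appear at the top level.

```python
def add_positron_energy_alias():
    """Define the alias ep_E for the energy of the first J/psi daughter."""
    from variables import variables as vm

    # first argument: new alias, second argument: original variable
    vm.addAlias("ep_E", "daughter(0, daughter(0, E))")
```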

However, this can quickly fill up many, many lines. Therefore, there are utils to easily create aliases. The most useful is probably create_aliases_for_selected. It lets you select particles from a decay string via the ^ operator for which you want to define aliases, and also set a prefix. Another utility is create_aliases, which is particularly useful to wrap a list of variables in another meta-variable like useCMSFrame or matchedMC.

Task

Add PID and track variables for all charged final state particles and the invariant mass of the intermediate resonances to the ntuple. Also, add the standard variables from before for all particles in the decay chain, and include the kinematics both in the lab and the CMS frame.
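The following sketch shows how the two utilities might be combined for part of this task. The collection names pid, track, inv_mass, and kinematics, the prefixes, and the function name are assumptions for illustration; the standard variables from before could be added in the same way.

```python
def build_alias_vars():
    """Create aliases for daughters and for CMS-frame kinematics."""
    import variables.collections as vc
    import variables.utils as vu

    b_vars = []
    # PID and track variables for the four charged final state particles
    b_vars += vu.create_aliases_for_selected(
        vc.pid + vc.track,
        "B0 -> [J/psi -> ^e+ ^e-] [K_S0 -> ^pi+ ^pi-]",
        prefix=["ep", "em", "pip", "pim"],
    )
    # Invariant mass of the intermediate resonances
    b_vars += vu.create_aliases_for_selected(vc.inv_mass, "B0 -> ^J/psi ^K_S0")
    # Kinematics in the CMS frame: wrap each variable in useCMSFrame(...)
    cms_kinematics = vu.create_aliases(vc.kinematics, "useCMSFrame({variable})", "CMS")
    b_vars += vu.create_aliases_for_selected(cms_kinematics, "^B0 -> ^J/psi ^K_S0")
    return b_vars
```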

Hint

To get more information about the aliases we are creating, simply use VariableManager.printAliases (vm.printAliases()) just before processing your path.

See also

Some more example steering files that center around the VariableManager can be found on GitLab.

Exercise

Run your steering file one last time to check that it works!

Key points

  • The modularAnalysis module contains most of what you’ll need for now

  • inputMdstList is used to load data

  • fillParticleList adds particles into a list

  • reconstructDecay combines FSPs from different lists to “reconstruct” particles

  • matchMCTruth relates reconstructed particles to the generated (MC) particles

  • variablesToNtuple saves an output file

  • Don’t forget process(path) or nothing happens

Authors of this lesson

Frank Meier, Kilian Lieret (minor improvements)