3.4.8. Generating Monte Carlo#

Why do we need Monte Carlo simulated data?#

Monte Carlo (MC) data are widely used in High Energy Physics. MC simulations allow us to mimic what happens in the Belle II detector after the collisions to understand the experimental conditions and performance. MC data are processed as real data in both reconstruction and physics analyses, with the advantage that we know the “truth” (e.g. if a track observed in the CDC was originated by a muon or a pion). The more realistic the MC simulation, the better we can interpret the results obtained by analyzing collisions events.

Usually we need large MC samples (larger than the collisions dataset we want to use for the physics analysis) to avoid spurious fluctuations and to reduce uncertainties driven by the size of the MC sample. However, the MC samples that we need for our analysis don’t have to be generated by ourselves: this is one of the duty of the Data Production group. Each physics Working Group has a dedicated person (the Data Production liason) to assist the analysers and to collect their requests concerning the MC production.

There are several reasons for discouraging the self-production of very large MC samples:

our own-generated MC sample may use an inconsistent version of the software (or conditions database, beam background, generator settings, etc.) with respect to the ones produced centrally and used by our colleagues;
we do not have the computing resources to produce large MC samples; the Data Production group uses the grid to properly handle a MC production campaign, while we can not; moreover, using a batch submission system for a private large production is not a good idea, because it takes a lot of time and, more important, we will subtract important computing resources to our colleaguers and to their jobs;
we do not have the space to store a large number of MC samples.

Question

How much disk space would you need to store mDST files equivalent to \(500\;\text{fb}^{-1}\) of generic \(B^0 \bar{B}^0\) events? You can assume an mDST size for a single \(B^0 \bar{B}^0\) event of \(10\;\text{kB}\).

Hint

The only other input we need is the cross section for for the production of \(B^0 \bar{B}^0\) events.

Solution

Knowing that the cross section for the production of \(B^0 \bar{B}^0\) events at Belle II is \(0.540\;\text{nb}\), we would need about \(2.55\;\text{TB}\) for storing such a sample.

However, we may need to produce a small MC sample by our-self: for example, we want to test a new feature of the detector simulation, or we want to test the efficiency of a new trigger line, and so on.

Warning

Don’t generate larger samples yourself, contact the Data Production liason of your physics Working Group if you need a large production of MC signal samples!

Main steps for the Monte Carlo production#

The production of a MC sample can be divided in three main steps:

the generation of the MC particles using a MC generator; the currently used generators have not been developed within Belle II but have been inherited from the HEP community;
the simulation of the detector response and of the L1 trigger; the simulation of the interaction of the MC particles with the matter is handled by Geant4, while the digitization of the simulated hits is handled by subdetector modules;
the reconstruction of tracks, electromagnetic clusters, etc.; the same modules and algorithms used at this step are used also for reconstructing collisions events.

Warning

We can produce MC samples only by using a full release, and not a light one. In a light release we do not have the generators, simulation and reconstruction packages, meaning that running with a light release a steering file intended for some MC production will result in a basf2 crash.

How to write a steering file for the Monte Carlo production#

Of course, for the production of a MC sample we have to write basf2 a steering file. Let’s see how to write it.

As usual, one has to start by importing basf2 and creating a main path.

import basf2 as b2
main = b2.Path()

Then, the first module to be added to the main path must be EventInfoSetter. This module is necessary for correctly setting some of the production parameters, like the number of events to be produced or the experiment number.

Question

You want to generate 10 events using the full Belle II geometry (experiment 0). Which configuration do you have to use for EventInfoSetter?

Hint

Check the documentation (EventInfoSetter) to identify the module’s parameters and their names.

Solution

You have to add to your steering path the following line:

main.add_module('EventInfoSetter', evtNumList=[10], expList=[0])

Alternatively, you can just add the module to the steering path without setting the module’s parameters and you can set the number of events and the experiment number via command line using the basf2 arguments:

basf2 my_steering_file.py -n 10 --experiment 0

Tip

You can also use b2help-modules: List available basf2 modules and ask about their details to look at a module’s parameters. For example, you can run on your shell b2help-modules EventInfoSetter.

Now we have to add to the steering path the relevant functions for:

generating MC events (for example, some \(B^0 \bar{B}^0\) events): generators.add_evtgen_generator()
simulating the detector response, including the L1 trigger: simulation.add_simulation()
reconstructing the relevant objects, like tracks, electromagnetic clusters, etc.: reconstruction.add_reconstruction()
storing the reconstructed objects into an output mDST file for further studies: mdst.add_mdst_output()

Question

Can you figure out how to add all the previous functions to your steering file in order to produce a small sample of \(B^0 \bar{B}^0\) MC events?

Hint

Do not forget to import all the relevant python modules (for example, import simulation for add_simulation()).

Solution

#!/usr/bin/env python3

import basf2 as b2
import generators as ge
import simulation as si
import reconstruction as re
import mdst

# Create the steering path
main = b2.Path()

# Define number of events and experiment number
main.add_module('EventInfoSetter', evtNumList=[10], expList=[0])

# Generate B0B0bar events
ge.add_evtgen_generator(path=main, finalstate='mixed')

# Simulate the detector response and the L1 trigger
si.add_simulation(path=main)

# Reconstruct the objects
re.add_reconstruction(path=main)

# Create the mDST output file
mdst.add_mdst_output(path=main, filename='my_mdst_output.root')

# Process the steering path
b2.process(path=main)

# Finally, print out some statistics about the modules execution
print(b2.statistics)

Tip

To monitor the status of the script execution, it is advisable to add to the steering path the Progress module. It periodically prints a message on the screen about the number of processed events. You can add it wherever you like between main.add_module('EventInfoSetter') and b2.process(path=main).

Warning

Do not try to generate a large sample: it will take a lot of time! For the exercise, limit the number of generated events to 10. If you want to produce a larger sample (e.g. 1000 events), consider to use a batch submission system (whose usage will be explained in next lessons: Batch submission for the KEKCC system and htcondor).

Warning

Even if the produced mDST file is a .root file, you can not open it like you did in previous lessons using pandas, because it contains basf2 objects and it is not designed to be properly opened with different softwares. In case you are really curious to know which objects are stored there, you can run b2file-size my_mdst_output.root in your shell.

Question

How can we perform a physics analysis on the output mDST file of the previous exercise?

Solution

This is very simple: write a steering file similar to the one you wrote in the First steering file lesson.

Tip

If you need more complex steering files for your quick studies or you want to use different MC generators, you can always check our MC production repository (used and maintained by the Data Production group). This repository is useful also to check which configurations were used for a specific MC production campaign.

How to write a decay file and to generate a signal Monte Carlo sample#

In the last exercise we learnt how to generate a generic \(B^0 \bar{B}^0\) event. Probably you are wondering where the decay channels of the \(B^0\) / \(\bar{B}^0\) and the respective branching fractions are defined, or if it possible to specify one or few specific decay channels for the generated particles.

All the known decay channels of the generated \(B^0\) and \(\bar{B}^0\) mesons are defined in our main decay file. This file is automatically loaded by the function generators.add_evtgen_generator() and it is then passed to the EvtGen generator.

The main decay file is very long and probably confusing for a newcomer. In order to correctly read and understand it, let us now move to a different topic: how can we generate events specifying the decay channels of our interest? In our internal jargon, this is called “generating signal MC events”.

Suppose that we want to study the \(B^0 \to J/\Psi K^0_S\) decay, with \(J/\Psi \to \mu^+ \mu^-\) and \(K^0_S \to \pi^+ \pi^-\). A decay file that we can use is the following one (copy it into a file named my_decay_file.dec or something similar):

# This is my signal decay file.

Decay Upsilon(4S)
1.0 B0 anti-B0  VSS_BMIX dm;
Enddecay

Decay B0
1.0 J/psi K_S0 PHSP;
Enddecay
CDecay anti-B0

Decay J/psi
1.0 mu+ mu- VLL;
Enddecay

Decay K_S0
1.0 pi+ pi- PHSP;
Enddecay

End

Looking at this simple decay file, it is not difficult to understand its structure:

the decay of a specific particle X starts with Decay X and ends with Enddecay;
the decay channels are then defined by the lines starting with numbers and each line defining a decay channel must end with a ;;
a decay channel is fully identified by: its branching fraction (1.0 in all the cases above), its final state (for the B0 case it is J/psi K_S0) and the EvtGen decay model that we want to use for that specific final state; in this case we used several models: VSS_BMIX dm is a model for vector particles decaying into two scalars in which the \(B^0\) / \(\bar{B}^0\) mesons mixing is turned on, VLL is a model for vector particles decaying into two leptons, while PHSP is a simple phase space decay model.

in case you want to use for a particle the same final states (assuming charge-conjugation) of its charge-conjucated partner, you have to specify it using the CDecay identifier, followed by the charge-conjucated state;
the decay file must end with End;
each line starting with a # is interpreted as a comment by EvtGen and it is thus ignored.

Warning

This particular decay file will produce two signal particles per event. Usually, for a signal sample we want to generate one “signal particle” (namely: one particle decaying following exactly your signal decay file) and one “generic particle” (one particle whose decay is instead driven by the main decay file). This can be achieved by using “particle aliases” in the decay file. This is documented in the EvtGen generator website.

Warning

It may be complicated figuring out which decay model you have to use for a specific decay channel. You can always refer to our main decay file or, even better, you can contact the Data Production liason of your physics working group.

Probably you are also wondering where all the particle names are defined (while for B0 and mu+ the naming scheme is quite obvious, this is not the same for K_S0). These names are defined in the evt.pdl file, but luckily you do not have to go through it, because we have a dedicated tool that parses the file and allows you to easily get all the relevant information: b2help-particles: List particles used in the event generator.

Question

Before moving on… how many particles having a mass between \(3.0\;\text{GeV/c}^2\) and \(3.6\;\text{GeV/c}^2\) are defined in the evt.pdl file?

Hint

Can you use b2help-particles: List particles used in the event generator for this?

Solution

Running b2help-particles --min 3.0 --max 3.6 shows that there are 7 particles in this particular mass range: \(J/\Psi\), \(\Xi_{cc}^+\), \(\bar{\Xi}_{cc}^-\) \(\chi_{c0}\), \(\chi_{c1}\), \(\chi_{c2}\) and \(h_c\).

Question

Which line of the previous steering file has to be modified in order to generate our signal sample? And how does it have to be modified? Run the resulting steering file again.

Solution

#!/usr/bin/env python3

import basf2 as b2
import generators as ge
import simulation as si
import reconstruction as re
import mdst

# Create the steering path
main = b2.Path()

# Define number of events and experiment number
main.add_module('EventInfoSetter', evtNumList=[10], expList=[0])

# Generate B0B0bar events
ge.add_evtgen_generator(
    path=main,
    finalstate='signal',
    signaldecfile=b2.find_file('my_decay_file.dec')
)

# Simulate the detector response and the L1 trigger
si.add_simulation(path=main)

# Reconstruct the objects
re.add_reconstruction(path=main)

# Create the mDST output file
mdst.add_mdst_output(path=main, filename='my_mdst_output.root')

# Process the steering path
b2.process(path=main)

# Finally, print out some statistics about the modules execution
print(b2.statistics)

Question

Suppose that now you are interested to a different final state: \(B^0 \to \phi K^0_S\), with: \(\text{BF}(\phi \to K^+ K^-) = 1\), \(\text{BF}(K^0_S \to \pi^+ \pi^-) = 0.7\) and \(\text{BF}(K^0_S \to \pi^0 \pi^0) = 0.3\). Which decay file would you write?

Hint

Use b2help-particles: List particles used in the event generator to identify the names to be used for \(\phi\), \(K^+\) and \(\pi^0\). Probably you want to check their PDG codes if you do not remember them…

Solution

Decay Upsilon(4S)
1.0 B0 anti-B0  VSS_BMIX dm;
Enddecay

Decay B0
1.0 phi K_S0 PHSP;
Enddecay
CDecay anti-B0

Decay phi
1.0 K+ K- VSS;
Enddecay

Decay K_S0
0.7 pi+ pi- PHSP;
0.3 pi0 pi0 PHSP;
Enddecay

End

Key points

Producing MC samples by ourselves should be used only for running quick tests or debugging.
Producing MC signal samples is not a trivial task, because the correct branching fractions and decay models must be used for all the final states under study. Contact the contact the Data Production liason of your physics Working Group for producing large MC signal samples in the correct way.

Stuck? We can help!

If you get stuck or have any questions to the online book material, the #starterkit-workshop channel in our chat is full of nice people who will provide fast help.

Refer to Collaborative Tools. for other places to get help if you have specific or detailed questions about your own analysis.

Improving things!

If you know how to do it, we recommend you to report bugs and other requests with GitLab. Make sure to use the documentation-training label of the basf2 project.

If you just want to give very quick feedback, use the last box “Quick feedback”.

Please make sure to be as precise as possible to make it easier for us to fix things! So for example:

typos (where?)
missing bits of information (what?)
bugs (what did you do? what goes wrong?)
too hard exercises (which one?)
etc.

If you are familiar with git and want to create your first pull request for the software, take a look at How to contribute. We’d be happy to have you on the team!

Quick feedback!

Author of this lesson

Giacomo “the goat” De Pietro

basf2 release-08-02-02 documentation

Generating Monte Carlo

Contents

3.4.8. Generating Monte Carlo#

Why do we need Monte Carlo simulated data?#

Main steps for the Monte Carlo production#

How to write a steering file for the Monte Carlo production#

How to write a decay file and to generate a signal Monte Carlo sample#