3.4.8. Generating Monte Carlo
Why do we need Monte Carlo simulated data?
Monte Carlo (MC) data are widely used in High Energy Physics. MC simulations allow us to mimic what happens in the Belle II detector after the collisions, so that we can understand the experimental conditions and performance. MC data are processed like real data in both reconstruction and physics analyses, with the advantage that we know the “truth” (e.g. whether a track observed in the CDC originated from a muon or a pion). The more realistic the MC simulation, the better we can interpret the results obtained by analyzing collision events.
Usually we need large MC samples (larger than the collision dataset we want to use for the physics analysis) to avoid spurious fluctuations and to reduce uncertainties driven by the size of the MC sample. However, we do not have to generate the MC samples needed for our analysis ourselves: this is one of the duties of the Data Production group. Each physics Working Group has a dedicated person (the Data Production liaison) to assist the analysers and to collect their requests concerning the MC production.
There are several reasons for discouraging the self-production of very large MC samples:
our own-generated MC sample may use an inconsistent version of the software (or conditions database, beam background, generator settings, etc.) with respect to the ones produced centrally and used by our colleagues;
we do not have the computing resources to produce large MC samples; the Data Production group uses the grid to properly handle an MC production campaign, while we cannot; moreover, using a batch submission system for a large private production is not a good idea, because it takes a lot of time and, more importantly, it takes computing resources away from our colleagues and their jobs;
we do not have the space to store a large number of MC samples.
Question
How much disk space would you need to store mDST files equivalent to \(500\;\text{fb}^{-1}\) of generic \(B^0 \bar{B}^0\) events? You can assume an mDST size for a single \(B^0 \bar{B}^0\) event of \(10\;\text{kB}\).
Hint
The only other input we need is the cross section for the production of \(B^0 \bar{B}^0\) events.
Solution
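Knowing that the cross section for the production of \(B^0 \bar{B}^0\) events at Belle II is \(0.540\;\text{nb}\), we expect \(500\;\text{fb}^{-1} \times 0.540\;\text{nb} = 2.7 \times 10^{8}\) events. At \(10\;\text{kB}\) per event, we would need about \(2.7\;\text{TB}\) for storing such a sample.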
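The estimate can be cross-checked with a few lines of Python. This is only a back-of-the-envelope sketch: it uses the inputs quoted in the exercise and assumes decimal units (1 TB = 10^9 kB), so a different cross-section value or unit convention shifts the result by a few percent.

```python
# Back-of-the-envelope estimate of the mDST disk space for generic B0 anti-B0 MC.
# Inputs are the ones quoted in the exercise; decimal units (1 TB = 1e9 kB) are assumed.

FB_INV_TO_NB_INV = 1e6  # 1 fb^-1 = 10^6 nb^-1

luminosity_fb_inv = 500.0  # integrated luminosity in fb^-1
cross_section_nb = 0.540   # cross section for B0 anti-B0 production in nb
event_size_kb = 10.0       # assumed mDST size of a single event in kB

# Number of events = integrated luminosity x cross section
n_events = luminosity_fb_inv * FB_INV_TO_NB_INV * cross_section_nb

# Total size in TB
disk_space_tb = n_events * event_size_kb / 1e9

print(f"Number of events: {n_events:.3g}")
print(f"Disk space: {disk_space_tb:.2f} TB")
```

With these inputs the script reports about 2.7e8 events and about 2.7 TB.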
However, we may need to produce a small MC sample ourselves: for example, when we want to test a new feature of the detector simulation, or the efficiency of a new trigger line, and so on.
Warning
Don’t generate larger samples yourself. Contact the Data Production liaison of your physics Working Group if you need a large production of MC signal samples!
Main steps for the Monte Carlo production
The production of an MC sample can be divided into three main steps:
the generation of the MC particles using a MC generator; the currently used generators have not been developed within Belle II but have been inherited from the HEP community;
the simulation of the detector response and of the L1 trigger; the simulation of the interaction of the MC particles with the matter is handled by Geant4, while the digitization of the simulated hits is handled by subdetector modules;
the reconstruction of tracks, electromagnetic clusters, etc.; the modules and algorithms used at this step are the same ones used for reconstructing collision events.
Warning
We can produce MC samples only by using a full release, and not a light one. A light release does not contain the generators, simulation and reconstruction packages, so running a steering file intended for MC production with a light release will result in a basf2 crash.
How to write a steering file for the Monte Carlo production
Of course, for the production of an MC sample we have to write a basf2 steering file. Let’s see how to write it.
As usual, one has to start by importing basf2 and creating a main path.
import basf2 as b2
main = b2.Path()
Then, the first module to be added to the main path must be EventInfoSetter. This module is necessary for correctly setting some of the production parameters, like the number of events to be produced or the experiment number.
Question
You want to generate 10 events using the full Belle II geometry (experiment 0). Which configuration do you have to use for EventInfoSetter?
Hint
Check the documentation (EventInfoSetter) to identify the module’s parameters and their names.
Solution
You have to add to your steering path the following line:
main.add_module('EventInfoSetter', evtNumList=[10], expList=[0])
Alternatively, you can just add the module to the steering path without setting its parameters, and set the number of events and the experiment number via the command line using the basf2 arguments:
basf2 my_steering_file.py -n 10 --experiment 0
Tip
You can also use b2help-modules to look at a module’s parameters. For example, you can run b2help-modules EventInfoSetter in your shell.
Now we have to add to the steering path the relevant functions for:
generating MC events (for example, some \(B^0 \bar{B}^0\) events):
generators.add_evtgen_generator()
simulating the detector response, including the L1 trigger:
simulation.add_simulation()
reconstructing the relevant objects, like tracks, electromagnetic clusters, etc.:
reconstruction.add_reconstruction()
storing the reconstructed objects into an output mDST file for further studies:
mdst.add_mdst_output()
Question
Can you figure out how to add all the previous functions to your steering file in order to produce a small sample of \(B^0 \bar{B}^0\) MC events?
Hint
Do not forget to import all the relevant python modules (for example, import simulation for add_simulation()).
Solution
#!/usr/bin/env python3

import basf2 as b2
import generators as ge
import simulation as si
import reconstruction as re
import mdst

# Create the steering path
main = b2.Path()

# Define number of events and experiment number
main.add_module('EventInfoSetter', evtNumList=[10], expList=[0])

# Generate B0B0bar events
ge.add_evtgen_generator(path=main, finalstate='mixed')

# Simulate the detector response and the L1 trigger
si.add_simulation(path=main)

# Reconstruct the objects
re.add_reconstruction(path=main)

# Create the mDST output file
mdst.add_mdst_output(path=main, filename='my_mdst_output.root')

# Process the steering path
b2.process(path=main)

# Finally, print out some statistics about the modules execution
print(b2.statistics)
Tip
To monitor the status of the script execution, it is advisable to add the Progress module to the steering path. It periodically prints a message on the screen about the number of processed events. You can add it wherever you like between main.add_module('EventInfoSetter') and b2.process(path=main).
Warning
Do not try to generate a large sample: it will take a lot of time! For the exercise, limit the number of generated events to 10. If you want to produce a larger sample (e.g. 1000 events), consider using a batch submission system (its usage will be explained in the next lessons: Batch submission for the KEKCC system and htcondor).
Warning
Even if the produced mDST file is a .root file, you cannot open it with pandas as you did in previous lessons, because it contains basf2 objects and is not designed to be opened with other software. If you are curious to know which objects are stored there, you can run
b2file-size my_mdst_output.root
in your shell.
Question
How can we perform a physics analysis on the output mDST file of the previous exercise?
Solution
This is very simple: write a steering file similar to the one you wrote in the First steering file lesson.
Tip
If you need more complex steering files for your quick studies or you want to use different MC generators, you can always check our MC production repository (used and maintained by the Data Production group). This repository is useful also to check which configurations were used for a specific MC production campaign.
How to write a decay file and to generate a signal Monte Carlo sample
In the last exercise we learnt how to generate generic \(B^0 \bar{B}^0\) events. Probably you are wondering where the decay channels of the \(B^0\) / \(\bar{B}^0\) and the respective branching fractions are defined, or whether it is possible to specify one or a few specific decay channels for the generated particles.
All the known decay channels of the generated \(B^0\) and \(\bar{B}^0\) mesons are defined in our main decay file. This file is automatically loaded by the function generators.add_evtgen_generator() and it is then passed to the EvtGen generator.
The main decay file is very long and probably confusing for a newcomer. In order to correctly read and understand it, let us now move to a different topic: how can we generate events specifying the decay channels of our interest? In our internal jargon, this is called “generating signal MC events”.
Suppose that we want to study the \(B^0 \to J/\Psi K^0_S\) decay, with \(J/\Psi \to \mu^+ \mu^-\) and \(K^0_S \to \pi^+ \pi^-\). A decay file that we can use is the following one (copy it into a file named my_decay_file.dec or something similar):
# This is my signal decay file.

Decay Upsilon(4S)
1.0 B0 anti-B0 VSS_BMIX dm;
Enddecay

Decay B0
1.0 J/psi K_S0 PHSP;
Enddecay
CDecay anti-B0

Decay J/psi
1.0 mu+ mu- VLL;
Enddecay

Decay K_S0
1.0 pi+ pi- PHSP;
Enddecay

End
Looking at this simple decay file, it is not difficult to understand its structure:
the decay of a specific particle X starts with Decay X and ends with Enddecay;
the decay channels are then defined by the lines starting with numbers, and each line defining a decay channel must end with a ; character;
a decay channel is fully identified by: its branching fraction (1.0 in all the cases above), its final state (for the B0 case it is J/psi K_S0) and the EvtGen decay model that we want to use for that specific final state; in this case we used several models: VSS_BMIX dm is a model for vector particles decaying into two scalars in which the \(B^0\) / \(\bar{B}^0\) meson mixing is turned on, VLL is a model for vector particles decaying into two leptons, while PHSP is a simple phase space decay model;
in case you want a particle to use the same final states (assuming charge conjugation) as its charge-conjugated partner, you have to specify it using the CDecay identifier, followed by the charge-conjugated state;
the decay file must end with End;
each line starting with a # is interpreted as a comment by EvtGen and is thus ignored.
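To make these rules concrete, here is a minimal sketch of a reader that applies them to the decay file above. It is not how EvtGen parses decay files: the function parse_decay_file and the KNOWN_MODELS set are hypothetical helpers written for this illustration, and the sketch only recognises the model names that appear in this lesson.

```python
from collections import defaultdict

DECAY_FILE = """\
# This is my signal decay file.

Decay Upsilon(4S)
1.0 B0 anti-B0 VSS_BMIX dm;
Enddecay

Decay B0
1.0 J/psi K_S0 PHSP;
Enddecay
CDecay anti-B0

Decay J/psi
1.0 mu+ mu- VLL;
Enddecay

Decay K_S0
1.0 pi+ pi- PHSP;
Enddecay

End
"""

# Assumption: only the model names used in this lesson are recognised.
# Anything following the model name on a line is kept as a model argument.
KNOWN_MODELS = {"VSS_BMIX", "VLL", "PHSP", "VSS"}


def parse_decay_file(text):
    """Return ({mother: [(bf, daughters, model)]}, [particles declared with CDecay])."""
    decays = defaultdict(list)
    cdecays = []
    mother = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are skipped
        words = line.split()
        if words[0] == "End":
            break  # End marks the end of the decay file
        if words[0] == "Decay":
            mother = words[1]
        elif words[0] == "Enddecay":
            mother = None
        elif words[0] == "CDecay":
            cdecays.append(words[1])
        else:
            # A channel line: branching fraction, daughters, model, closing ';'
            if mother is None or not line.endswith(";"):
                raise ValueError(f"Malformed channel line: {raw!r}")
            tokens = line[:-1].split()
            bf = float(tokens[0])
            # The first known model name separates the daughters from the model
            idx = next(i for i, t in enumerate(tokens) if t in KNOWN_MODELS)
            decays[mother].append((bf, tokens[1:idx], " ".join(tokens[idx:])))
    return dict(decays), cdecays


decays, cdecays = parse_decay_file(DECAY_FILE)
for mother, channels in decays.items():
    for bf, daughters, model in channels:
        print(f"{mother} -> {' '.join(daughters)}  (BF={bf}, model={model})")
print("Charge-conjugated decays:", cdecays)
```

The printout lists one channel per decaying particle, plus anti-B0 as the particle that reuses the charge-conjugated B0 channels.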
Warning
This particular decay file will produce two signal particles per event. Usually, for a signal sample we want to generate one “signal particle” (namely, one particle decaying exactly as specified in the signal decay file) and one “generic particle” (one particle whose decay is instead driven by the main decay file). This can be achieved by using “particle aliases” in the decay file. This is documented on the EvtGen generator website.
Warning
It may be complicated to figure out which decay model you have to use for a specific decay channel. You can always refer to our main decay file or, even better, you can contact the Data Production liaison of your physics working group.
Probably you are also wondering where all the particle names are defined (while for B0 and mu+ the naming scheme is quite obvious, this is not the case for K_S0). These names are defined in the evt.pdl file, but luckily you do not have to go through it, because we have a dedicated tool that parses the file and allows you to easily get all the relevant information: b2help-particles.
Question
Before moving on… how many particles having a mass between \(3.0\;\text{GeV/c}^2\) and \(3.6\;\text{GeV/c}^2\) are defined in the evt.pdl file?
Hint
Can you use b2help-particles for this?
Solution
Running b2help-particles --min 3.0 --max 3.6 shows that there are 7 particles in this particular mass range: \(J/\Psi\), \(\Xi_{cc}^+\), \(\bar{\Xi}_{cc}^-\), \(\chi_{c0}\), \(\chi_{c1}\), \(\chi_{c2}\) and \(h_c\).
Question
Which line of the previous steering file has to be modified in order to generate our signal sample? And how does it have to be modified? Run the resulting steering file again.
Solution
#!/usr/bin/env python3

import basf2 as b2
import generators as ge
import simulation as si
import reconstruction as re
import mdst

# Create the steering path
main = b2.Path()

# Define number of events and experiment number
main.add_module('EventInfoSetter', evtNumList=[10], expList=[0])

# Generate B0B0bar events
ge.add_evtgen_generator(
    path=main,
    finalstate='signal',
    signaldecfile=b2.find_file('my_decay_file.dec')
)

# Simulate the detector response and the L1 trigger
si.add_simulation(path=main)

# Reconstruct the objects
re.add_reconstruction(path=main)

# Create the mDST output file
mdst.add_mdst_output(path=main, filename='my_mdst_output.root')

# Process the steering path
b2.process(path=main)

# Finally, print out some statistics about the modules execution
print(b2.statistics)
Question
Suppose that now you are interested in a different final state: \(B^0 \to \phi K^0_S\), with \(\text{BF}(\phi \to K^+ K^-) = 1\), \(\text{BF}(K^0_S \to \pi^+ \pi^-) = 0.7\) and \(\text{BF}(K^0_S \to \pi^0 \pi^0) = 0.3\). Which decay file would you write?
Hint
Use b2help-particles to identify the names to be used for \(\phi\), \(K^+\) and \(\pi^0\). Probably you want to check their PDG codes if you do not remember them…
Solution
Decay Upsilon(4S)
1.0 B0 anti-B0 VSS_BMIX dm;
Enddecay

Decay B0
1.0 phi K_S0 PHSP;
Enddecay
CDecay anti-B0

Decay phi
1.0 K+ K- VSS;
Enddecay

Decay K_S0
0.7 pi+ pi- PHSP;
0.3 pi0 pi0 PHSP;
Enddecay

End
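If you prefer to build such a file from a script, for example to try several sets of branching fractions quickly, the same content can be produced from a small Python dictionary. This is only a sketch: render_decay_file, the DECAYS and CDECAYS dictionaries and the output file name my_phi_ks_decay_file.dec are hypothetical names chosen for this illustration. The check that the fractions of each particle add up to 1 is a sanity check chosen for this sketch and matches the numbers of the exercise; it is not a rule stated in this lesson.

```python
import math
from pathlib import Path

# Mother particle -> list of (branching fraction, daughters, EvtGen model),
# copied from the solution of the exercise.
DECAYS = {
    "Upsilon(4S)": [(1.0, ["B0", "anti-B0"], "VSS_BMIX dm")],
    "B0": [(1.0, ["phi", "K_S0"], "PHSP")],
    "phi": [(1.0, ["K+", "K-"], "VSS")],
    "K_S0": [(0.7, ["pi+", "pi-"], "PHSP"), (0.3, ["pi0", "pi0"], "PHSP")],
}

# Particles whose charge-conjugated partner reuses the same channels
CDECAYS = {"B0": "anti-B0"}


def render_decay_file(decays, cdecays):
    """Return the decay file text for the given channels."""
    blocks = []
    for mother, channels in decays.items():
        # Sanity check (an assumption of this sketch): fractions add up to 1
        total = sum(bf for bf, _, _ in channels)
        if not math.isclose(total, 1.0):
            raise ValueError(f"Branching fractions of {mother} sum to {total}, not 1")
        lines = [f"Decay {mother}"]
        for bf, daughters, model in channels:
            lines.append(f"{bf} {' '.join(daughters)} {model};")
        lines.append("Enddecay")
        if mother in cdecays:
            lines.append(f"CDecay {cdecays[mother]}")
        blocks.append("\n".join(lines))
    # Blocks are separated by an empty line and the file is closed by End
    return "\n\n".join(blocks) + "\n\nEnd\n"


text = render_decay_file(DECAYS, CDECAYS)
Path("my_phi_ks_decay_file.dec").write_text(text)  # hypothetical file name
print(text)
```

The resulting file can then be passed to the steering file in the same way as my_decay_file.dec.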
Key points
We should produce MC samples ourselves only for running quick tests or debugging.
Producing MC signal samples is not a trivial task, because the correct branching fractions and decay models must be used for all the final states under study. Contact the Data Production liaison of your physics Working Group for producing large MC signal samples in the correct way.
Stuck? We can help!
If you get stuck or have any questions about the online book material, the #starterkit-workshop channel in our chat is full of nice people who will provide fast help.
Refer to Collaborative Tools for other places to get help if you have specific or detailed questions about your own analysis.
Improving things!
If you know how to do it, we recommend that you report bugs and other requests with JIRA. Make sure to use the documentation-training component of the Belle II Software project.
If you just want to give very quick feedback, use the last box “Quick feedback”.
Please make sure to be as precise as possible to make it easier for us to fix things! So for example:
typos (where?)
missing bits of information (what?)
bugs (what did you do? what goes wrong?)
too hard exercises (which one?)
etc.
If you are familiar with git and want to create your first pull request for the software, take a look at How to contribute. We’d be happy to have you on the team!
Quick feedback!
Author of this lesson
Giacomo “the goat” De Pietro