.. _onlinebook_generating_mc: Generating Monte Carlo ====================== .. sidebar:: Overview :class: overview **Teaching**: 30 min **Exercises**: 30 min **Prerequisites**: * The basics. * First steering file. **Questions**: * How can I generate my own MC? **Objectives**: * Generate 10 :math:`B^0 \bar{B}^0` events * Write a simple decay file and generate signal events Why do we need Monte Carlo simulated data? ------------------------------------------ Monte Carlo (MC) data are widely used in High Energy Physics. MC simulations allow us to mimic what happens in the Belle II detector after the collisions to understand the experimental conditions and performance. MC data are processed as real data in both reconstruction and physics analyses, with the advantage that we know the "truth" (e.g. if a track observed in the CDC was originated by a muon or a pion). The more realistic the MC simulation, the better we can interpret the results obtained by analyzing collisions events. Usually we need large MC samples (larger than the collisions dataset we want to use for the physics analysis) to avoid spurious fluctuations and to reduce uncertainties driven by the size of the MC sample. However, the MC samples that we need for our analysis don't have to be generated by ourselves: this is one of the duty of the Data Production group. Each physics Working Group has a dedicated person (the `Data Production liason`_) to assist the analysers and to collect their requests concerning the MC production. .. _Data production liason: https://confluence.desy.de/display/BI/Data+production+WebHome There are several reasons for discouraging the self-production of very large MC samples: * our own-generated MC sample may use an inconsistent version of the software (or conditions database, beam background, generator settings, etc.) with respect to the ones produced centrally and used by our colleagues; * we do not have the computing resources to produce large MC samples; the Data Production group uses the grid to properly handle a MC production campaign, while we can not; moreover, using a batch submission system for a private large production is not a good idea, because it takes a lot of time and, more important, we will subtract important computing resources to our colleaguers and to their jobs; * we do not have the space to store a large number of MC samples. .. admonition:: Question :class: exercise stacked How much disk space would you need to store mDST files equivalent to :math:`500\;\text{fb}^{-1}` of generic :math:`B^0 \bar{B}^0` events? You can assume an mDST size for a single :math:`B^0 \bar{B}^0` event of :math:`10\;\text{kB}`. .. admonition:: Hint :class: xhint toggle stacked The only other input we need is the cross section for for the production of :math:`B^0 \bar{B}^0` events. .. admonition:: Solution :class: toggle solution Knowing that the cross section for the production of :math:`B^0 \bar{B}^0` events at Belle II is :math:`0.540\;\text{nb}`, we would need about :math:`2.55\;\text{TB}` for storing such a sample. However, we may need to produce a small MC sample by our-self: for example, we want to test a new feature of the detector simulation, or we want to test the efficiency of a new trigger line, and so on. .. warning:: Don't generate larger samples yourself, contact the `Data Production liason`_ of your physics Working Group if you need a large production of MC signal samples! Main steps for the Monte Carlo production ----------------------------------------- The production of a MC sample can be divided in three main steps: * the generation of the MC particles using a MC generator; the currently used generators have not been developed within Belle II but have been inherited from the HEP community; * the simulation of the detector response and of the L1 trigger; the simulation of the interaction of the MC particles with the matter is handled by Geant4, while the digitization of the simulated hits is handled by subdetector modules; * the reconstruction of tracks, electromagnetic clusters, etc.; the same modules and algorithms used at this step are used also for reconstructing collisions events. .. warning:: We can produce MC samples only by using a full release, and not a light one. In a light release we do not have the generators, simulation and reconstruction packages, meaning that running with a light release a steering file intended for some MC production will result in a ``basf2`` crash. How to write a steering file for the Monte Carlo production ----------------------------------------------------------- Of course, for the production of a MC sample we have to write ``basf2`` a steering file. Let's see how to write it. As usual, one has to start by importing ``basf2`` and creating a main path. .. code-block:: python import basf2 as b2 main = b2.Path() Then, the first module to be added to the main path **must** be `EventInfoSetter`. This module is necessary for correctly setting some of the production parameters, like the number of events to be produced or the experiment number. .. admonition:: Question :class: exercise stacked You want to generate 10 events using the full Belle II geometry (experiment 0). Which configuration do you have to use for `EventInfoSetter`? .. admonition:: Hint :class: toggle xhint stacked Check the documentation (`EventInfoSetter`) to identify the module's parameters and their names. .. admonition:: Solution :class: toggle solution You have to add to your steering path the following line: .. code-block:: python main.add_module('EventInfoSetter', evtNumList=[10], expList=[0]) Alternatively, you can just add the module to the steering path without setting the module's parameters and you can set the number of events and the experiment number via command line using the ``basf2`` arguments: .. code-block:: bash basf2 my_steering_file.py -n 10 --experiment 0 .. tip:: You can also use `b2help-modules` to look at a module's parameters. For example, you can run on your shell ``b2help-modules EventInfoSetter``. Now we have to add to the steering path the relevant functions for: * generating MC events (for example, some :math:`B^0 \bar{B}^0` events): :py:func:`generators.add_evtgen_generator` * simulating the detector response, including the L1 trigger: :py:func:`simulation.add_simulation` * reconstructing the relevant objects, like tracks, electromagnetic clusters, etc.: :py:func:`reconstruction.add_reconstruction` * storing the reconstructed objects into an output mDST file for further studies: :py:func:`mdst.add_mdst_output` .. admonition:: Question :class: exercise stacked Can you figure out how to add all the previous functions to your steering file in order to produce a small sample of :math:`B^0 \bar{B}^0` MC events? .. admonition:: Hint :class: toggle xhint stacked Do not forget to import all the relevant python modules (for example, ``import simulation`` for ``add_simulation()``). .. admonition:: Solution :class: toggle solution .. literalinclude:: steering_files/065_generate_mc.py :language: python .. tip:: To monitor the status of the script execution, it is advisable to add to the steering path the `Progress` module. It periodically prints a message on the screen about the number of processed events. You can add it wherever you like between ``main.add_module('EventInfoSetter')`` and ``b2.process(path=main)``. .. warning:: Do not try to generate a large sample: it will take a lot of time! For the exercise, limit the number of generated events to 10. If you want to produce a larger sample (e.g. 1000 events), consider to use a batch submission system (whose usage will be explained in next lessons: :ref:`onlinebook_bsub` for the KEKCC system and :ref:`onlinebook_htcondor`). .. warning:: Even if the produced mDST file is a ``.root`` file, you can not open it like you did in previous lessons using ``pandas``, because it contains ``basf2`` objects and it is not designed to be properly opened with different softwares. In case you are really curious to know which objects are stored there, you can run ``b2file-size my_mdst_output.root`` in your shell. .. admonition:: Question :class: exercise stacked How can we perform a physics analysis on the output mDST file of the previous exercise? .. admonition:: Solution :class: toggle solution This is very simple: write a steering file similar to the one you wrote in the :ref:`onlinebook_first_steering_file` lesson. .. tip:: If you need more complex steering files for your quick studies or you want to use different MC generators, you can always check our `MC production repository`_ (used and maintained by the Data Production group). This repository is useful also to check which configurations were used for a specific MC production campaign. .. _MC production repository: https://stash.desy.de/projects/B2P/repos/mc/browse How to write a decay file and to generate a signal Monte Carlo sample --------------------------------------------------------------------- In the last exercise we learnt how to generate a generic :math:`B^0 \bar{B}^0` event. Probably you are wondering where the decay channels of the :math:`B^0` / :math:`\bar{B}^0` and the respective branching fractions are defined, or if it possible to specify one or few specific decay channels for the generated particles. All the known decay channels of the generated :math:`B^0` and :math:`\bar{B}^0` mesons are defined in our `main decay file`_. This file is automatically loaded by the function :py:func:`generators.add_evtgen_generator` and it is then passed to the `EvtGen generator`_. .. _main decay file: https://gitlab.desy.de/belle2/software/basf2/-/tree/main/decfiles/dec/DECAY_BELLE2.DEC .. _EvtGen generator: https://evtgen.hepforge.org/ The main decay file is very long and probably confusing for a newcomer. In order to correctly read and understand it, let us now move to a different topic: how can we generate events specifying the decay channels of our interest? In our internal jargon, this is called "generating signal MC events". Suppose that we want to study the :math:`B^0 \to J/\Psi K^0_S` decay, with :math:`J/\Psi \to \mu^+ \mu^-` and :math:`K^0_S \to \pi^+ \pi^-`. A decay file that we can use is the following one (copy it into a file named ``my_decay_file.dec`` or something similar): .. literalinclude:: steering_files/my_decay_file.dec Looking at this simple decay file, it is not difficult to understand its structure: * the decay of a specific particle ``X`` starts with ``Decay X`` and ends with ``Enddecay``; * the decay channels are then defined by the lines starting with numbers and each line defining a decay channel must end with a ``;``; * a decay channel is fully identified by: its branching fraction (``1.0`` in all the cases above), its final state (for the ``B0`` case it is ``J/psi K_S0``) and the `EvtGen decay model`_ that we want to use for that specific final state; in this case we used several models: ``VSS_BMIX dm`` is a model for vector particles decaying into two scalars in which the :math:`B^0` / :math:`\bar{B}^0` mesons mixing is turned on, ``VLL`` is a model for vector particles decaying into two leptons, while ``PHSP`` is a simple phase space decay model. .. _EvtGen decay model: https://evtgen.hepforge.org/doc/models.html * in case you want to use for a particle the same final states (assuming charge-conjugation) of its charge-conjucated partner, you have to specify it using the ``CDecay`` identifier, followed by the charge-conjucated state; * the decay file must end with ``End``; * each line starting with a ``#`` is interpreted as a comment by ``EvtGen`` and it is thus ignored. .. warning:: This particular decay file will produce two signal particles per event. Usually, for a signal sample we want to generate one "signal particle" (namely: one particle decaying following exactly your signal decay file) and one "generic particle" (one particle whose decay is instead driven by the `main decay file`_). This can be achieved by using "particle aliases" in the decay file. This is documented in the `EvtGen generator`_ website. .. warning:: It may be complicated figuring out which decay model you have to use for a specific decay channel. You can always refer to our `main decay file`_ or, even better, you can contact the `Data Production liason`_ of your physics working group. Probably you are also wondering where all the particle names are defined (while for ``B0`` and ``mu+`` the naming scheme is quite obvious, this is not the same for ``K_S0``). These names are defined in the `evt.pdl`_ file, but luckily you do not have to go through it, because we have a dedicated tool that parses the file and allows you to easily get all the relevant information: :ref:`b2help-particles`. .. _evt.pdl: https://gitlab.desy.de/belle2/software/basf2/-/tree/main/framework/particledb/data/evt.pdl .. admonition:: Question :class: exercise stacked Before moving on... how many particles having a mass between :math:`3.0\;\text{GeV/c}^2` and :math:`3.6\;\text{GeV/c}^2` are defined in the `evt.pdl`_ file? .. admonition:: Hint :class: toggle xhint stacked Can you use `b2help-particles` for this? .. admonition:: Solution :class: toggle solution Running ``b2help-particles --min 3.0 --max 3.6`` shows that there are 7 particles in this particular mass range: :math:`J/\Psi`, :math:`\Xi_{cc}^+`, :math:`\bar{\Xi}_{cc}^-` :math:`\chi_{c0}`, :math:`\chi_{c1}`, :math:`\chi_{c2}` and :math:`h_c`. .. admonition:: Question :class: exercise stacked Which line of the previous steering file has to be modified in order to generate our signal sample? And how does it have to be modified? Run the resulting steering file again. .. admonition:: Solution :class: toggle solution .. literalinclude:: steering_files/067_generate_mc.py :language: python .. admonition:: Question :class: exercise stacked Suppose that now you are interested to a different final state: :math:`B^0 \to \phi K^0_S`, with: :math:`\text{BF}(\phi \to K^+ K^-) = 1`, :math:`\text{BF}(K^0_S \to \pi^+ \pi^-) = 0.7` and :math:`\text{BF}(K^0_S \to \pi^0 \pi^0) = 0.3`. Which decay file would you write? .. admonition:: Hint :class: toggle xhint stacked Use :ref:`b2help-particles` to identify the names to be used for :math:`\phi`, :math:`K^+` and :math:`\pi^0`. Probably you want to check their `PDG codes`_ if you do not remember them... .. _PDG codes: https://pdg.lbl.gov/2019/reviews/rpp2019-rev-monte-carlo-numbering.pdf .. admonition:: Solution :class: toggle solution .. code-block:: Decay Upsilon(4S) 1.0 B0 anti-B0 VSS_BMIX dm; Enddecay Decay B0 1.0 phi K_S0 PHSP; Enddecay CDecay anti-B0 Decay phi 1.0 K+ K- VSS; Enddecay Decay K_S0 0.7 pi+ pi- PHSP; 0.3 pi0 pi0 PHSP; Enddecay End .. admonition:: Key points :class: key-points * Producing MC samples by ourselves should be used only for running quick tests or debugging. * Producing MC signal samples is not a trivial task, because the correct branching fractions and decay models must be used for all the final states under study. Contact the contact the `Data Production liason`_ of your physics Working Group for producing large MC signal samples in the correct way. .. include:: ../lesson_footer.rstinclude .. rubric:: Author of this lesson Giacomo "the goat" De Pietro