.. _onlinebook_generating_mc:

Generating Monte Carlo
======================

.. sidebar:: Overview
    :class: overview

    **Teaching**: 30 min

    **Exercises**: 30 min

    **Prerequisites**:

    * The basics.
    * First steering file.

    **Questions**:

    * How can I generate my own MC?

    **Objectives**:

    * Generate 10 :math:`B^0 \bar{B}^0` events
    * Write a simple decay file and generate signal events


Why do we need Monte Carlo simulated data?
------------------------------------------

Monte Carlo (MC) data are widely used in High Energy Physics. MC simulations allow us to mimic what happens in the
Belle II detector after the collisions to understand the experimental conditions and performance. MC data
are processed as real data in both reconstruction and physics analyses, with the advantage that we know the "truth"
(e.g. if a track observed in the CDC was originated by a muon or a pion). The more realistic the MC simulation,
the better we can interpret the results obtained by analyzing collisions events.

Usually we need large MC samples (larger than the collisions dataset we want to use for the physics analysis) to
avoid spurious fluctuations and to reduce uncertainties driven by the size of the MC sample.
However, the MC samples that we need for our analysis don't have to be generated by ourselves:
this is one of the duty of the Data Production group.
Each physics Working Group has a dedicated person (the `Data Production liason`_) to assist the analysers
and to collect their requests concerning the MC production.

.. _Data production liason: https://confluence.desy.de/display/BI/Data+production+WebHome

There are several reasons for discouraging the self-production of very large MC samples:

* our own-generated MC sample may use an inconsistent version of the software (or conditions database, beam
  background, generator settings, etc.) with respect to the ones produced centrally and used by our colleagues;

* we do not have the computing resources to produce large MC samples; the Data Production group uses the grid
  to properly handle a MC production campaign, while we can not; moreover, using a batch submission system for a
  private large production is not a good idea, because it takes a lot of time and, more important, we will subtract
  important computing resources to our colleaguers and to their jobs;

* we do not have the space to store a large number of MC samples.


.. admonition:: Question
     :class: exercise stacked

     How much disk space would you need to store mDST files equivalent to :math:`500\;\text{fb}^{-1}` of generic
     :math:`B^0 \bar{B}^0` events? You can assume an mDST size for a single :math:`B^0 \bar{B}^0` event
     of :math:`10\;\text{kB}`.

.. admonition:: Hint
     :class: xhint toggle stacked

     The only other input we need is the cross section for for the production of :math:`B^0 \bar{B}^0` events.

.. admonition:: Solution
     :class: toggle solution

     Knowing that the cross section for the production of :math:`B^0 \bar{B}^0` events at Belle II is
     :math:`0.540\;\text{nb}`, we would need about :math:`2.55\;\text{TB}` for storing such a sample.


However, we may need to produce a small MC sample by our-self: for example, we want to test a new feature of the
detector simulation, or we want to test the efficiency of a new trigger line, and so on.


.. warning::

   Don't generate larger samples yourself, contact the `Data Production liason`_ of your physics Working Group
   if you need a large production of MC signal samples!


Main steps for the Monte Carlo production
-----------------------------------------

The production of a MC sample can be divided in three main steps:

* the generation of the MC particles using a MC generator; the currently used generators have not been developed
  within Belle II but have been inherited from the HEP community;

* the simulation of the detector response and of the L1 trigger; the simulation of the interaction of the MC particles
  with the matter is handled by Geant4, while the digitization of the simulated hits is handled by subdetector
  modules;

* the reconstruction of tracks, electromagnetic clusters, etc.; the same modules and algorithms used at this step
  are used also for reconstructing collisions events.


.. warning::

   We can produce MC samples only by using a full release, and not a light one. In a light release we do not have
   the generators, simulation and reconstruction packages, meaning that running with a light release a steering file
   intended for some MC production will result in a ``basf2`` crash.


How to write a steering file for the Monte Carlo production
-----------------------------------------------------------

Of course, for the production of a MC sample we have to write ``basf2`` a steering file. Let's see how to write it.

As usual, one has to start by importing ``basf2`` and creating a main path.


.. code-block:: python

   import basf2 as b2
   main = b2.Path()


Then, the first module to be added to the main path **must** be `EventInfoSetter`. This module is necessary for
correctly setting some of the production parameters, like the number of events to be produced or the experiment
number.


.. admonition:: Question
     :class: exercise stacked

     You want to generate 10 events using the full Belle II geometry (experiment 0). Which configuration do you
     have to use for `EventInfoSetter`?

.. admonition:: Hint
     :class: toggle xhint stacked

     Check the documentation (`EventInfoSetter`) to identify the module's parameters and their names.

.. admonition:: Solution
     :class: toggle solution

     You have to add to your steering path the following line:

     .. code-block:: python

        main.add_module('EventInfoSetter', evtNumList=[10], expList=[0])

     Alternatively, you can just add the module to the steering path without setting the module's parameters and
     you can set the number of events and the experiment number via command line using the ``basf2`` arguments:

     .. code-block:: bash

        basf2 my_steering_file.py -n 10 --experiment 0


.. tip::

   You can also use `b2help-modules` to look at a module's parameters. For example, you can run on your shell
   ``b2help-modules EventInfoSetter``.


Now we have to add to the steering path the relevant functions for:

* generating MC events (for example, some :math:`B^0 \bar{B}^0` events): :py:func:`generators.add_evtgen_generator`

* simulating the detector response, including the L1 trigger: :py:func:`simulation.add_simulation`

* reconstructing the relevant objects, like tracks, electromagnetic clusters, etc.: :py:func:`reconstruction.add_reconstruction`

* storing the reconstructed objects into an output mDST file for further studies: :py:func:`mdst.add_mdst_output`


.. admonition:: Question
     :class: exercise stacked

     Can you figure out how to add all the previous functions to your steering file in order to produce a small sample
     of :math:`B^0 \bar{B}^0` MC events?

.. admonition:: Hint
     :class: toggle xhint stacked

     Do not forget to import all the relevant python modules (for example, ``import simulation`` for ``add_simulation()``).

.. admonition:: Solution
     :class: toggle solution

     .. literalinclude:: steering_files/065_generate_mc.py
        :language: python


.. tip::

   To monitor the status of the script execution, it is advisable to add to the steering path the
   `Progress` module. It periodically prints a message on the screen about the number of processed events.
   You can add it wherever you like between ``main.add_module('EventInfoSetter')`` and ``b2.process(path=main)``.


.. warning::

   Do not try to generate a large sample: it will take a lot of time! For the exercise, limit the number
   of generated events to 10. If you want to produce a larger sample (e.g. 1000 events), consider to use
   a batch submission system (whose usage will be explained in next lessons: :ref:`onlinebook_bsub`
   for the KEKCC system and :ref:`onlinebook_htcondor`).


.. warning::

   Even if the produced mDST file is a ``.root`` file, you can not open it like you did in previous lessons using
   ``pandas``, because it contains ``basf2`` objects and it is not designed to be properly opened with different
   softwares. In case you are really curious to know which objects are stored there, you can run
   ``b2file-size my_mdst_output.root`` in your shell.


.. admonition:: Question
     :class: exercise stacked

     How can we perform a physics analysis on the output mDST file of the previous exercise?

.. admonition:: Solution
     :class: toggle solution

     This is very simple: write a steering file similar to the one you wrote in the
     :ref:`onlinebook_first_steering_file` lesson.


.. tip::

   If you need more complex steering files for your quick studies or you want to use different MC generators, you
   can always check our `MC production repository`_ (used and maintained by the Data Production group). This
   repository is useful also to check which configurations were used for a specific MC production campaign.

.. _MC production repository: https://stash.desy.de/projects/B2P/repos/mc/browse


How to write a decay file and to generate a signal Monte Carlo sample
---------------------------------------------------------------------

In the last exercise we learnt how to generate a generic :math:`B^0 \bar{B}^0` event. Probably you are wondering
where the decay channels of the :math:`B^0` / :math:`\bar{B}^0` and the respective branching fractions are defined,
or if it possible to specify one or few specific decay channels for the generated particles.

All the known decay channels of the generated :math:`B^0` and :math:`\bar{B}^0` mesons are defined in our
`main decay file`_. This file is automatically loaded by the function :py:func:`generators.add_evtgen_generator` and
it is then passed to the `EvtGen generator`_.

.. _main decay file: https://gitlab.desy.de/belle2/software/basf2/-/tree/main/decfiles/dec/DECAY_BELLE2.DEC

.. _EvtGen generator: https://evtgen.hepforge.org/

The main decay file is very long and probably confusing for a newcomer. In order to correctly read and understand it,
let us now move to a different topic: how can we generate events specifying the decay channels of our interest?
In our internal jargon, this is called "generating signal MC events".

Suppose that we want to study the :math:`B^0 \to J/\Psi K^0_S` decay, with :math:`J/\Psi \to \mu^+ \mu^-` and
:math:`K^0_S \to \pi^+ \pi^-`. A decay file that we can use is the following one (copy it into a file named
``my_decay_file.dec`` or something similar):

.. literalinclude:: steering_files/my_decay_file.dec

Looking at this simple decay file, it is not difficult to understand its structure:

* the decay of a specific particle ``X`` starts with ``Decay X`` and ends with ``Enddecay``;

* the decay channels are then defined by the lines starting with numbers and each line
  defining a decay channel must end with a ``;``;

* a decay channel is fully identified by: its branching fraction (``1.0`` in all the cases above), its final state
  (for the ``B0`` case it is ``J/psi K_S0``) and the `EvtGen decay model`_ that we want to use for that specific
  final state; in this case we used several models: ``VSS_BMIX dm`` is a model for vector particles decaying into two
  scalars in which the :math:`B^0` / :math:`\bar{B}^0` mesons mixing is turned on, ``VLL`` is a model for vector
  particles decaying into two leptons, while ``PHSP`` is a simple phase space decay model.

.. _EvtGen decay model: https://evtgen.hepforge.org/doc/models.html

* in case you want to use for a particle the same final states (assuming charge-conjugation) of its charge-conjucated
  partner, you have to specify it using the ``CDecay`` identifier, followed by the charge-conjucated state;

* the decay file must end with ``End``;

* each line starting with a ``#`` is interpreted as a comment by ``EvtGen`` and it is thus ignored.


.. warning::

   This particular decay file will produce two signal particles per event. Usually, for a signal sample we want to
   generate one "signal particle" (namely: one particle decaying following exactly your signal decay file) and one
   "generic particle" (one particle whose decay is instead driven by the `main decay file`_). This can be achieved
   by using "particle aliases" in the decay file. This is documented in the `EvtGen generator`_ website.


.. warning::

   It may be complicated figuring out which decay model you have to use for a specific decay channel. You can
   always refer to our `main decay file`_ or, even better, you can contact the `Data Production liason`_ of your
   physics working group.


Probably you are also wondering where all the particle names are defined (while for ``B0`` and ``mu+`` the naming
scheme is quite obvious, this is not the same for ``K_S0``). These names are defined in the `evt.pdl`_ file, but
luckily you do not have to go through it, because we have a dedicated tool that parses the file and allows you to
easily get all the relevant information: :ref:`b2help-particles`.

.. _evt.pdl: https://gitlab.desy.de/belle2/software/basf2/-/tree/main/framework/particledb/data/evt.pdl

.. admonition:: Question
     :class: exercise stacked

     Before moving on... how many particles having a mass between :math:`3.0\;\text{GeV/c}^2` and
     :math:`3.6\;\text{GeV/c}^2` are defined in the `evt.pdl`_ file?

.. admonition:: Hint
     :class: toggle xhint stacked

     Can you use `b2help-particles` for this?

.. admonition:: Solution
     :class: toggle solution

     Running ``b2help-particles --min 3.0 --max 3.6`` shows that there are 7 particles in this particular mass range:
     :math:`J/\Psi`, :math:`\Xi_{cc}^+`, :math:`\bar{\Xi}_{cc}^-` :math:`\chi_{c0}`, :math:`\chi_{c1}`,
     :math:`\chi_{c2}` and :math:`h_c`.


.. admonition:: Question
     :class: exercise stacked

     Which line of the previous steering file has to be modified in order to generate our signal sample? And how
     does it have to be modified? Run the resulting steering file again.

.. admonition:: Solution
     :class: toggle solution

     .. literalinclude:: steering_files/067_generate_mc.py
        :language: python


.. admonition:: Question
     :class: exercise stacked

     Suppose that now you are interested to a different final state: :math:`B^0 \to \phi K^0_S`, with:
     :math:`\text{BF}(\phi \to K^+ K^-) = 1`, :math:`\text{BF}(K^0_S \to \pi^+ \pi^-) = 0.7` and
     :math:`\text{BF}(K^0_S \to \pi^0 \pi^0) = 0.3`. Which decay file would you write?

.. admonition:: Hint
     :class: toggle xhint stacked

     Use :ref:`b2help-particles` to identify the names to be used for :math:`\phi`, :math:`K^+` and :math:`\pi^0`.
     Probably you want to check their `PDG codes`_ if you do not remember them...

.. _PDG codes: https://pdg.lbl.gov/2019/reviews/rpp2019-rev-monte-carlo-numbering.pdf

.. admonition:: Solution
     :class: toggle solution

     .. code-block::

        Decay Upsilon(4S)
        1.0 B0 anti-B0  VSS_BMIX dm;
        Enddecay

        Decay B0
        1.0 phi K_S0 PHSP;
        Enddecay
        CDecay anti-B0

        Decay phi
        1.0 K+ K- VSS;
        Enddecay

        Decay K_S0
        0.7 pi+ pi- PHSP;
        0.3 pi0 pi0 PHSP;
        Enddecay

        End


.. admonition:: Key points
    :class: key-points

    * Producing MC samples by ourselves should be used only for running quick tests or debugging.

    * Producing MC signal samples is not a trivial task, because the correct branching fractions and decay models
      must be used for all the final states under study. Contact the contact the `Data Production liason`_ of your
      physics Working Group for producing large MC signal samples in the correct way.


.. include:: ../lesson_footer.rstinclude

.. rubric:: Author of this lesson

Giacomo "the goat" De Pietro