.. _onlinebook_topoana:

Topology analysis
=================


.. sidebar:: Overview
    :class: overview

    **Length**: 1-1.5 hrs

    **Prerequisites**:

    * Command Line Tutorial (Bash)

    **Questions**:

    * What's inside my MC?
    * How frequently do I have certain different decay modes in my MC?

    **Objectives**:

    * Install ``TopoAna``.
    * Generate a PDF report with a lot of information about your MC.


.. tip::

   This online textbook introduces ``TopoAna`` very briefly.
   For more descriptions of the tool, please refer to the documents introduced
   in :numref:`TopologyAnalysis`.
   Please feel free to contact Xingyu Zhou (zhouxy@buaa.edu.cn) if you have any
   questions or comments on ``TopoAna`` and this online textbook.

Introduction
------------

In the data analysis of high energy physics experiments, a comprehensive
understanding of the inclusive/generic MC samples is required to select signals
with a higher efficiency and meanwhile suppress backgrounds to a lower level.
In particular, a clear knowledge of the physics processes, namely the event types,
involved in the samples is quite helpful.

With the physics process information, we can figure out the main backgrounds
(especially the peaking ones).
Then, we can optimize the selection criteria further by analyzing the differences
between the main backgrounds and the signals.
Even if it is difficult to further suppress these backgrounds, the knowledge of
their types is beneficial to estimate the systematic uncertainties associated
with them.

Sometimes, we need to search for certain processes of interests.
Mostly, signal and background events coexist in inclusive MC samples. It is
useful to differentiate them in such cases.
The identified signal events can be used to make up a signal sample in the absence
of specialized signal samples, or they can be removed to avoid repetition in the
presence of specialized signal samples.
Occasionally, we have to pick out some decay branches in order to re-weight them
according to new theoretical predictions or updated experimental measurements.

.. _topology_diagrams:

.. figure:: topoana/topology_diagrams.png
   :width: 40em
   :align: center

   Topology diagrams of (a) :math:`e^+ e^- \to J/\psi`, :math:`J/\psi \to \rho^+
   \pi^-`, :math:`\rho^+ \to \pi^+ \pi^0`, :math:`\pi^0 \to \gamma \gamma` and
   (b) :math:`e^+ e^- \to \Upsilon(4S)`, :math:`\Upsilon(4S) \to B^0 \bar{B}^0`,
   :math:`B^0 \to K_S^0 J/\psi`, :math:`\bar{B}^0 \to \mu^- D^{*+} \nu_{\mu}`,
   :math:`K_S^0 \to \pi^+ \pi^-`, :math:`J/\psi \to e^+ e^-`, :math:`D^{*+} \to
   D^0 \pi^+`, :math:`D^0 \to \pi^0 \pi^+ K^-`, :math:`\pi^0 \to \gamma \gamma`.

Processes in high energy physics can be visualized with topology diagrams.
As an example, :numref:`topology_diagrams` shows the topology diagrams of two
typical physics processes occurring at :math:`e^+e^-` colliders.
From the figure, the hierarchies of the processes and the relationships among
the particles are clearly illustrated with the diagrams.
Though the complexities of topology diagrams vary with physics processes, there
is only one diagram corresponding to each process.
For this reason, we refer to the physics process information/analysis mentioned
thereinbefore as topology information/analysis hereinafter.

Since the raw topology truth information of inclusive MC samples is counter-intuitive,
diverse, and overwhelming, it is difficult for analysts to check the topology
information of the samples directly.
To help them do the checks quickly and easily, a topology analysis program called
``TopoAna`` is developed with ``C++``, ``ROOT``, and ``LaTeX``.

Basics of the program
---------------------

The input of the program is one or more root files including a ``TTree`` object
which contains the raw topology truth information of the inclusive MC samples
under study.
To be specific, the information in each entry of the ``TTree`` object consists
of the following three ingredients associated with the particles produced in an
event of the samples: **the number of the particles**, **the PDG codes of the
particles**, and **the mother indices of the particles**.

.. note::

   1. The particles do not include the initial state particles (:math:`e^+` and
      :math:`e^-` in the Belle II experiment), which are default and thus omitted.

   2. The indices of particles are integers starting from zero (included) to the
      number of particles (excluded); they are obvious and hence not taken as an
      input ingredient for topology analysis.

Fundamentally, the program resolves counter-intuitive, diverse, and overwhelming
input data

.. _input_data:

.. figure:: topoana/input_data.png
   :width: 32em
   :align: center

into highly readable symbolic expressions of physics processes

.. _physics_process:

.. figure:: topoana/physics_process.png
   :width: 32em
   :align: center

.. tip::

   * Here, the decay branches in the process are placed into two blocks in order
     to make full use of the page space.

   * In both blocks, the first, second, and third columns are the indices, symbolic
     expressions, and mother indices of the decay branches.

   * Notably, all the decay branches of :math:`\pi^0 \to \gamma \gamma` are omitted
     in order to make the process look more concise.

Generally, the functionalities of the program are as follows.

1. The program recognizes, categorizes, and counts physics processes in each
   event of the samples.

2. It tags the physics processes in the corresponding entry of the output root
   files.

   .. tip::

     Except for the tags, the input ``TTree`` object in the output root
     files is entirely the same as that in the input root files.

3. After processing the events, the program exports the obtained topology
   information at the sample level (topology maps) to the output plain text,
   tex source, and pdf files.

.. tip::

   * Although the files are in different formats, they have the same
     information.

   * The pdf file is the easiest to read. It is converted from the tex source
     file by the ``pdflatex`` command.

   * The plain text file is convenient to be checked with text processing
     commands as well as text editors.

.. note::

   Below are some general remarks on the relationship between the ``TopoAna``
   program and the :ref:`mcmatching` module in ``basf2``, namely
   :b2:mod:`MCMatcherParticles`.

   1. As we introduce in :numref:`TopologyAnalysis`, ``TopoAna`` is an offline
      tool independent of ``basf2``. It gets the human-friendly topology
      information on its own from the raw MC truth information obtained with
      the interface ``MCGenTopo``. Obviously, it does not rely on
      :b2:mod:`MCMatcherParticles`.

   2. On the other hand, in order to analyze some backgrounds more effectively,
      a series of newly developed parameters in a few functionality items in
      ``TopoAna`` can make use of the MC matching results obtained with
      :b2:mod:`MCMatcherParticles`. For more details, please see Section 5.2.3
      (titled **Reconstruction restrictions on truth particles**) in the user
      guide we introduce in :numref:`TopologyAnalysis`.

   3. In addition, there are some functionality overlaps between ``TopoAna``
      and :b2:mod:`MCMatcherParticles`. If highly needed, in the future we could
      also develop some new functionalities in ``TopoAna`` for the cross-checks
      of the MC matching results obtained with :b2:mod:`MCMatcherParticles`.

Install the software
--------------------

Please follow the steps below to install the software.

1. Set up ``basf2`` (see :numref:`cvmfs_setup`) if you have not

   .. note::

    This guarantees you have ``ROOT`` installed in your environment.
    It does not mean ``TopoAna`` is a software based on ``basf2``.
    Instead, the software is only based on ``ROOT``.

2. ``cd yourDirectoryForTopoAna``

3. Clone the ``TopoAna`` repository from GitLab with
   ``git clone git@gitlab.desy.de:belle2/software/topoana.git``

4. ``cd topoana``

5. Switch to the latest release with ``git checkout vxx-yy-zz``

   .. note::

    Please replace it with the concrete version number, which you can find
    with ``git tag | tail -1``.

6. Configure the package path with ``./Configure``

   .. note::

    You need to manually set up the environment variable ``PATH`` according
    to the guidelines printed out by the command.

7. Compile and link the program with ``make``

   .. tip::

    The installation succeeds if you see the following line: ``"topoana.exe"
    installed successfully!``

8. Set up the experiment name with ``./Setup Belle_II``

   .. note::

    If you want to try the program with examples under the directory
    ``examples``, please execute ``./Setup Example``.

Get the input data
------------------

As we mention in :numref:`TopologyAnalysis`, ``MCGenTopo`` is the interface of
``basf2`` to ``TopoAna``.
In the following we introduce the steps to get the input data to ``TopoAna``
with the interface.

1. Append the following statement at the beginning part of your python
   steering script

   .. code-block:: python

    from variables.MCGenTopo import mc_gen_topo

2. Use the parameter function ``mc_gen_topo(n)`` as a list of variables in the
   steering function ``variablesToNtuple`` as follow

   .. code-block:: python

    variablesToNtuple(particleList, yourOwnVariableList + mc_gen_topo(n), treeName, fieName, path)

   Here, ``n`` is the number of ``MCGenPDG_i``/``MCGenMothIndex_i`` variables,
   and its default value is 200.

3. Run your python steering script with ``basf2``

Below is an example of the python steering script.

.. literalinclude:: topoana/MCGenTopo.py
   :language: python

.. note::

   In practice, we usually use the interface ``MCGenTopo`` together with detailed
   particle lists and variable lists for specific physics analyses.
   For the sake of simplification, we do not include the latter in this script.

After the following steps:

1. ``mkdir test`` under the ``TopoAna`` package.

2. ``cd test``

3. Create a new python steering script named ``MCGenTopo.py``, and copy and
   paste the content of the script above into it.

4. ``basf2 MCGenTopo.py``

you get a root file ``MCGenTopo.root`` containing a ``TTree`` object ``MCGenTopo``,
which in return contains the MC truth information for topology analysis.
With the ``C/C++`` interpreter of ``ROOT``, you can check and see the MC truth
information as follows.
Notably, the comments on the right side are the explanations on the command lines
and the key variables.

.. code-block:: none

   [zhouxy@ccw04 test]$ root -l                         // Get into the interpreter
   root [0] TFile f("MCGenTopo.root")                   // Open the root file
   (TFile &) Name: MCGenTopo.root Title:
   root [1] f.ls()                                      // List the objects in the root file
   TFile**         MCGenTopo.root
    TFile*         MCGenTopo.root
     KEY: TTree    MCGenTopo;1
   root [2] MCGenTopo->Show(0)                          // Show the first entry of the TTree object
   ======> EVENT:0
    __experiment__  = 0
    __run__         = 0
    __event__       = 1
    __production__  = 0
    __weight__      = 1
    nMCGen          = 59                                // nMCGen is the number of the particles
    MCGenPDG_0      = 300553
    MCGenMothIndex_0 = nan
    MCGenPDG_1      = 511
    MCGenMothIndex_1 = 0
    MCGenPDG_2      = -511                              // MCGenPDG_i is the PDG code
    MCGenMothIndex_2 = 0                                              of the i^th particle

    ...

    MCGenPDG_197    = nan                               // MCGenMothIndex_i is the mother index
    MCGenMothIndex_197 = nan                                                of the i^th particle
    MCGenPDG_198    = nan
    MCGenMothIndex_198 = nan
    MCGenPDG_199    = nan
    MCGenMothIndex_199 = nan
   root [3] .q                                          // Quit the interpreter

.. tip::

   Normally, the input data contain all the topology information of the samples.
   With the data, all kinds of topology analysis with ``TopoAna`` can be performed.

Prepare the card file
---------------------

To carry out topology analysis desired in your work, you have to provide some
necessary input, functionality, and output information to the program.
The information is required to be filled in the setting items designed and
implemented in the program, and the items have to be put in a plain text file
named with a suffix ``.card``.

.. note::

   * A template card file ``template_topoana.card`` can be found in the ``share``
     directory of the ``TopoAna`` package.

   * For the concision of your own card file, it is recommended to **just copy
     the setting items you need from the template card file and paste them to
     your own card file**.

   * Since there are plenty of setting items in the template card file, it is
     **NOT** recommended to create your own card file **simply by copying and
     revising the whole template card file**.

Below is an example of the card file.

.. code-block:: none

   % Names of input root files
   {
     MCGenTopo.root
   }

   % TTree name
   {
     MCGenTopo
   }

   % Component analysis --- decay trees
   {
     Y     100
   }

   % Common name of output files (Default: Name of the card file)
   {
     topoana
   }

   % Storage type of input raw topology truth information (Six options: AOI, VOI, MSI, MSF, MSD, and MSID. Default: AOI)
   {
      MSID
   }

In the card file, ``#``, ``%``, and the pair of ``{`` and ``}``, are used for
commenting, prompting, and grouping, respectively.
The first two items defines the input, the third one specifies the functionality,
and the last one sets the name of the program's output.

Below are some detailed explanations on these setting items.

* The first item sets the names of the input root files.

  .. tip::

   1. The names ought to be input one per line without tailing characters,
      such as comma, semicolon, and period.

   2. In the names, both the absolute and relative paths are allowed and
      wildcards ``[]``, ``?``, and ``*`` are supported, just like those in
      the root file names input to the method ``Add()`` of the class ``TChain``.

* The second item specifies the ``TTree`` name.

  .. note::

   Here, the ``TTree`` object should contain the following variables:
   ``nMCGen``, ``MCGenPDG_i``, and ``MCGenMothIndex_i`` (``i = 0, 1, 2 ...``).

* The third item sets the basic functionality of the program, namely the
  component analysis over decay trees.
  With the second parameter ``100`` in the item, the maximum number of output
  components is set to 100.

  .. note::

   1. The item can be replaced or co-exist with other functionality items.

   2. At least one functionality item has to be specified explicitly in the
      card file, otherwise the program will terminate soon after its start
      because no topology analysis task to be performed is set up.

* The fourth item specifies the common name of the output files.
  The files will be described in the next part of this section.
  Though in different formats, they are denominated with the same name for the
  sake of uniformity.

  .. tip::

   * This item is optional, with the name of the card file as its default
     input value.

   * It is a good practice to first denominate the card file with the desired
     common name of the output files and then remove this item or leave it
     empty.

Run the program
---------------

With the card file, one can execute the program with the command line
``topoana.exe cardFileName``, where the argument ``cardFileName`` is optional and
its default value is ``topoana.card``.

.. tip:: You can try to execute ``topoana.exe --help`` to see other optional
         arguments supported in the command line.

After the following steps:

1. Create a new card file named ``topoana.card`` under the ``test`` directory
   we made above, and copy and paste the content of the card file above into it.

2. ``topoana.exe topoana.card``

   .. tip::

    * Since the name of the card file is the default one, you can just execute
      ``topoana.exe``.

    * If you encounter the following error,

      .. code-block:: none

         topoana.exe: error while loading shared libraries: libCore.so: cannot open shared object
         file: No such file or directory

      setting up ``basf2`` before executing the command solves the problem, 
      in cases that you installed ``TopoAna`` in the ``basf2`` environment
      previously.

you can get the following four output files: ``topoana.txt``, ``topoana.tex``,
``topoana.pdf``, and ``topoana.root``.
As we mention above, the program outputs the topology maps to the first three
files.
Although in different formats, the three files have the same information.
You can check and see them.

.. tip::

   If you are working on a remote server, you have two options to take a look at
   the pdf file:

   1. Copy the pdf file to your computer with ``scp`` (see :ref:`onlinebook_ssh`),

   2. Start a jupyter server with ``jupyter notebook`` and open the pdf file
      in the browser with the web interface (see :ref:`onlinebook_python`).

Below is only the screenshot of the first part of the table in the pdf file.

.. _decay_trees:

.. figure:: topoana/decay_trees.png
   :width: 40em
   :align: center

The column headers expressed with abbreviations are explained as follows:

.. tip::

   * ``rowNo`` --- row number
   * ``iDcyTr`` --- index of decay tree
   * ``nEtr`` --- number of entries
   * ``nCEtr`` --- number of cumulative entries

.. note::

   1. The values of ``iDcyTr`` are assigned from small to large in the program
      but listed according to the values of ``nEtr`` from large to small in the
      table. This is the reason why they are not in natural order like the
      values of ``rowNo``.

   2. Considering :math:`\pi^0` has a very large production rate and approximately
      99\% of it decays to :math:`\gamma \gamma`, the program is designed to discard
      the decay :math:`\pi^0 \to \gamma \gamma` by default at the early phase of
      processing the input data. As a result, :math:`\pi^0 \to \gamma \gamma`
      does not show itself in the table.

In the table, ``iDcyTr`` is the topology tag for decay trees.
Thus, it is also saved in the ``TTree`` object of the output root file.
With the ``C/C++`` interpreter of ``ROOT``, you can check it as follow and see
it at the end of the code block.
Similarly, the right side presents the explanations on the command lines and
the key variables.

.. code-block:: none

   [zhouxy@ccw04 test]$ root -l                         // Get into the interpreter
   root [0] TFile f("MCGenTopo.root")                   // Open the root file
   (TFile &) Name: MCGenTopo.root Title:
   root [1] f.ls()                                      // List the objects in the root file
   TFile**         MCGenTopo.root
    TFile*         MCGenTopo.root
     KEY: TTree    MCGenTopo;1
   root [2] MCGenTopo->Show(0)                          // Show the first entry of the TTree object
   ======> EVENT:0
    __experiment__  = 0
    __run__         = 0
    __event__       = 1
    __production__  = 0
    __weight__      = 1
    nMCGen          = 59                                // nMCGen is the number of the particles
    MCGenPDG_0      = 300553
    MCGenMothIndex_0 = nan
    MCGenPDG_1      = 511
    MCGenMothIndex_1 = 0
    MCGenPDG_2      = -511                              // MCGenPDG_i is the PDG code
    MCGenMothIndex_2 = 0                                              of the i^th particle

    ...

    MCGenPDG_197    = nan                               // MCGenMothIndex_i is the mother index
    MCGenMothIndex_197 = nan                                                of the i^th particle
    MCGenPDG_198    = nan
    MCGenMothIndex_198 = nan
    MCGenPDG_199    = nan
    MCGenMothIndex_199 = nan
    iDcyTr          = 0                                 // This is the newly added topology tag!
   root [3] .q                                          // Quit the interpreter

.. tip::

   The topology tag ``iDcyTr`` can be used to pick out the entries of specific
   decay trees and then examine the distributions of the other quantities over
   the decay trees.
   This is an important application of topology analysis.

Exercises
---------

The example above only introduces the basic usage of ``TopoAna``.
You can refer to the documents we introduce in :numref:`TopologyAnalysis` for
more descriptions of the tool.
At the end of this online textbook, we provide the following six exercises for
you to further explore the usage of ``TopoAna``.
You will benefit a lot when you accomplish these exercises.
To do these exercises (except for the first one), you need to look up the proper
setting items in the quick-start tutorial or the user guide we introduce in
:numref:`TopologyAnalysis`, add them to the card file, and re-run the program.

.. admonition:: Exercise 1
   :class: exercise stacked

   In the example above, we execute ``basf2 MCGenTopo.py`` and ``topoana.exe
   topoana.card`` in two steps. This is a regular practice and does not take
   too much efforts. However, sometimes you may want to execute the two command
   lines in one step. Could you figure out a way to do this?

.. admonition:: Hint
   :class: dropdown xhint stacked

   Use the ``system`` function of the ``os`` module in your python steering script.

.. admonition:: Solution
   :class: dropdown solution

   Revise your python steering script by adding ``import os`` to the end of its
   preamble and appending ``os.system('topoana.exe topoana.card')`` at its end,
   and then run the script with ``basf2``.

   Below is an example of the revised script.

   .. literalinclude:: topoana/topoana.py
      :language: python

.. admonition:: Extension
   :class: dropdown xhint

   Could you think of another way to do this? For example, with a shell script?

.. admonition:: Exercise 2
   :class: exercise stacked

   Try to examine the top 30 decay branches of :math:`B^{0}` and all the decay
   branches of :math:`D^{*+}` in the input sample.

.. admonition:: Hint
   :class: dropdown xhint stacked

   * See Section 3 in the quick-start tutorial for the introduction of the
     setting item.
   * See Section 3.3 in the user guide for the description of the setting item.

.. admonition:: Solution
   :class: dropdown solution

   Add the following setting item to the card file, re-run the program, and
   check the changes of the output files.

   .. code-block:: none

      % Component analysis --- decay branches of particles
      {
        B0         B0        30
        D*+        Dsp
      }

.. admonition:: Extension
   :class: dropdown xhint

   * See Section 3 in the user guide for the description of other similar
     setting items.

.. admonition:: Exercise 3
   :class: exercise stacked

   Try to identify the decay branches :math:`B^{0} \rightarrow \pi^{0} \bar{D}^{0}`
   and :math:`\bar{B}^{0} \rightarrow \mu^{-} \bar{\nu}_{\mu} D^{*+}` in the input
   sample.

.. admonition:: Hint
   :class: dropdown xhint stacked

   * See Section 4 in the quick-start tutorial for the introduction of the setting
     item.
   * See Section 4.4 in the user guide for the description of the setting item.

.. admonition:: Solution
   :class: dropdown solution

   Add the following setting item to the card file, re-run the program, and check
   the changes of the output files.

   .. code-block:: none

      % Signal identification --- decay branches
      {
        B0 --> pi0 anti-D0                    &       B2pi0D0bar
        anti-B0 --> D*+ mu- anti-nu_mu        &       B2munuDsp
      }

.. admonition:: Extension
   :class: dropdown xhint

   * See Section 4 in the user guide for the description of other similar setting
     items.

.. admonition:: Exercise 4
   :class: exercise stacked

   In the previous example and the above two exercises, we did not consider charge
   conjugation, which is an important concept in high energy physics.
   In this exercise, try to process the charge conjugate particles, decay branches,
   and decay trees together.

.. admonition:: Hint
   :class: dropdown xhint stacked

   * See Section 5 in the quick-start tutorial for the introduction of the setting
     item.
   * See Section 5.2.2 in the user guide for the description of the setting item.

.. admonition:: Solution
   :class: dropdown solution

   Add the following setting item to the card file, re-run the program, and check
   the changes of the output files.

   .. code-block:: none

      % Process charge conjugate objects together (Two options: Y and N. Default: N)
      {
        Y
      }

.. admonition:: Extension
   :class: dropdown xhint

   * See Section 5.2 in the user guide for the description of other similar
     setting items.

.. admonition:: Exercise 5
   :class: exercise stacked

   Try to impose some cuts to select events. For example, impose one cut
   to select only the events with ``nMCGen>=20 && nMCGen<=40``.

.. admonition:: Hint
   :class: dropdown xhint stacked

   * See Section 5.1.1 in the user guide for the description of the setting
     item.

.. admonition:: Solution
   :class: dropdown solution

   Add the following setting item to the card file, re-run the program, and
   check the changes of the output files.

   .. code-block:: none

      % Cut to select entries
      {
        nMCGen>=20 && nMCGen<=40
      }

.. admonition:: Extension
   :class: dropdown xhint

   * See Section 5.1 in the user guide for the description of other similar
     setting items.

.. admonition:: Exercise 6
   :class: exercise stacked

   Try to remove the input ``TBranch`` objects ``nMCGen``, ``MCGenPDG_i``, and
   ``MCGenMothIndex_i`` (``i = 0, 1, 2 ...``) from the output root files before
   the program terminates.

.. admonition:: Hint
   :class: dropdown xhint stacked

   * See Section 5.3.2 in the user guide for the description of the setting item.

.. admonition:: Solution
   :class: dropdown solution

   Add the following setting item to the card file, re-run the program, and check
   the changes of the output files.

   .. code-block:: none

      % Remove input tbranches from output root files (Two options: Y and N. Default: N)
      {
        Y
      }

.. admonition:: Extension
   :class: dropdown xhint

   * See Section 5.3 in the user guide for the description of other similar
     setting items.

.. include:: ../lesson_footer.rstinclude

.. rubric:: Author(s) of this lesson

Xing-Yu Zhou
