7.8.3. Flavor Tagger

Authors: F. Abudinen, M. Gelb, L. Li Gioi

The Flavor Tagger is a module based on multivariate methods. It is designed to determine the flavor of the \(B^0\) meson that is not reconstructed in events with a neutral \(B\) meson pair. It can also be used, in addition to continuum suppression, in events with a charged \(B\) meson pair.

Tip

For an introductory hands-on lesson, take a look at Section 3.4.5.

See also

For a more detailed introduction, take a look at

Flavor Tagging Principle

Considering an entangled neutral B-meson pair, if one of the two mesons decays to a CP eigenstate and the other to a flavor-specific channel, the goal is to determine the flavor of the latter at the time of its decay. The different signatures of flavor-specific decay channels can be grouped into categories. So far, 13 categories have been developed for the following signatures:

Electron:

In the decay \(b \to X e^- (b \to c e^- \bar\nu)\) the charge of the electron tags unambiguously the flavor of the B Meson.

IntermediateElectron:

In the decay \(b \to X_c X \to X e^+ (b \to c \to s e^+ \nu)\) the charge of the secondary electron (se) tags the B flavor.

Muon:

In the decay \(b \to X \mu^- (b \to c \mu^- \bar\nu)\) the charge of the muon tags the flavor.

IntermediateMuon:

In the decay \(b \to X_c X \to X \mu^+ (b \to c \to s\mu^+ \nu)\) the charge of the secondary muon tags the flavor.

KinLepton:

In the decay \(b \to X \ell^- (b \to c \ell^- \bar\nu)\) the charge of the lepton is the flavor signature. Here muon and electron PIDs are used.

IntermediateKinLepton:

In the decay \(b \to X_c X \to X \ell^+ (b \to c \to s \ell^+ \nu)\) the charge of the lepton is the flavor signature. Here muon and electron PIDs are used.

Kaon:

In the decay \(b \to X K^- (b \to c \to s)\) the charge of the Kaon is the searched flavour signature.

FastHadron:

In the decay \(b \to X^+ \pi^- (K^-)\) the charge of the pion (Kaon) tags the flavor of the B Meson.

SlowPion:

In the decay \(b \to X D^{*+} \to X D^0 \pi^+ (b \to c )\) the charge of the slow pion tags the flavor.

MaximumP*:

Here the particle with the highest CMS momentum is assumed to be a primary daughter of the B. Therefore, its charge is considered as flavor signature.

KaonPion:

In the decay \(b \to X D^{*+} \to X \pi^+ D^0 \to X K^- \pi^+ (b \to c \to s)\) the charges of the Kaon and the slow pion provide a combined flavour signature.

FastSlowCorrelated (FSC):

Slow pions from \(D^{*\pm}\) and high momentum primary particles, e.g. \(\overline{B^0} \to D^{*+} e^- \bar\nu \to X \pi^+ e^-\) , provide a combined flavour signature.

Lambda:

In the decay \(b \to \Lambda_c^* X \to \Lambda X \to X p \pi^- (b \to c \to s)\) the flavor of the Lambda tags the flavor of the B. For this, a proton and a pion are reconstructed to a Lambda.

In the following the particles providing the flavor tag information, i.e. the flavor signatures, are denoted as target.

Below: Simple draft (no physical magnitudes) to illustrate the different decays providing the signatures belonging to the different categories.

../../_images/newFlavorTaggerCategories.png

Fig. 7.2 Underlying decay modes of the flavor tagging categories.

Note

Decays with intermediate resonances that provide flavor information are correctly considered as signal. E.g., \(\overline{B}^0 \to D_1^+ \to D^{*+} \to D^+ \to K_1^0 \to K^{*0} \to K^-\).

The Kaon and the Intermediate Lepton categories consider mesonic and baryonic decays via \(b \to c \to s\) transitions. E.g., \(b \to \Sigma_{\bar{c}} \to \Lambda_{\bar{c}}^+ \to K^- p \pi^+ (\Lambda \ell^+ p \nu_{\ell})\).

The FastHadron category considers also intermediate resonances and single tau daughters (kinematically similar). E.g., \(b \to \tau^- (\to \rho^-) \to \pi^-, b \to \tau^- (\to K^{*-}) \to K^-\).

Flavor Tagger Algorithm

The process of the FlavorTagger is inspired by the Flavor Tagging concept developed by Belle and BaBar. It proceeds in 2 steps or levels: EventLevel and CombinerLevel. Each step relies on trained multivariate methods. Up to now, for the official Flavor Tagger, the multivariate method used is always a FastBDT which is embedded as Plugin in the MVA package TMVAInterface.

../../_images/singleCategory.png

Fig. 7.3 The process for an example category.

At the starting point the available information consists only of ROE Tracks, ECL and KLM clusters.

In the first step a dedicated ParticleList is created for each type of reconstructed tracks (electrons, muons, Kaons, pions and protons). The particles in each list correspond to the whole set of Rest Of Event (ROE) tracks fitted with a specific mass hypothesis. The mass hypothesis of each ParticleList corresponds to the searched flavor signature, e.g. “K+:KaonROE” is created for the categories using the information of kaons. Several flavor tagging input variables are calculated for each track. In these calculations the ECL and KLM Clusters are implicitly involved. The variables are taken as inputs for a category specific multivariate method.

For each category, an EventLevel multivariate method is assigned which is trained to give as output the probability of being the target particle of the category, providing the right flavor. This probability is called RightCategory and is calculated in the EventLevel using the flavor tagging input variables. After the EventLevel each dummy particle in each one of the categories has the RightCategory probability as extraInfo.

For the CombinerLevel the dummy particle with the highest RightCategory probability is selected as target. The product qp of the charge and the RightCategory probability of the target is an input value for the combiner. Only for the Lambda and Kaon categories, qp is weighted among the three candidates with the highest RightCategory probability. The combiner thus gets 13 inputs, one per category. The multivariate method of the CombinerLevel is trained to give the qr value of the tagged B meson as output. Here q is the flavor and r is the so-called dilution factor. Currently, there are two combiner methods: a FastBDT and a multilayer perceptron from the FANN library. New methods could be included in the future.
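The target selection described above can be sketched in plain Python. This is only an illustration, not the basf2 implementation: the `Candidate` tuple and both function names are hypothetical, returning 0.0 for a category without candidates is an assumption, and the three-candidate weighting used for the Kaon and Lambda categories is deliberately left out because its formula is not given here.

```python
from collections import namedtuple

# Hypothetical stand-in for a ROE particle after the EventLevel:
# its charge (+1 or -1) and its RightCategory probability.
Candidate = namedtuple("Candidate", ["charge", "right_category"])


def category_qp(candidates):
    """Return q*p of the candidate with the highest RightCategory probability."""
    if not candidates:
        # Assumption: a category without any candidate contributes no information.
        return 0.0
    target = max(candidates, key=lambda c: c.right_category)
    return target.charge * target.right_category


def combiner_inputs(candidates_by_category):
    """Build one q*p input per category, keyed by the category name."""
    return {name: category_qp(cands)
            for name, cands in candidates_by_category.items()}


example = {
    "Electron": [Candidate(+1, 0.20), Candidate(-1, 0.90)],
    "Muon": [Candidate(+1, 0.55)],
    "SlowPion": [],
}
inputs = combiner_inputs(example)
```

With all 13 categories filled, such a dictionary would hold the 13 combiner inputs mentioned above.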

../../_images/allCategories.png

Fig. 7.4 Flow of information in the flavor tagger.

The qr value of the tagged B is saved temporarily as extraInfo of the reconstructed B particle at the end of the FlavorTagger process. All flavor tagging information (qr of the two multivariate methods and the RightCategory probabilities) previously saved as extraInfo is stored in a dedicated DataStoreObject called FlavorTagInfo. After the FlavorTagInfo data object is filled, all flavor tagger extraInfos are deleted. The user can decide which information is saved in the nTuples. If specified, the inputs of the combiner are saved as well.

For more information see BELLE2-PTHESIS-2018-003.

Using the FlavorTagger

Adding the FlavorTagger to your analysis is very simple: an example can be found in this tutorial:

analysis/examples/tutorials/B2A801-FlavorTagger.py

At the beginning of your steering file you have to import:

import flavorTagger as ft

Do not forget to buildRestOfEvent for your B0 recoParticle before calling the flavor tagger.

If you just want to use the flavor tagger as standard user you need only:

ft.flavorTagger(particleLists=['B0:yourSignalBlist'], weightFiles='B2nunubarBGx1')

and to add the flavor_tagging variables to your nTuple as explained below. BGx1 stands for MC generated with machine background. Only BGx1 files are provided centrally.

The current flavor tagger is trained with MC samples for the signal channel \(B^0 \to \overline{\nu}\nu\), which has no built-in CP violation. This prevents the flavorTagger from learning CP asymmetries on the tag side.

The full interface of the flavorTagger() function has 16 possible arguments and is described below.

Saving to nTuples

The flavor tagger provides the output of the two combiners and the outputs of the 13 categories. It provides also the MC information relevant for the categories. To save this information you just have to add the predefined list ft.flavor_tagging to the variables that you use as argument for the module modularAnalysis.variablesToNtuple().
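The calls described so far can be chained as in the following sketch. It assumes a working basf2 environment and an existing path on which the signal B list has already been reconstructed; the wrapper function `add_flavor_tagging` and the output file name are hypothetical and only serve the illustration.

```python
def add_flavor_tagging(path, b_list="B0:yourSignalBlist", output="ft_ntuple.root"):
    """Append ROE building, flavor tagging and nTuple output to a basf2 path."""
    # Imported inside the function so that the sketch can be read and
    # loaded without a basf2 setup; both modules ship with basf2.
    import flavorTagger as ft
    import modularAnalysis as ma

    # The ROE has to exist before the flavor tagger is called.
    ma.buildRestOfEvent(b_list, path=path)

    # Standard usage with the centrally provided weight files.
    ft.flavorTagger(particleLists=[b_list], weightFiles="B2nunubarBGx1", path=path)

    # Save the predefined collection of flavor tagging variables.
    ma.variablesToNtuple(b_list, variables=ft.flavor_tagging,
                         filename=output, path=path)
```

In a steering file this would be called once, after the reconstruction of the signal B candidates and before processing the path.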

The two available combiners provide two different flavor tags which can be found in the nTuple of the output ROOT file: FBDT_qrCombined or FANN_qrCombined. FBDT is the output of a fast boosted decision tree and FANN is the output of a multi-layer perceptron from the open source library Fast Artificial Neural Network. The default output -2 is saved for events without tracks in the ROE.

The following variable is also saved by default,

qrMC: The ideal output of the flavor tagger (hence the name) and the target variable of the combiners. Strictly speaking it should be called q_MC, since it is just the MC flavor of the tag-side B. It also takes into account whether isSignal on the signal side is 1, so several checks can be made at once with this variable. qrMC is only the nTuple name; the variable that is saved is isRelatedRestOfEventB0Flavor.

The goal of this variable is to return the value +1(-1) for a \(B^0 (\overline{B}^0)\) on the tag side checking the MC. But technically this is not trivial at all. The variable calculation performs the following steps:

  1. Check the MC matching of \(B^0_{\rm sig}\). It means MC \(B^0_{\rm sig}\) corresponds to RECO \(B^0_{\rm sig}\). If correctly matched then:

  2. Loop over all tracks in the ROE and get for each one the related mc particle.

  3. Check all ancestors (mother, grandmother, and so on) of each of these MC particles to find out whether at least one of the particles descends from MC \(B^0_{\rm sig}\) (such events are good neither for training nor for evaluation). The loop is broken as soon as an MC particle related to a ROE track is found to be a descendant of MC \(B^0_{\rm sig}\). If none is found:

  4. Find the MC flavor of the neutral \(B\) particle on the tag side (\(B^0_{\rm tag}\)).

The variable has several output values. Their meanings are as follows:

  • -2 (+2) At least one MC particle related to a ROE track is a descendant of MC \(B^0_{\rm sig}\): -2 (+2) means MC \(B^0_{\rm sig}\) is a \(\overline{B}^0 (B^0)\).

  • -1 (+1) Everything is correctly matched and no MC particle related to a ROE track is a descendant of MC \(B^0_{\rm sig}\): -1 (+1) means that the MC neutral \(B\) on the tag side is a \(\overline{B}^0 (B^0)\).

  • 0 Wrongly matched \(B^0_{\rm sig}\), or correctly matched but no neutral \(B\) found on the tag side. The latter means that either there are no tracks in the ROE, or no neutral \(B\) particle was found among the MC particles related to the ROE tracks and their ancestors. B0_isSignal==1 together with B0_qrMC==0 is therefore possible, e.g. for \(B\to\) final state with only photons, \(B\to\) invisible, \(B\to\) photons and a few tracks that are outside of the acceptance (or not reconstructed), etc. In very rare cases there may also be no related MC particle for the tracks in the ROE. One should therefore require abs(B0_qrMC) == 1 to select good events for evaluation. Some care is needed, though: for some signal channels the MC matching does not work well at all, and one could mistakenly conclude that the flavor tagger is under- or overestimating the dilution.
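The steps and output values above can be mimicked on a toy MC record. The sketch below is not the basf2 code: the list-of-dicts layout, the function names and the example decay are hypothetical. As an assumption it takes positive values to mean \(B^0\) and negative values \(\overline{B}^0\), following the stated goal of returning +1 (-1) for a \(B^0 (\overline{B}^0)\) on the tag side.

```python
B0_PDG = 511  # PDG code of the B0; the anti-B0 is -511


def ancestors(mc, idx):
    """Yield the indices of all ancestors of the MC particle at position idx."""
    mother = mc[idx]["mother"]
    while mother is not None:
        yield mother
        mother = mc[mother]["mother"]


def qr_mc(mc, sig_index, sig_matched, roe_track_mc):
    """Toy version of the checks behind qrMC (sign convention is an assumption)."""
    if not sig_matched:                      # step 1: signal side wrongly matched
        return 0
    sig_sign = 1 if mc[sig_index]["pdg"] > 0 else -1
    tag_flavor = 0
    for idx in roe_track_mc:                 # step 2: MC particle of each ROE track
        if idx is None:                      # track without related MC particle
            continue
        for anc in ancestors(mc, idx):       # step 3: walk up the decay chain
            if anc == sig_index:
                return 2 * sig_sign          # ROE contains a signal-side descendant
            if abs(mc[anc]["pdg"]) == B0_PDG and tag_flavor == 0:
                tag_flavor = 1 if mc[anc]["pdg"] > 0 else -1   # step 4
    return tag_flavor


# Toy event: Upsilon(4S) -> B0 (signal) anti-B0 (tag)
toy_mc = [
    {"pdg": 300553, "mother": None},  # 0: Upsilon(4S)
    {"pdg": 511, "mother": 0},        # 1: signal-side B0
    {"pdg": -511, "mother": 0},       # 2: tag-side anti-B0
    {"pdg": 321, "mother": 1},        # 3: K+ from the signal side
    {"pdg": 13, "mother": 2},         # 4: mu- from the tag side
]

values = [
    qr_mc(toy_mc, 1, True, [4]),      # clean tag side
    qr_mc(toy_mc, 1, True, [4, 3]),   # a signal-side track leaked into the ROE
    qr_mc(toy_mc, 1, False, [4]),     # signal side not matched
    qr_mc(toy_mc, 1, True, []),       # no tracks in the ROE
]
# Keep only the values that are usable for evaluation.
good = [v for v in values if abs(v) == 1]
```

The final list comprehension corresponds to the abs(B0_qrMC) == 1 requirement recommended for evaluation.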

The flavor tagger also saves the variable mcFlavorOfOtherB, which returns the flavor of the accompanying tag-side \(B\) meson (positive or negative) if the given particle is a correctly MC-matched \(B\), and 0 otherwise. In other words, this variable checks the generated flavor of the other MC \(\Upsilon(4{\rm S})\) daughter without considering the ROE particles.

The additional information about individual categories is saved using the aliases qpCategory<Name>, where <Name> is the category. These are 13 values which correspond to the 13 inputs given to the combiners. They are actually not qr but qp, where p is the output of the category-level MVA (FBDT) for the track with the highest target probability. In the case of Kaons and Lambdas, it is the weighted qp of the three most probable targets.

By definition,

r = TMath::Abs(2 * prob - 1)

where prob is the probability that this event is, for example, a semileptonic event for the Electron category. Technically, it is just the output of the category level FBDT for the target track. The target track is the track with the highest track probability, which is the output of track level FBDT.
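As a small numerical illustration of this definition, the sketch below computes r from a probability and splits a combined qr output into its flavor q and dilution r. The function names are hypothetical; only the formula for r is taken from the text.

```python
def dilution(prob):
    """r = |2 * prob - 1|: 0 for prob = 0.5 (no information), 1 for prob = 0 or 1."""
    return abs(2.0 * prob - 1.0)


def split_qr(qr):
    """Split a combined qr value into the flavor q (+1, -1 or 0) and r = |qr|."""
    q = (qr > 0) - (qr < 0)  # sign of qr as an integer
    return q, abs(qr)


r_uninformative = dilution(0.5)
r_certain = dilution(1.0)
q, r = split_qr(-0.6)
```

A probability of 0.5 therefore carries no flavor information, while probabilities close to 0 or 1 give r close to 1.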

When the flavor tagger started to be developed, qr was used for each category as input. But then it turned out that qp is more powerful. The names of the variables remained the same just for practical use.

hasTrueTargetCategory<Name>: These variables tell you, for each event, whether the target of a specific category is present. For example, \(B^0\to e^+ \nu X^-\) is the decay corresponding to the Electron category. By checking the MC information, this variable returns 1 if there is an \(e^+\) which is a primary daughter of the \(B^0_{\rm tag}\), and 0 otherwise. The same holds for the other categories.

The standard flavor tagger combines all 13 tags of all 13 categories for each event. hasTrueTargetCategory<Name> only tells you which categories were right or not (with exceptions for kaons).

Efficiency Calculation and Validation Plots

If you want to calculate the efficiency of the FlavorTagger on your own file and produce qr plots, use the script analysis/release-validation/CPVTools/flavorTaggerEfficiency.py, giving your file and the nTuple tree name as arguments:

basf2 flavorTaggerEfficiency.py 'YourFiles*WithWildcards??.root' YourTreeName
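For orientation, the usual figure of merit in flavor tagging is the effective tagging efficiency, \(\varepsilon_{\rm eff} = \sum_i \varepsilon_i (1 - 2 w_i)^2\), summed over bins of \(r = |qr|\), where \(\varepsilon_i\) is the fraction of events in bin i and \(w_i\) the wrong-tag fraction in that bin. The following plain-Python sketch shows the arithmetic under stated assumptions; it is not the content of flavorTaggerEfficiency.py, and the function name and bin edges are hypothetical choices.

```python
def effective_efficiency(qr, q_true,
                         edges=(0.0, 0.1, 0.25, 0.5, 0.625, 0.75, 0.875, 1.0)):
    """Return sum_i eps_i * (1 - 2 * w_i)**2 over bins of r = |qr|.

    qr     : combiner outputs, e.g. FBDT_qrCombined
    q_true : true tag-side flavors (+1 or -1), e.g. qrMC after abs(qrMC) == 1
    edges  : assumed r-bin edges; the last bin includes its upper edge
    """
    n_total = len(qr)
    if n_total == 0:
        return 0.0
    n_bins = len(edges) - 1
    tagged = [0] * n_bins
    wrong = [0] * n_bins
    for out, truth in zip(qr, q_true):
        if out == 0:
            continue  # no flavor decision: counted as untagged
        r = abs(out)
        for i in range(n_bins):
            is_last = i == n_bins - 1
            if edges[i] <= r < edges[i + 1] or (is_last and r == edges[-1]):
                tagged[i] += 1
                if out * truth < 0:  # predicted and true flavor disagree
                    wrong[i] += 1
                break
        # Values outside [0, 1], such as the default output -2, fall in no bin
        # and therefore only lower the efficiency through n_total.
    total = 0.0
    for n, n_wrong in zip(tagged, wrong):
        if n == 0:
            continue
        eps = n / n_total
        w = n_wrong / n
        total += eps * (1.0 - 2.0 * w) ** 2
    return total


perfect = effective_efficiency([0.9, -0.9, 0.9, -0.9], [1, -1, 1, -1])
coin_flip = effective_efficiency([0.9, 0.9], [1, -1])
half_untagged = effective_efficiency([-2, 0.9], [1, 1])
```

A perfect tagger gives 1, a tagger that is wrong half of the time gives 0, and untagged events reduce the result proportionally.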

Tutorials

An example tutorial for normal use can be found under:

analysis/examples/tutorials/B2A801-FlavorTagger.py

Find the latest tutorial given at the 2nd OPEN Belle II physics week at GitLab.

Try the advanced tutorial B2T_Advanced_3_FlavorTagger.ipynb (Jupyter notebook) under the latest b2-starter-kit tutorials.

As further examples you can have a look at the scripts used to generate the weight files at kekcc once a release is tagged. You can find them under:

analysis/release-validation/CPVTools/

You can train and test the flavor tagger, and evaluate its performance by yourself running:

sh CPVToolsValidatorInParalell.sh Belle2 nunubar nunubar BGx1 yourPathForWeightFiles yourPathForAnalyzedMdst

Note:

The convention is BGx0 for no machine background and BGx1 for MC with machine background. The process is defined in:

flavorTaggerVertexingValidation.py

If you are interested in the validation of the flavor tagger, have a look at the flavortaggingvalidation repository.

Functions

class flavorTagger.FTCategoryParameters(particleList, trackName, eventName, variableName, code)
code

Alias for field number 4

eventName

Alias for field number 2

particleList

Alias for field number 0

trackName

Alias for field number 1

variableName

Alias for field number 3

flavorTagger.FillParticleLists(maskName='all', categories=None, path=None)

Fills the particle Lists for all categories.

flavorTagger.add_default_FlavorTagger_aliases()

This function adds the default aliases for flavor tagging variables and defines the collection of flavor tagging variables.

flavorTagger.combinerLevel(mode='Expert', weightFiles='B2JpsiKs_mu', categories=None, variablesCombinerLevel=None, categoriesCombinationCode=None, path=None)

Samples the input data or tests the combiner according to the selected categories.

flavorTagger.combinerLevelTeacher(weightFiles='B2JpsiKs_mu', variablesCombinerLevel=None, categoriesCombinationCode=None)

Trains the combiner according to the selected categories.

flavorTagger.eventLevel(mode='Expert', weightFiles='B2JpsiKs_mu', categories=None, path=None)

Samples data for training or tests all categories at event level.

flavorTagger.eventLevelTeacher(weightFiles='B2JpsiKs_mu', categories=None)

Trains all categories at event level.

flavorTagger.flavorTagger(particleLists=None, mode='Expert', weightFiles='B2nunubarBGx1', workingDirectory='.', combinerMethods=['TMVA-FBDT'], categories=['Electron', 'IntermediateElectron', 'Muon', 'IntermediateMuon', 'KinLepton', 'IntermediateKinLepton', 'Kaon', 'SlowPion', 'FastHadron', 'Lambda', 'FSC', 'MaximumPstar', 'KaonPion'], maskName='FTDefaultMask', saveCategoriesInfo=True, useOnlyLocalWeightFiles=False, downloadFromDatabaseIfNotFound=False, uploadToDatabaseAfterTraining=False, samplerFileId='', prefix='MC15ri_light-2207-bengal_0', useGNN=False, identifierGNN='GFlaT_MC15ri_light_2303_iriomote_0', path=None)

Defines the whole flavor tagging process for each selected Rest of Event (ROE) built in the steering file. The flavor is predicted by Multivariate Methods trained with Variables and MetaVariables which use Tracks, ECL- and KLMClusters from the corresponding RestOfEvent dataobject. This module can be used to sample the training information, to train and/or to test the flavorTagger.

Parameters
  • particleLists – The ROEs for flavor tagging are selected from the given particle lists.

  • mode – The available modes are Expert (default), Sampler, and Teacher. In the Expert mode flavor tagging is applied to the analysis. In the Sampler mode the variables for training are saved. In the Teacher mode the FlavorTagger is trained; for this step you do not reconstruct any particle or do any analysis, you just run the flavorTagger alone.

  • weightFiles – Name of the weight files. Default: B2nunubarBGx1 (official weight files). If you train the FlavorTagger yourself, the weight file name should correspond to the analysed CP channel to avoid confusion. The default name B2nunubarBGx1 corresponds to \(B^0_{\rm sig}\to \nu \overline{\nu}\), and B2JpsiKs_muBGx1 to \(B^0_{\rm sig}\to J/\psi (\to \mu^+ \mu^-) K_s (\to \pi^+ \pi^-)\). BGx1 stands for events simulated with background.

  • workingDirectory – Path to the directory containing the FlavorTagging/ folder.

  • combinerMethods – MVAs for the combiner: TMVA-FBDT (default). FANN-MLP is available only with prefix='' (MC13 weight files).

  • categories – Categories used for flavor tagging. By default all are used.

  • maskName

    Gets ROE particles from a specified ROE mask. FTDefaultMask (default): tentative mask definition that will be created automatically. The definition is as follows:

    • Track (pion): thetaInCDCAcceptance and dr<1 and abs(dz)<3

    • ECL-cluster (gamma): thetaInCDCAcceptance and clusterNHits>1.5 and [[clusterReg==1 and E>0.08] or [clusterReg==2 and E>0.03] or [clusterReg==3 and E>0.06]] (Same as gamma:pi0eff30_May2020 and gamma:pi0eff40_May2020)

    all: all ROE particles are used. Or one can give any mask name defined before calling this function.

  • saveCategoriesInfo – Sets to save information of individual categories.

  • useOnlyLocalWeightFiles – [Expert] Uses only locally saved weight files.

  • downloadFromDatabaseIfNotFound – [Expert] Weight files are downloaded from the conditions database if not available in workingDirectory.

  • uploadToDatabaseAfterTraining – [Expert] For librarians only: uploads weight files to localdb after training.

  • samplerFileId – Identifier to parallelize sampling. Only used in Sampler mode. If you are training by yourself and want to parallelize the sampling, you can run several sampling scripts in parallel. By changing this parameter you will not overwrite an older sample.

  • prefix – Prefix of weight files. MC15ri_light-2207-bengal_0 (default): Weight files trained for MC15ri samples. '': Weight files trained for MC13 samples.

  • useGNN – Use the GNN-based Flavor Tagger in addition to the FastBDT-based one. Please specify the weight file with the option identifierGNN. [Expert] In the Sampler mode, training files for the GNN-based Flavor Tagger are produced.

  • identifierGNN – The name of weight file of the GNN-based Flavor Tagger. [Expert] Multiple identifiers can be given with list(str).

  • path – Modules are added to this path
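Some of the constraints stated in the parameter list can be checked before the call. The helper below is a hypothetical convenience function, not part of the flavorTagger module; it only encodes what the parameter descriptions say (three modes, 13 category names, FANN-MLP restricted to prefix='') and returns keyword arguments that could then be passed on as ft.flavorTagger(path=..., **kwargs).

```python
ALL_CATEGORIES = (
    "Electron", "IntermediateElectron", "Muon", "IntermediateMuon",
    "KinLepton", "IntermediateKinLepton", "Kaon", "SlowPion",
    "FastHadron", "Lambda", "FSC", "MaximumPstar", "KaonPion",
)
MODES = ("Expert", "Sampler", "Teacher")
DEFAULT_PREFIX = "MC15ri_light-2207-bengal_0"


def flavor_tagger_kwargs(particle_lists, mode="Expert", categories=None,
                         combiner_methods=("TMVA-FBDT",), prefix=DEFAULT_PREFIX):
    """Validate a few documented constraints and return keyword arguments."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}, expected one of {MODES}")
    chosen = list(ALL_CATEGORIES) if categories is None else list(categories)
    unknown = [c for c in chosen if c not in ALL_CATEGORIES]
    if unknown:
        raise ValueError(f"unknown categories: {unknown}")
    # According to the parameter description, FANN-MLP needs the MC13 weight files.
    if "FANN-MLP" in combiner_methods and prefix != "":
        raise ValueError("FANN-MLP is only available with prefix=''")
    return {
        "particleLists": list(particle_lists),
        "mode": mode,
        "categories": chosen,
        "combinerMethods": list(combiner_methods),
        "prefix": prefix,
    }


kwargs = flavor_tagger_kwargs(["B0:yourSignalBlist"],
                              categories=["Electron", "Muon", "Kaon"])
```

Keeping such checks outside the steering logic makes a wrong category name or an unsupported combiner fail early, before any event is processed.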

flavorTagger.getBelleOrBelle2()

Gets the global ModeCode.

flavorTagger.getEventLevelParticleLists(categories=None)
flavorTagger.getFastBDTCategories()

Helper function for getting the FastBDT categories. It’s necessary for removing top-level ROOT imports.

flavorTagger.getFastBDTCombiner()

Helper function for getting the FastBDT combiner. It’s necessary for removing top-level ROOT imports.

flavorTagger.getMlpFANNCombiner()

Helper function for getting the MLP FANN combiner. It’s necessary for removing top-level ROOT imports.

flavorTagger.getTrainingVariables(category=None)

Helper function to get training variables.

NOTE: This function is not called in the Expert mode. Its output does not need to be consistent with the variable lists of the weight files.

flavorTagger.setInputVariablesWithMask(maskName='all')

Set aliases for input variables with ROE mask.

flavorTagger.setInteractionWithDatabase(downloadFromDatabaseIfNotFound=False, uploadToDatabaseAfterTraining=False)

Sets the interaction with the database: download trained weight files or upload weight files after training.

flavorTagger.set_FlavorTagger_pid_aliases()

This function adds the pid aliases needed by the flavor tagger.

flavorTagger.set_FlavorTagger_pid_aliases_legacy()

This function adds the pid aliases needed by the flavor tagger trained for MC13.

flavorTagger.set_GNNFlavorTagger_aliases(categories)

This function adds aliases for the GNN-based flavor tagger.