7.8.10. PID Prior Probabilities#

This tool uses the momentum and cos(theta) of particles to train a machine-learning model that calculates prior probabilities, i.e. particle identification probabilities before the detector signals are taken into account. Combining these with the PID likelihoods from the detector signals gives posterior probabilities, which can improve particle identification.

Training#

This tool takes a ROOT file containing momentum and cos(theta) (theta being the angle between the momentum vector and the beam axis) and trains a PyTorch model on a second-order combination of these variables together with the transverse momentum, giving 9 features in total (the combination is performed by the data loader). The program outputs a '.pth' model file along with a scaling file (if a path for it is provided; otherwise this step is skipped). The required inputs are passed via the flags listed below.
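The feature construction described above can be sketched as follows. The exact ordering and choice of second-order terms used by the data loader are assumptions here; only the total of 9 features is stated in the text.

```python
import numpy as np

def make_features(p, cos_theta):
    """Build 9 features from momentum and cos(theta).

    pt = p * sin(theta) is derived from the two inputs; the three base
    variables plus their six second-order combinations give 9 features.
    The exact ordering here is illustrative.
    """
    pt = p * np.sqrt(1.0 - cos_theta**2)  # p * sin(theta)
    base = [p, cos_theta, pt]
    features = list(base)  # 3 first-order terms
    for i in range(3):     # 6 second-order terms: squares and cross products
        for j in range(i, 3):
            features.append(base[i] * base[j])
    return np.stack(features, axis=-1)  # shape (..., 9)
```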

Example:
analysis-train-priors -i /path/to/datafile.root -o /path/to/model.pth -p 11 13 211 321 -k tree -v cosTheta p mcPDG -t 20 -lr 1e-6 -r /path/to/alreadyTrainedModel.pth -s /path/to/scalefile.root -e 10
usage: analysis-train-priors [-h] -i INPUT_PATH -p PARTICLE_LIST
                             [PARTICLE_LIST ...] -o OUTPUT_PATH -k KEY
                             [-r RETRAIN_MODEL]
                             [-v VARIABLE_LIST VARIABLE_LIST VARIABLE_LIST]
                             [-lr LEARNING_RATE] [-e EPOCHS] [-t TAYLOR_TERMS]
                             [-s SCALING_FILE]

Optional Arguments#

-r, --retrain_model

Path to the model for retraining.

-v, --variable_list

List of variable names in order cos(theta), momentum and pdg (Default: cosTheta p mcPDG).

-lr, --learning_rate

Learning rate for training (Default: 1e-5).

-e, --epochs

Number of epochs to be trained for (Default: 64).

-t, --taylor_terms

Number of terms for Taylor series of cross entropy loss (Default: 0 (takes log loss instead of taylor series)).

-s, --scaling_file

Path to the root file to write data for scaling.
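The taylor_terms option replaces the exact log loss with a truncated Taylor expansion of -log(p) around p = 1. A minimal NumPy sketch of what such a loss looks like; this is illustrative only, and the tool's actual PyTorch implementation may differ:

```python
import numpy as np

def taylor_cross_entropy(probs, targets, n_terms):
    """Cross entropy with -log(p) replaced by its Taylor expansion
    around p = 1: -log(p) ~ sum_{k=1..n} (1 - p)^k / k.
    With n_terms = 0 the exact log loss is used instead.

    probs:   (N, C) predicted class probabilities
    targets: (N,)   true class indices
    """
    p = probs[np.arange(len(targets)), targets]  # probability of true class
    if n_terms == 0:
        return float(np.mean(-np.log(p)))        # exact log loss
    terms = sum((1.0 - p) ** k / k for k in range(1, n_terms + 1))
    return float(np.mean(terms))
```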

Required Arguments#

-i, --input_path

Path to ntuples containing the data for training.

-p, --particle_list

List of particle mcPDG for which priors are to be calculated.

-o, --output_path

Output model file name (with path).

-k, --key

Key of the tree to be used for training in the root file.

Note

  • The output during training shows a smaller value for the validation loss than for the training loss; this is only because the training loss includes the regularization term.

  • If you notice overfitting or otherwise want to stop early, Ctrl+C stops the training and still creates the required output files; if exiting takes some time, it is because those files are being written.

Warning

  • During training the tool ignores particles whose PDG values are not in particle_list (even as background), so it is advisable to include particles present in sufficiently large proportions in particle_list even if their priors are not required.

  • If your particle_list contains only two particles, avoid creating a scaling file: the TemperatureScaling used to calibrate the priors will then output only the value for the higher PDG code, so subsequent functions such as posterior calculation (as well as getting priors for a specific PDG value) will not work.

Evaluation#

For evaluation, the EvalPriors module provides the Priors class. First initialize it with the trained model, the particle_list the model was trained for, and a scaling file (if one was created). Then, given the relevant input, you can obtain priors as well as posteriors.

Import#

from evalPriors import Priors

Initialize#

prior = Priors(particle_list,model,scalefile)

Here, particle_list is a list such as [11, 13, 211, 321], and model and scalefile are the paths to the model and the scaling-data file, of the form '/path/model_name.pth' and '/path/scalefile.root' respectively. The scaling file for calibration is optional.

Note

To use the scaling file for calibration, the netcal module must additionally be installed using pip.

Prior Calculation#

prior.calculate_priors(momentum,cosTheta)

Here, momentum and cosTheta are numpy arrays.

Getting the Priors#

prior.get_priors(pdg)

This returns a 1D array of the calculated priors for the given pdg value. The argument is optional; calling the method without it returns a 2D array of priors with columns in ascending order of PDG value.
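Since the 2D array is ordered by ascending PDG value, picking out the priors for one PDG code amounts to a column lookup. A hypothetical helper illustrating that indexing (not part of the Priors class):

```python
import numpy as np

def select_pdg_column(priors_2d, particle_list, pdg):
    """Pick the column of an (N, C) prior array for one PDG code,
    given that columns are in ascending order of PDG value."""
    order = sorted(particle_list)
    return priors_2d[:, order.index(pdg)]
```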

Getting the Posteriors#

prior.get_posterior(pid,pdg)

Again, pdg is an optional argument, but the pid likelihoods must be provided as a 2D array containing the likelihoods for the particles in particle_list, with columns in ascending order of PDG value.
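The posterior combination is, in essence, Bayes' rule applied per event: posterior is proportional to prior times likelihood, normalised across the particle hypotheses. A minimal sketch of that arithmetic (not the class's actual code):

```python
import numpy as np

def combine_priors_with_likelihoods(priors, likelihoods):
    """Bayes' rule: posterior ~ prior * likelihood, normalised per event.

    priors, likelihoods: (N, C) arrays with columns in ascending PDG order.
    Returns an (N, C) array of posterior probabilities summing to 1 per row.
    """
    unnorm = priors * likelihoods
    return unnorm / unnorm.sum(axis=1, keepdims=True)
```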