PID Prior Probabilities
7.8.10. PID Prior Probabilities#
This tool uses the momentum and cos(\(\theta\)) of particles to train a machine learning model that calculates prior probabilities, i.e. particle identification probabilities before the detector signals are taken into account. Combining these priors with the PID likelihoods from the detector signals gives posterior probabilities, which can help improve particle identification.
Training#
This tool reads a ROOT file containing the momentum and cos(theta) of particles, where theta is the angle between the momentum vector and the beam axis, and trains a PyTorch model. The data loader takes a second-order combination of these variables together with the transverse momentum, which gives 9 features in total. The program then writes a ‘.pth’ model file and, if a path for it is provided, a scaling file; otherwise the scaling step is skipped. The inputs are passed through the command-line flags listed below.
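As a rough illustration of how three base variables can lead to 9 features, the sketch below builds the three linear terms plus their six second-order products. The function name `second_order_features`, the column ordering, and the exact choice of terms are assumptions made for illustration; the tool's data loader may construct its features differently.

```python
import numpy as np


def second_order_features(cos_theta, p):
    """Hypothetical helper: build 9 features from cos(theta) and momentum."""
    cos_theta = np.asarray(cos_theta, dtype=float)
    p = np.asarray(p, dtype=float)

    # Transverse momentum: pT = p * sin(theta), with sin(theta) >= 0
    # for the polar angle. Clip guards against tiny negative round-off.
    pt = p * np.sqrt(np.clip(1.0 - cos_theta**2, 0.0, None))

    base = [cos_theta, p, pt]

    # Start with the 3 linear terms, then add the 6 unique pairwise
    # products (3 squares + 3 cross terms), giving 9 columns in total.
    features = list(base)
    for i in range(3):
        for j in range(i, 3):
            features.append(base[i] * base[j])

    return np.stack(features, axis=1)


X = second_order_features([0.0, 0.6], [1.0, 2.0])
print(X.shape)
```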
- Example:
analysis-train-priors -i /path/to/datafile.root -o /path/to/model.pth -p 11 13 211 321 -k tree -v cosTheta p mcPDG -t 20 -lr 1e-6 -r /path/to/alreadyTrainedModel.pth -s /path/to/scalefile.root -e 10
usage: analysis-train-priors [-h] -i INPUT_PATH -p PARTICLE_LIST
[PARTICLE_LIST ...] -o OUTPUT_PATH -k KEY
[-r RETRAIN_MODEL]
[-v VARIABLE_LIST VARIABLE_LIST VARIABLE_LIST]
[-lr LEARNING_RATE] [-e EPOCHS] [-t TAYLOR_TERMS]
[-s SCALING_FILE]
Optional Arguments#
- -r, --retrain_model
Path to the model for retraining.
- -v, --variable_list
List of variable names in order cos(theta), momentum and pdg (Default: cosTheta p mcPDG).
- -lr, --learning_rate
Learning rate for training (Default: 1e-5).
- -e, --epochs
Number of epochs to be trained for (Default: 64).
- -t, --taylor_terms
Number of terms of the Taylor series of the cross entropy loss (Default: 0, which uses the log loss instead of the Taylor series).
- -s, --scaling_file
Path to the root file to write data for scaling.
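One way to read the taylor_terms option is as a truncation of the series -log(p) = (1-p) + (1-p)^2/2 + (1-p)^3/3 + ..., where p is the probability assigned to the true class. The sketch below shows this idea in plain numpy. The function name `taylor_cross_entropy` is hypothetical, and whether the tool uses exactly this expansion is an assumption.

```python
import numpy as np


def taylor_cross_entropy(probs, labels, n_terms=0):
    """Hypothetical sketch: mean cross entropy, optionally Taylor-truncated.

    probs   : (n_samples, n_classes) predicted probabilities
    labels  : (n_samples,) integer class indices
    n_terms : 0 uses the exact log loss; n > 0 keeps the first n series terms
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Probability the model assigns to the true class of each sample.
    p_true = probs[np.arange(len(labels)), labels]

    if n_terms == 0:
        return float(np.mean(-np.log(p_true)))

    # Truncated series: sum_{k=1}^{n_terms} (1 - p)^k / k
    k = np.arange(1, n_terms + 1)
    series = ((1.0 - p_true)[:, None] ** k) / k
    return float(np.mean(series.sum(axis=1)))


probs = [[0.7, 0.3], [0.2, 0.8]]
labels = [0, 1]
print(taylor_cross_entropy(probs, labels, n_terms=0))
print(taylor_cross_entropy(probs, labels, n_terms=2))
```

Because every term of the series is non-negative, the truncated value never exceeds the exact log loss and approaches it as the number of terms grows.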
Required Arguments#
- -i, --input_path
Path to ntuples containing data for training.
- -p, --particle_list
List of mcPDG values of the particles for which priors are to be calculated.
- -o, --output_path
Output model file name (with path).
- -k, --key
Key of the tree to be used for training in the root file.
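The flags and defaults listed above can be summarized as an argparse definition. This is only a sketch that mirrors the documented usage line; the function name `build_parser` is hypothetical and the tool's actual parser may differ in details such as types and help texts.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="analysis-train-priors")

    # Required arguments
    parser.add_argument("-i", "--input_path", required=True)
    parser.add_argument("-p", "--particle_list", required=True,
                        nargs="+", type=int)
    parser.add_argument("-o", "--output_path", required=True)
    parser.add_argument("-k", "--key", required=True)

    # Optional arguments with the documented defaults
    parser.add_argument("-r", "--retrain_model", default=None)
    parser.add_argument("-v", "--variable_list", nargs=3,
                        default=["cosTheta", "p", "mcPDG"])
    parser.add_argument("-lr", "--learning_rate", type=float, default=1e-5)
    parser.add_argument("-e", "--epochs", type=int, default=64)
    parser.add_argument("-t", "--taylor_terms", type=int, default=0)
    parser.add_argument("-s", "--scaling_file", default=None)
    return parser


args = build_parser().parse_args(
    "-i /path/to/datafile.root -o /path/to/model.pth "
    "-p 11 13 211 321 -k tree -t 20 -lr 1e-6 -e 10".split()
)
print(args.particle_list, args.learning_rate, args.scaling_file)
```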
Note
During training, the reported validation loss will be smaller than the training loss. This is only because the loss on the training set includes regularization.
If you notice overfitting or want to stop the training, press Ctrl+C. This stops the training and still creates the required output files, so if the program takes some time to exit, it may be because it is writing those files.
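The Ctrl+C behaviour described here follows a common pattern: catch the interrupt inside the training loop and then fall through to the code that writes the outputs. The sketch below illustrates that pattern only; `train_with_interrupt`, `run_epoch` and `save` are hypothetical names and do not correspond to the tool's internals.

```python
def train_with_interrupt(run_epoch, save, epochs):
    """Hypothetical loop: stop cleanly on Ctrl+C, then write outputs."""
    completed = 0
    try:
        for epoch in range(epochs):
            run_epoch(epoch)   # one pass over the training data
            completed += 1
    except KeyboardInterrupt:
        # Ctrl+C raises KeyboardInterrupt in the main thread; swallow it
        # so that the outputs below are still written.
        print(f"Interrupted after {completed} completed epochs")

    # Reached both after a normal finish and after an interrupt.
    save(completed)
    return completed


# Usage with stand-in callables: the "training" step does nothing and
# the "save" step only records how many epochs were completed.
saved = []
train_with_interrupt(lambda epoch: None, saved.append, epochs=3)
print(saved)
```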
Warning
The tool does not consider particles with PDG values other than the ones given in particle_list (even as background) during training, so it is advisable to include particles with sufficiently large proportions in the particle_list even if their priors are not required.
If your particle_list contains only two particles, avoid creating a scaling file. TemperatureScaling, which is used for calibration of the priors, will then return only the output for the higher PDG value, so further functions such as the posterior calculation (as well as getting priors for a specific PDG value) will not work.
Evaluation#
For evaluation, the evalPriors module provides a class called Priors. The user first initializes it with the trained model, the particle list for which the model was trained, and a scaling file (if one was created). Given the relevant input, it can then return priors as well as posteriors.
Import#
from evalPriors import Priors
Initialize#
prior = Priors(particle_list,model,scalefile)
Here, particle_list is a list of the form [11,13,211,321], and model and scalefile are paths to the model and the scaling data file, of the form '/path/model_name.pth' and '/path/scalefile.root' respectively. The use of a scaling file for calibration is optional.
Note
To use the scaling file for calibration, you are required to additionally install the netcal module using pip.
Prior Calculation#
prior.calculate_priors(momentum,cosTheta)
Here, momentum and cosTheta are numpy arrays.
Getting the Priors#
prior.get_priors(pdg)
This returns a 1D array of the calculated priors for the given pdg value. The pdg argument is optional; if it is omitted, a 2D array of priors arranged in ascending order of PDG values is returned.
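To make the ascending-PDG ordering concrete, the sketch below looks up the priors of one particle type in such a 2D array. It assumes one row per particle candidate and one column per PDG value; the function name `select_prior_column` and that layout are assumptions for illustration, not the class's actual implementation.

```python
import numpy as np


def select_prior_column(priors, particle_list, pdg=None):
    """Hypothetical lookup in a (n_candidates, n_particles) prior array."""
    priors = np.asarray(priors, dtype=float)
    if pdg is None:
        return priors  # full 2D array, columns in ascending PDG order

    # Columns are assumed to follow the sorted PDG values.
    ordered = np.sort(np.asarray(particle_list))
    matches = np.flatnonzero(ordered == pdg)
    if matches.size == 0:
        raise ValueError(f"PDG {pdg} is not in the particle list")
    return priors[:, matches[0]]


# Made-up numbers for two candidates and four particle hypotheses.
priors = np.array([[0.1, 0.2, 0.6, 0.1],
                   [0.3, 0.3, 0.2, 0.2]])
print(select_prior_column(priors, [211, 11, 321, 13], pdg=211))
```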
Getting the Posteriors#
prior.get_posterior(pid,pdg)
Again, pdg is an optional argument, but the pid likelihoods must be provided as a 2D array containing the likelihoods for the particles in particle_list, with PDG values taken in ascending order.
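The combination of priors and detector likelihoods into posteriors can be pictured with Bayes' rule: each posterior is the product of prior and likelihood, normalized over the particle hypotheses. The sketch below assumes this standard form and the same column ordering for both arrays; the function name `bayes_posterior` is hypothetical and the tool's internal calculation may differ.

```python
import numpy as np


def bayes_posterior(priors, likelihoods):
    """Hypothetical sketch: posterior_i = prior_i * L_i / sum_j prior_j * L_j.

    Both inputs have shape (n_candidates, n_particles), with columns
    in ascending order of PDG values.
    """
    priors = np.asarray(priors, dtype=float)
    likelihoods = np.asarray(likelihoods, dtype=float)

    weighted = priors * likelihoods
    # Normalize each row; a row whose weighted sum is zero would need
    # special handling, which this sketch does not attempt.
    norm = weighted.sum(axis=1, keepdims=True)
    return weighted / norm


# Made-up numbers: one candidate, two particle hypotheses.
post = bayes_posterior([[0.5, 0.5]], [[0.2, 0.6]])
print(post)
```

With equal priors the posterior simply follows the likelihood ratio; an uneven prior shifts the result towards the hypothesis that is more common at that momentum and angle.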