Simple 1D Fit with Bat
Contents
12.5.4. Simple 1D Fit with Bat#
This directory contains two examples of how to use BAT for a 1D fit. To run this code, you will need to install BAT. See the intructions in Installation instructions for how install BAT.
The template for creating these examples was generated by BAT via a command-line tool. To generate for one model for your own use, run, for example,
bat-project Simple1DBinnedFit Simple1DBinnedfit
This creates the directory Simple1DBinnedFit
; the header and
source files for a model classes Simple1DBinnedfit
; the source
code for a program to create and run the model,
runSimple1DBinnedFit.cxx
; and the Makefile
for compiling everything.
To compile the code in this example, you will need to have BAT installed; and run
make
The Data#
The data to be analyzed in this example is stored in a file in a
directory above this one at: ../../example-data/fitme.root
. In
this ROOT file is a TTree
(“BtoKstG”) with a branch named
“B0_mbc”, that contains the beam-constrained masses of reconstructed
B0 mesons. We will perform two fits to this data: one binned and one
unbinned.
The Model#
We model the background as being distributed according to an ARGUS function
where
We model the signal as being distributed according to a Crystal Ball function
where
Both distributions are normalized over the range of the data.
These functions and their integral functions are coded in Functions.h
Binned Fit#
Our model in the binned fit is that the likelihood for the data set is factorized into likelihoods for each bin:
where
This bit of math is coded into the function
double Simple1DBinnedfit::LogLikelihood(const std::vector<double>& P)
We need to define parameters for our model for each of the components. This is done in the constructor:
// Add signal shape parameters
AddParameter("sig_mode", ...);
AddParameter("sig_sigma", ...);
AddParameter("sig_alpha", ...);
AddParameter("sig_n", ...);
// Add background shape parameters
AddParameter("bkg_x0", ...);
AddParameter("bkg_chi", ...);
// Add signal and background yields
AddParameter("Y_S", ...);
AddParameter("Y_B", ...);
Where for each, we have chosen an appropriate range and given pretty-print names (and units) for plotting.
We also must choose a prior for each parameter. For the moment we take all priors to be constant within their allowed ranges:
SetPriorConstantAll();
Our
AddObservable("sig_tail_pos", ...);
And override a function in BCModel
that lets us store its distribution:
void Simple1DBinnedFit::CalculateObservables(const std::vector<double>& P)
In runSimple1DBinnedfit.cxx
we load our data into a ROOT TH1D
that we pass to our model upon its creation:
Simple1DBinnedfit m("Simple1DBinnedFit", h1_data);
Before running our analysis, we set a desired precision:
m.SetPrecision(BCEngineMCMC::kMedium);
Then we run the analysis, first sampling from our distribution and then using Minuit to refine the best-fit value:
m.MarginzalizeAll();
m.FindMode();
Finall we print the results to a pdf file and to the screen and log file:
m.PrintAllMarginalized(...);
m.PrintSummary();
We then use some ROOT code (stored in DrawFit.h
) to print a
projection of the fit to Simple1DBinnedfit_fit.pdf
.
After compiling, to run the code, execute
./runSimple1DBinnedFit
The output#
The program produces output to the screen (and log file) that tells you the global mode of the posterior (the “best fit point”), with uncertainties on the values around that point. It will also writes out information about each of the marginalized distributions: the mean and standard deviation, the median and central one-sigma region, and quantiles.
The visual output (in Simple1DBinnedFit_plots.pdf
) shows the
marginalized posterior distributions for each of the parameters (and
the observable) and each of the 2D combinations thereof. In each plot,
the green region shows the one-sigma credibility interval; the yellow
region, the two-sigma interval; and the red, the three-sigma
interval. There are markers for the global mode, which you should note
is often not at the local mode; and the mean and standard deviation.
Also, note that the posterior distributions for many of the parameters
are not normal in their entire range; thought they tend to be normal
around their mode. In particular, the posterior for the
One of the great advantages of inspecting the posterior probability via BAT is that you can see the complicated correlations among your parameters in a way that you can not achieve with a fitter like Minuit.
In our model, we can see a complicated correlation between the
Please try extending the range of
Let us try extending our example by supposing that we have performed a test fit to simulated data and have expectations for what our shape parameters should be. This is quite common in analyses. Typically one fixes the parameters to these values and then calculates a systematic uncertainty by observing the effect of moving them plus and minus one sigma away from their expectations.
In BAT we will robustly incorporate this systematic uncertainty into our fit by using that information from fits to simulated data as priors for our parameters.
Uncomment the code block in runSimple1DBinnedfit.cxx
that sets
priors for the parameters:
m.GetParameter("bg_x0").SetPrior(new BCGaussianPrior(5.289, 0.3e-3));
m.GetParameter("bg_chi").SetPrior(new BCGaussianPrior(9.8, 0.3));
m.GetParameter("sig_mode").SetPrior(new BCGaussianPrior(5.279, 1e-3));
m.GetParameter("sig_sigma").SetPrior(new BCGaussianPrior(3.5e-3, 0.1e-3));
m.GetParameter("sig_alpha").SetPrior(new BCGaussianPrior(-1.3, 0.15));
m.GetParameter("sig_n").SetPrior(new BCGaussianPrior(2.5, 0.5));
m.GetParameter("sig_n").SetUpperLimit(10);
Note that in the last line we shrink the allowed range for
In order to compile this code, we must also include
BCGaussianPrior
. Uncomment the include statement at the top of the
file:
#include <BAT/BCGaussianPrior.h>
Since we are now defining priors, we also want to look at that the change from the prior to the posterior. Uncomment the line that prints this knowledge-update information:
m.PrintKnowledgeUpdatePlots(...);
Looking at the output plots again, you should see that they are now much better behaving: they are nearly normal for all parameters with no strange 2D correlations.
Looking at the file produced by the last uncommented line,
Simple1DBinnedfit_update.pdf
, you can see the change from the
prior distribution to the posterior distribution. You’ll notice that
some of our priors—based on fits to simulated data—do not match to
our posteriors. Yet still, our model is not so strongly influenced by
the prior that it cannot find the correct value. But by placing a
prior on
Note also that BAT has calculated what the prior on
Unbinned Fit#
The unbinned fit uses the same model as the binned fit, but, being
unbinned, does not need the yield parameters Y_S
and
Y_B
. Instead it needs a signal fraction parameter:
AddParameter("f", ...);
The log-likelihood is also different and is a sum of logs of the fit
function for each data point with the signal function weighted by