SectorMap training#
The SectorMap needs to be trained on MC events. During this training the friendship relations between sectors and the allowed ranges for the applied filter variables are learned. The training procedure can be very roughly separated into four steps:
collection of training data: Large MC samples are generated and the relevant data to train the SectorMap are written to disk.
SectorMap training: The actual training of the SectorMap. For this the data created in the previous step is read, and the friendship relations and filter ranges are estimated (“trained”).
finalization of the SectorMap: The SectorMap from the previous step is usable as it is, but we provide SectorMaps with and without SVD time cuts applied. Therefore a SectorMap with the timing cuts removed has to be generated. In addition the two SectorMaps are brought into the correct format to be uploaded to the database.
validation of the SectorMap: Run the tracking validation with the new SectorMap to validate it.
The scripts used for the SectorMap training are collected in tracking/scripts/tracking/secMapTraining/ and tracking/tools/; their usage is outlined in the following sections.
Collection of training data#
For the SectorMap training a large amount of training data needs to be generated. By default we generate 10 million BBbar events and an additional 2 million Bhabha events. Both event types have additional muon tracks mixed in using the ParticleGun event generator. To make the SectorMap more robust against displaced IP positions, the IP position is randomly varied for each generated sub-sample. The concrete setup of the event generators is done by the function tracking.secMapTraining.SectorMapTrainingUtils.add_event_generation. We train on MC track candidates only, so adding beam background or changing the beam background simulation should have no effect on the training; therefore none of the generated samples include beam background.
Please note that a special reconstruction chain is applied for this MC generation. When using the commands described below this chain is set up automatically. If you need to inspect or adjust it, have a look at the function tracking.secMapTraining.SectorMapTrainingUtils.add_simulation_and_reconstruction_modules.
By default 2500 sub-samples of BBbar with 4000 events each and 500 sub-samples of Bhabha with 4000 events each are generated. To generate a single sub-sample you can run the following command:
tracking-vxdtf2-collect-train-data --rndSeed 12345 --eventType BBbar --outputDir output/
There are several command line parameters (in addition to those shown in the example) which can be used to customize the data generation. For most of them the default values should be sufficient. A full list of available parameters can be obtained by running the following command in a shell:
tracking-vxdtf2-collect-train-data --help
Please note that training with PXD data (off by default) has not been maintained for some time, so activating this feature may lead to strange results. In addition, training with PXD data requires adding a new SectorMap-config to the SectorMapBootstrap module by changing the code. If you intend to train a SectorMap with PXD data, please contact the tracking group.
The output files are named after the SectorMap-config used for the training, as implemented in the SectorMapBootstrap module, and the type of dataset generated. At the moment there is only one config implemented, called SVDOnlyDefault. Consequently all output files are named similar to SVDOnlyDefault_BBbar24956SVDOnly.root, where the number corresponds to the random seed used during the generation and the name of the dataset (BBbar or BhaBha) is included. By default the generated events are discarded and only the information needed for the SectorMap training is written out. The full ROOT output can be activated via a command line parameter, but note that this will take more than 1 TB of disk space if a default SectorMap is trained (12 million events).
An additional script is provided which sends all jobs needed to generate the full training sample to a queue. This script assumes that the attached queue is an LSF queue (e.g. at KEKcc). If you don’t have access to an LSF queue you would need to adapt the scripts or cook up your own recipe. The default usage of the script for submitting the jobs is as follows:
tracking-vxdtf2-submitAllCollectionJobs --outputDir OUTPUTDIR
This will submit the default number of jobs (2500 + 500) with the default number of events per job (4000) to a queue called l, resulting in a total of 12 million generated events. The default parameters will work out of the box at KEKcc, except for the output directory, which you should create and set yourself. Running this command with default settings will use on the order of 100 GB of disk space, so make sure to direct the output to a disk with sufficient quota. Internally the tracking-vxdtf2-collect-train-data command is called with the settings used for the default SectorMap training. There are several command line parameters to customize the behaviour of the script (e.g. number of jobs, name of the queue, number of events per job, …). A full list of available options can be obtained by running the following command:
tracking-vxdtf2-submitAllCollectionJobs --help
Note that for SectorMaps applied on real data the displacement of the sensors should be included in the training. By default the MC uses a perfect geometry, which is not the case for the actual Belle II detector. Although the differences are small, they do have an effect on the performance of the trained SectorMap. The default way of including a displaced geometry in the training is to provide a global tag containing the displaced SVD geometry (assuming the PXD is ignored). Both of the above commands (tracking-vxdtf2-submitAllCollectionJobs and tracking-vxdtf2-collect-train-data) provide the option --prependGT, which can be used to give the name of such a global tag; the given global tag will be prepended to the other global tags.
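For example, a single collection job could be started as follows, where displaced_svd_geometry_gt is a placeholder for a global tag that actually contains the displaced SVD geometry:
tracking-vxdtf2-collect-train-data --rndSeed 12345 --eventType BBbar --outputDir output/ --prependGT displaced_svd_geometry_gt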
The script tracking/scripts/tracking/secMapTraining/CreateSensorDisplacements.py can be used to create xml files containing sensor displacements, which in turn can be used to create a geometry with sensors displaced with respect to the ideal geometry. You have to provide global tags which contain the geometry payload and the payloads containing the relevant alignment data (named VXDAlignment*). The payloads can either be provided as a list of global tags, separated by spaces, using the --listOfGT parameter, or as a local database using the --localDB option. Note that the IOVs of the alignment data and the geometry have to match. The experiment number and run number can be specified via command line parameters. Examples are given below:
basf2 CreateSensorDisplacements.py -- --help
basf2 CreateSensorDisplacements.py -- --localDB database.txt --listOfGT globaltag1 globaltag2 globaltag2 --expNum 0 --runNum 0
The first command displays all available command line parameters. Note that the leading -- is needed.
The output will be two xml files containing the displacements for the SVD (SVD-Alignment.xml) and the PXD (PXD-Alignment.xml). The geometry in xml format is stored in svd/data and pxd/data, respectively. These directories already contain the aforementioned files, which should be replaced with the newly created versions.
It is recommended not to modify the original folders svd/data and pxd/data, to avoid messing with your release. Instead you should either create a copy of those folders or work on a branch. Once you have replaced the corresponding files you can use the Geometry module to create the actual payloads. For the Geometry module you need to set the option createPayloads to True, and you should set the components parameter accordingly. If you use a copy of the original xml files you may also need to set the fileName parameter in the Gearbox module. An example for the SVD geometry is given in the script svd/scripts/dbImporters/create_SVDGeometryPar_payload.py. The output will be the geometry payloads, which can be uploaded to the database or used as a local database.
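A minimal steering sketch for this step is given below. It is only an illustration under assumptions: the Gearbox fileName and the components list are placeholders that you need to adapt to your copied xml files, and the script mentioned above remains the authoritative example.
import basf2

# Minimal sketch: create geometry payloads from (modified) xml files.
main = basf2.create_path()

# A single event is enough to trigger the payload creation.
main.add_module("EventInfoSetter", expList=[0], runList=[0], evtNumList=[1])

# Point the Gearbox to the (copied) xml geometry description (placeholder path).
main.add_module("Gearbox", fileName="geometry/Belle2.xml")

# Build the geometry for the SVD only and write out the payloads.
main.add_module("Geometry", components=["SVD"], createPayloads=True)

basf2.process(main)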
SectorMap training#
Given the training data generated in the previous step the SectorMap can now be trained. This is done by the command tracking-vxdtf2-train-SectorMap. You need to specify the input training dataset (see previous step). The usual way to call that command is the following:
tracking-vxdtf2-train-SectorMap --train_sample output/SVDOnlyDefault_*.root
Please note that you can and should use wildcards to specify the training sample. The above example will train on all root files in the directory output whose names start with SVDOnlyDefault. The input training sample is the only non-optional parameter. A full list of available parameters can be obtained by running the command:
tracking-vxdtf2-train-SectorMap --help
Also note that with the default settings the training process takes very long (of the order of 24 hours) and uses a large amount of resources (roughly 32 GB of RAM). From past experience no queue at KEKcc was found which can handle these requirements, so past SectorMaps have been trained interactively on one of the KEKcc worker nodes. If keeping an ssh connection open for 24 hours is not an option, have a look at the Linux command nohup, which lets you run the training in the background and log off without the command being stopped.
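For example, the training could be started in the background like this (the log file name is arbitrary):
nohup tracking-vxdtf2-train-SectorMap --train_sample output/SVDOnlyDefault_*.root > training.log 2>&1 &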
The parameter --threshold is used to prune the SectorMap. It is given in per cent and by default removes 70% of the least used sector connections. The value of 70% was optimized and provided the best results in terms of efficiency and fake rate, so it is advised to keep the default value for this parameter unless you know what you are doing.
A global tag to be prepended can also be specified for the training. At this stage only the numbering of the sensors is read from the geometry.
Note that the command will issue a B2FATAL if the output SectorMap already exists, to prevent accidentally overwriting old SectorMaps. To fix this, either move the old SectorMap or set a different output name using the --secmap parameter.
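For example (the output file name below is just a placeholder):
tracking-vxdtf2-train-SectorMap --train_sample output/SVDOnlyDefault_*.root --secmap myNewSectorMap.root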
The output of this procedure is a SectorMap which can already be used for processing. By default this SectorMap contains cuts on the SVD timing information. How to obtain a SectorMap without timing is shown in the following section.
Finalization of the SectorMap#
By default a SectorMap is always trained including timing information, but tracking usually provides SectorMaps both with and without timing information. The timing information is removed by copying the original SectorMap and setting the ranges of the filters that use time to (-INF, +INF) in the copied SectorMap.
In addition both SectorMaps, with and without timing, need to be brought into a format that is expected by the database.
Both of these tasks are performed by executing the following command:
tracking-vxdtf2-prepare-SectorMap --inputSectorMap NameInputSectorMap.root
The only parameter is the name of the input SectorMap (including its directory). The output will be a new SectorMap with the timing information removed, named after the input SectorMap with the postfix _timingRemoved added. This SectorMap usually uses a bit less disk space (around 16 MB vs. 17 MB).
The local database with both SectorMap payloads will be created in your current folder in a directory named localSectorMapDB. Note that the IOVs specified in the file database.txt contained in that folder are dummy values; these MUST be adjusted before you upload anything to the database. After adjusting the IOVs you can upload the SectorMaps to the database using the b2conditionsdb: Conditions DB interface command.
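As a sketch, assuming the usual b2conditionsdb upload workflow and a placeholder global tag name, the upload could look like this:
b2conditionsdb upload my_staging_globaltag localSectorMapDB/database.txt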
Validation of the SectorMap#
The easiest method to validate a SectorMap is to use the tracking validation scripts tracking/validation/VXDTF2TrackingValidation.py or tracking/validation/VXDTF2TrackingValidationBkg.py, where the latter uses input files with beam background included.
By default SectorMaps are read from the database, so you need to tell those scripts to use the SectorMap you want to test. The easiest way to achieve that is to set the environment variable BELLE2_TESTING_VXDTF2_SECMAP to the file name of the SectorMap you want to test (including its path):
export BELLE2_TESTING_VXDTF2_SECMAP="myPath/MySectorMap.root"
echo $BELLE2_TESTING_VXDTF2_SECMAP
Once this variable is set you can simply run either of the two validation scripts given above and check their output to validate the SectorMap. The validation can be run automatically with either of the following two shell commands (after basf2 is set up):
b2validation -s VXDTF2TrackingValidation.py
b2validation -s VXDTF2TrackingValidationBkg.py
These commands will automatically generate input samples and run the corresponding validation script on them. The output can be found under the results/ directory. Note that if you trained a SectorMap on an altered geometry, you need to make sure the input samples are also generated with that geometry. This means you would need to generate the input samples “by hand”, as the above method uses the default geometry.
Note that this method only works for these specific scripts. If you need to set the SectorMap in your custom code, you need to adjust the settings of the SectorMapBootstrap module. To achieve that, put the following code after the tracking chain has been set up:
import basf2

# Do not read the SectorMap from the conditions database ...
basf2.set_module_parameters(path, "SectorMapBootstrap", ReadSecMapFromDB=False)
# ... but read it from a local file instead.
basf2.set_module_parameters(path, "SectorMapBootstrap", ReadSectorMap=True)
basf2.set_module_parameters(path, "SectorMapBootstrap", SectorMapsInputFile="myPath/mySectorMap.root")
Alternatively one can also prepend a global tag which contains the new SectorMap:
import basf2
basf2.conditions.prepend_globaltag("MyGlobalTagName")