combined_to_pxd_ckf_mva_training
-----------------------------------------

This Python script is used for the training and validation of the classifiers of
the three MVA-based state filters and one result filter of the ToPXDCKF.
This CKF extrapolates tracks found in CDC and SVD into the PXD and adds PXD hits
using a combinatorial tree search and a Kalman-filter-based track fit in each step.

To avoid mistakes, b2luigi is used to create a task chain for a combined training and
validation of all classifiers.

The order of the b2luigi tasks in this script is as follows (top to bottom):

* Two tasks create the input samples for training and testing (``GenerateSimTask``
  and ``SplitNMergeSimTask``). The ``SplitNMergeSimTask`` takes the total number of
  events to be generated and a number of events per task, which limits the runtime
  of each job. It divides the total number of events by the number of events per
  task and creates as many ``GenerateSimTask`` instances as needed, each with its
  own random seed, so that in the end the requested numbers of training and testing
  events are simulated. The ``SplitNMergeSimTask`` then merges the individual files
  into one file each for training and testing.
* The ``StateRecordingTask`` writes out the data required for training the state
  filters.
* The ``CKFStateFilterTeacherTask`` trains the state filter MVAs, using FastBDT by
  default, with a given set of options.
* The ``ResultRecordingTask`` writes out the data used for the training of the result
  filter MVA. This task requires that the state filters have been trained before.
* The ``CKFResultFilterTeacherTask`` trains the result filter MVA, using FastBDT by
  default, with a given set of FastBDT options. This requires that the result filter
  records have been created with the ``ResultRecordingTask``.
* The ``ValidationAndOptimisationTask`` uses the trained weight files and the cut
  values provided to run the tracking chain with the weight file under test, and
  also runs the tracking validation.
* Finally, the ``SummaryTask`` is the "brain" of the script. It invokes the
  ``ValidationAndOptimisationTask`` with the different combinations of FastBDT options
  and cut values on the MVA classifier output.

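
The splitting done by the ``SplitNMergeSimTask`` can be illustrated in plain
Python. This is a minimal sketch rather than the task itself: the helper name
``split_events`` is hypothetical, and the handling of a remainder as one extra,
smaller job is an assumption. Only the ``divmod`` arithmetic and the zero-padded
seed suffix mirror the script.

.. code-block:: python

    def split_events(n_events, n_events_per_task, random_seed):
        # Hypothetical helper: returns one (seed, n_events) pair per simulation job.
        quotient, remainder = divmod(n_events, n_events_per_task)
        # Full-size jobs, each with a unique zero-padded seed suffix.
        jobs = [(f"{random_seed}_{i:03d}", n_events_per_task) for i in range(quotient)]
        # Assumption: leftover events go into one additional, smaller job.
        if remainder > 0:
            jobs.append((f"{random_seed}_{quotient:03d}", remainder))
        return jobs


    jobs = split_events(n_events=250, n_events_per_task=100, random_seed="training")
    for seed, n in jobs:
        print(seed, n)

Because every job gets its own seed string, the simulated samples are
statistically independent, and the total number of events is preserved.
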
Due to the dependencies, the tasks are called in reverse order. The ``SummaryTask``
calls the ``ValidationAndOptimisationTask`` with different FastBDT options and cut
values, and the ``ValidationAndOptimisationTask`` itself calls the required teacher,
recording, and simulation tasks.

Each combination of FastBDT options, state filter cut values, and candidate selection
is used to train the result filter. As a consequence, the ``ResultRecordingTask``
is executed multiple times, once for each combination of FastBDT options, cut value,
and candidate selection.

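
To get a feeling for the size of this scan, the cut values and candidate limits
that the ``SummaryTask`` iterates over can be combined outside of b2luigi. The
sketch below only counts the combinations for one choice of experiment number
and FastBDT options; it does not run any task.

.. code-block:: python

    import itertools

    # Cut values and candidate limits scanned by the SummaryTask.
    state_filter_cuts = [0.01, 0.02, 0.03, 0.05, 0.1, 0.2]
    n_best_states_list = [3, 5, 10]
    result_filter_cuts = [0.05, 0.1, 0.2]
    n_best_results_list = [2, 3, 5]

    combinations = list(itertools.product(
        state_filter_cuts, n_best_states_list, result_filter_cuts, n_best_results_list))

    # Number of validation tasks per experiment number and pair of FastBDT options.
    print(len(combinations))  # 6 * 3 * 3 * 3 = 162

The total number of validation tasks is this count multiplied by the number of
experiment numbers and by the square of the number of FastBDT options, because
the state filter and result filter options are varied independently.
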
b2luigi: Understanding the steering file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All trainings and validations are done in the correct order in this steering
file. To create the dependency graph, the `b2luigi
<https://b2luigi.readthedocs.io>`_ Python package is used, which extends the
`luigi <https://luigi.readthedocs.io>`_ package developed by Spotify.

Each task that has to be done is represented by a special class, which defines
its parameters, its output files, and which other tasks with which
parameters it depends on. For example, a teacher task, which runs
``basf2_mva_teacher.py`` to train the classifier, depends on a data collection
task, which runs a reconstruction and writes out track-wise variables into a ROOT
file for training. An evaluation/validation task for testing the classifier
requires both the teacher task, as it needs the weightfile to be present, and
a data collection task, because it needs a dataset for testing the classifier.

The final task that defines which tasks need to be done for the steering file to
finish is the ``SummaryTask``. When you only want to run parts of the
training/validation pipeline, you can comment out requirements in the
``SummaryTask`` or replace them by lower-level tasks during debugging.

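
The way requirements determine the execution order can be sketched without
b2luigi. The example below is only an illustration under simplifying
assumptions: task parameters are ignored, the task names are plain strings, and
the standard library ``graphlib`` module stands in for the luigi scheduler.

.. code-block:: python

    from graphlib import TopologicalSorter  # requires Python >= 3.9

    # Map each task to the tasks it requires (simplified, parameters omitted).
    requirements = {
        "SummaryTask": {"ValidationAndOptimisationTask"},
        "ValidationAndOptimisationTask": {
            "CKFResultFilterTeacherTask", "CKFStateFilterTeacherTask", "SplitNMergeSimTask"},
        "CKFResultFilterTeacherTask": {"ResultRecordingTask"},
        "ResultRecordingTask": {"CKFStateFilterTeacherTask", "SplitNMergeSimTask"},
        "CKFStateFilterTeacherTask": {"StateRecordingTask"},
        "StateRecordingTask": {"SplitNMergeSimTask"},
        "SplitNMergeSimTask": {"GenerateSimTask"},
        "GenerateSimTask": set(),
    }

    # static_order() yields every task only after all of its requirements.
    execution_order = list(TopologicalSorter(requirements).static_order())
    print(execution_order)

Although the ``SummaryTask`` is the entry point, it comes last in the execution
order, which is why the calls appear reversed with respect to the data flow.
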
This steering file relies on b2luigi_ for task scheduling. It can be installed
via pip::

    python3 -m pip install [--user] b2luigi

Use the ``--user`` option if you do not have the rights to install Python packages
into your externals (e.g. because you are using cvmfs). The packages are then
installed in ``$HOME/.local`` instead.

Instead of command line arguments, the b2luigi script is configured via a
``settings.json`` file. Open it in your favorite text editor and adapt it to
your requirements.

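
As a rough orientation, the following sketch shows the setting names that the
steering file reads and how such a file behaves when parsed. All values are
placeholders, in particular the background file path and the experiment number,
and your actual ``settings.json`` may contain further keys.

.. code-block:: python

    import json

    # Placeholder values; only the key names are taken from the steering file.
    example_settings = {
        "result_path": "results",
        "experiment_numbers": [0],
        "n_events_training": 1000,
        "n_events_testing": 500,
        "n_events_per_task": 100,
        "basf2_processes_per_worker": 0,
        "workers": 1,
        "bkgfiles_by_exp": {0: "/path/to/background/files"},
    }

    # Text as it would be stored in settings.json, then parsed again.
    settings_text = json.dumps(example_settings, indent=4)
    settings = json.loads(settings_text)

    # JSON object keys are always strings, so the experiment numbers have to
    # be converted back to integers, as the steering file does.
    bkgfiles_by_exp = {int(key): val for key, val in settings["bkgfiles_by_exp"].items()}
    print(bkgfiles_by_exp)

The integer conversion matters because the experiment numbers listed in
``experiment_numbers`` are integers and are used to look up the matching
background directory.
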
You can test the b2luigi task graph without running it via::

    python3 combined_to_pxd_ckf_mva_training.py --dry-run
    python3 combined_to_pxd_ckf_mva_training.py --show-output

This shows the outputs and reveals potential errors in the definitions of the
luigi task dependencies. To run the steering file in normal (local) mode,
run::

    python3 combined_to_pxd_ckf_mva_training.py

You can use the interactive luigi web interface via the central scheduler,
which visualizes the task graph while it is running. For this, the scheduler
daemon ``luigid`` has to run in the background. It is located in
``~/.local/bin/luigid`` if b2luigi was installed with ``--user``. For
example, run::

    luigid --port 8886

Then, execute your steering file (e.g. in another terminal) with::

    python3 combined_to_pxd_ckf_mva_training.py --scheduler-port 8886

To view the web interface, open your web browser and enter into the URL bar::

    localhost:8886

If you don't run the steering file on the same machine on which you run your web
browser, you have two options:

1. Run both the steering file and ``luigid`` remotely and use
   SSH port forwarding to your local host. To do so, run on your local
   host::

       ssh -N -f -L 8886:localhost:8886 <remote_user>@<remote_host>

2. Run the ``luigid`` scheduler locally and use the ``--scheduler-host <your
   local host>`` argument when calling the steering file.

Accessing the results / output files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All output files are stored in a directory structure in the ``result_path`` set in
``settings.json``. The directory tree encodes the b2luigi parameters that were
used. This ensures reproducibility and makes parameter searches easy, but it can
make it hard to find the relevant output files. You can view the whole directory
structure by running ``tree <result_path>``. Use the unix ``find`` command to find
the files that interest you, e.g.::

    find <result_path> -name "*.root" # find all ROOT files

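
If you prefer to collect the files from within Python, for example in a
notebook, the same search can be written with ``pathlib``. This is a small
sketch on a temporary directory; the directory and file names are made up for
the demonstration and only imitate a parameter-encoded output tree.

.. code-block:: python

    import tempfile
    from pathlib import Path

    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for <result_path> with a made-up parameter directory tree.
        result_path = Path(tmp)
        task_dir = result_path / "experiment_number=0" / "n_events=100"
        task_dir.mkdir(parents=True)
        (task_dir / "records1.root").touch()
        (task_dir / "summary.json").touch()

        # Equivalent of: find <result_path> -name "*.root"
        root_files = sorted(
            p.relative_to(result_path).as_posix() for p in result_path.rglob("*.root"))

    print(root_files)

In practice, replace the temporary directory by the ``result_path`` from your
``settings.json``.
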
from tracking import add_track_finding
from ckf_training import my_basf2_mva_teacher, create_fbdt_option_string
from tracking_mva_filter_payloads.write_tracking_mva_filter_payloads_to_db import write_tracking_mva_filter_payloads_to_db

install_helpstring_formatter = ("\nCould not find {module} python module. Try installing it via\n"
                                "  python3 -m pip install [--user] {module}\n")
try:
    from b2luigi.core.utils import create_output_dirs
    from b2luigi.basf2_helper import Basf2PathTask, Basf2Task
except ModuleNotFoundError:
    print(install_helpstring_formatter.format(module="b2luigi"))
    Simple task that defines the configuration of the LSF batch submission.

    Same as LSFTask, but for memory-intensive tasks.

    Generate simulated Monte Carlo with background overlay.

    Make sure to use different ``random_seed`` parameters for the training data
    for the classifier trainings and for the test data for the respective
    evaluation/validation tasks.

    # Experiment number of the conditions database.
    experiment_number = b2luigi.IntParameter()
    # Random basf2 seed.
    random_seed = b2luigi.Parameter()
    # Number of events to generate.
    n_events = b2luigi.IntParameter()
    # Directory with overlay background ROOT files.
    bkgfiles_dir = b2luigi.Parameter(

        Create output file name depending on number of events and production
        mode that is specified in the random_seed string.

        :param n_events: Number of events to simulate.
        :param random_seed: Random seed to use for the simulation to create independent samples.

        if random_seed is None:
        return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"

        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.

        Create basf2 path to process with event generation and simulation.

        path = basf2.create_path()
        path.add_module("EvtGenInput")

        Default function from base b2luigi.Task class.

        self._remove_output()
    Generate simulated Monte Carlo with background overlay.

    Make sure to use different ``random_seed`` parameters for the training data
    for the classifier trainings and for the test data for the respective
    evaluation/validation tasks.

    # Experiment number of the conditions database.
    experiment_number = b2luigi.IntParameter()
    # Random basf2 seed.
    random_seed = b2luigi.Parameter()
    # Number of events to generate.
    n_events = b2luigi.IntParameter()
    # Directory with overlay background ROOT files.
    bkgfiles_dir = b2luigi.Parameter(

        Create output file name depending on number of events and production
        mode that is specified in the random_seed string.

        :param n_events: Number of events to simulate.
        :param random_seed: Random seed to use for the simulation to create independent samples.

        if random_seed is None:
        return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"

        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.

        This task requires several GenerateSimTask to be finished so that the required number of events is created.

        n_events_per_task = SummaryTask.n_events_per_task
        quotient, remainder = divmod(self.n_events, n_events_per_task)
        for i in range(quotient):
                num_processes=SummaryTask.num_processes,
                random_seed=self.random_seed + '_' + str(i).zfill(3),
                n_events=n_events_per_task,
                num_processes=SummaryTask.num_processes,
                random_seed=self.random_seed + '_' + str(quotient).zfill(3),

    @b2luigi.on_temporary_files
        When all GenerateSimTasks have finished, merge their output.

        create_output_dirs(self)

        file_list = [f for f in self.get_all_input_file_names()]
        print("Merge the following files:")
        cmd = ["b2file-merge", "-f"]
        args = cmd + [self.get_output_file_name(self.output_file_name())] + file_list
        subprocess.check_call(args)
        print("Finished merging. Now remove the input files to save space.")
        file_list = [f for f in self.get_all_input_file_names()]
        for input_file in file_list:
            try:
                os.remove(input_file)
            except FileNotFoundError:

        Default function from base b2luigi.Task class.

        self._remove_output()
    Record the data for the three state filters for the ToPXDCKF.

    This task requires that the events used for training have been simulated before, which is done using the
    ``SplitNMergeSimTask``.

    # Experiment number of the conditions database.
    experiment_number = b2luigi.IntParameter()
    # Random basf2 seed.
    random_seed = b2luigi.Parameter()
    # Number of events to generate.
    n_events = b2luigi.IntParameter()
    # Layer on which to toggle for recording the information for training.
    layer = b2luigi.IntParameter()

        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.

        for record_fname in ["records1.root", "records2.root", "records3.root"]:
            yield self.add_to_output(record_fname)

        This task only requires that the input files have been created.

        Create a path for the recording. To record the data for the PXD state filters, CDC+SVD tracks are required, and these
        must be truth matched before. The data have to be recorded for each layer of the PXD, i.e. layers 1 and 2, but also an

        :param layer: The layer for which the data are recorded.
        :param records1_fname: Name of the records1 file.
        :param records2_fname: Name of the records2 file.
        :param records3_fname: Name of the records3 file.

        path = basf2.create_path()

        file_list = [fname for fname in self.get_all_input_file_names()
                     if "generated_mc_N" in fname and "training" in fname and fname.endswith(".root")]
        path.add_module("RootInput", inputFileNames=file_list)

        path.add_module("Gearbox")
        path.add_module("Geometry")
        path.add_module("SetupGenfitExtrapolation")

        add_hit_preparation_modules(path, components=["SVD", "PXD"])

        add_track_finding(path, reco_tracks="CDCSVDRecoTracks", components=["CDC", "SVD"], prune_temporary_tracks=False)

        path.add_module('TrackFinderMCTruthRecoTracks',
                        RecoTracksStoreArrayName="MCRecoTracks",

        path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=True, UseCDCHits=True,
                        mcRecoTracksStoreArrayName="MCRecoTracks",
                        prRecoTracksStoreArrayName="CDCSVDRecoTracks")
        path.add_module("DAFRecoFitter", recoTracksStoreArrayName="CDCSVDRecoTracks")

        path.add_module("ToPXDCKF",
                        inputRecoTrackStoreArrayName="CDCSVDRecoTracks",
                        outputRecoTrackStoreArrayName="RecoTracks",
                        outputRelationRecoTrackStoreArrayName="CDCSVDRecoTracks",
                        hitFilter="angulardistance",
                        seedFilter="angulardistance",

                        relationCheckForDirection="backward",

                        writeOutDirection="backward",

                        firstHighFilter="truth",
                        firstEqualFilter="recording",
                        firstEqualFilterParameters={"treeName": "records1", "rootFileName": records1_fname, "returnWeight": 1.0},
                        firstLowFilter="none",
                        firstHighUseNStates=0,
                        firstToggleOnLayer=layer,

                        advanceHighFilter="advance",

                        secondHighFilter="truth",
                        secondEqualFilter="recording",
                        secondEqualFilterParameters={"treeName": "records2", "rootFileName": records2_fname, "returnWeight": 1.0},
                        secondLowFilter="none",
                        secondHighUseNStates=0,
                        secondToggleOnLayer=layer,

                        updateHighFilter="fit",

                        thirdHighFilter="truth",
                        thirdEqualFilter="recording",
                        thirdEqualFilterParameters={"treeName": "records3", "rootFileName": records3_fname},
                        thirdLowFilter="none",
                        thirdHighUseNStates=0,
                        thirdToggleOnLayer=layer,

                        enableOverlapResolving=False)

        Create basf2 path to process with event generation and simulation.

            records1_fname=self.get_output_file_name("records1.root"),
            records2_fname=self.get_output_file_name("records2.root"),
            records3_fname=self.get_output_file_name("records3.root"),

        Default function from base b2luigi.Task class.

        self._remove_output()
    A teacher task runs the basf2 mva teacher on the training data provided by a
    data collection task.

    In this task the three state filters are trained, each with the corresponding recordings from the different layers.
    It will be executed for each FastBDT option defined in the SummaryTask.

    # Experiment number of the conditions database.
    experiment_number = b2luigi.IntParameter()
    # Random basf2 seed.
    random_seed = b2luigi.Parameter()
    # Number of events to generate for the training data set.
    n_events = b2luigi.IntParameter()
    # Hyperparameter option of the FastBDT algorithm.
    fast_bdt_option_state_filter = b2luigi.ListParameter(
        hashed=True, default=[50, 8, 3, 0.1]
    )
    # Number of the filter for which the records files are to be processed.
    filter_number = b2luigi.IntParameter()
    # Feature/variable to use as truth label in the quality estimator MVA classifier.
    training_target = b2luigi.Parameter(
    # List of collected variables to not use in the training of the QE MVA classifier.
    exclude_variables = b2luigi.ListParameter(
        hashed=True, default=[]
    )

        Name of weightfile that is created by the teacher task.

        :param fast_bdt_option: FastBDT option that is used to train this MVA
        :param filter_number: Filter number (first=1, second=2, third=3) to be trained

        if fast_bdt_option is None:
        fast_bdt_string = create_fbdt_option_string(fast_bdt_option)

        if filter_number is None:
        weightfile_name = f"trk_ToPXDStateFilter_{filter_number}" + fast_bdt_string
        return weightfile_name

        This task requires the recordings for the state filters.

        for layer in [1, 2, 3]:
                random_seed="training",

        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.

        Use basf2_mva teacher to create MVA weightfile from collected training

        This is the main process that is dispatched by the ``run`` method that
        is inherited from ``Basf2Task``.

        records_files = self.get_input_file_names(f"records{self.filter_number}.root")
        tree_name = f"records{self.filter_number}"
        print(f"Processed records files: {records_files},\nfeature tree name: {tree_name}")

        my_basf2_mva_teacher(
            records_files=records_files,
            weightfile_identifier=weightfile_identifier,

        basf2_mva.download(weightfile_identifier, self.get_output_file_name(weightfile_identifier + '.root'))

        Default function from base b2luigi.Task class.

        self._remove_output()
    Task to record data for the final result filter. This requires trained state filters.
    The cuts on the state filter classifiers are set to rather low values to ensure that all signal is contained in the recorded
    file. Also, the values for XXXXXHighUseNStates are chosen conservatively, i.e. rather on the high side.

    # Experiment number of the conditions database.
    experiment_number = b2luigi.IntParameter()
    # Random basf2 seed.
    random_seed = b2luigi.Parameter()
    # Number of events to generate.
    n_events_training = b2luigi.IntParameter()
    # Hyperparameter option of the FastBDT algorithm.
    fast_bdt_option_state_filter = b2luigi.ListParameter(
        hashed=True, default=[200, 8, 3, 0.1]
    )
    # Name of the records file for training the final result filter.
    result_filter_records_name = b2luigi.Parameter()

    basf2.conditions.prepend_testing_payloads("localdb/database.txt")

        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.

        This task requires that the training SplitNMergeSimTask is finished, as well as that the state filters are trained using
        the CKFStateFilterTeacherTask.

        filter_numbers = [1, 2, 3]
        for filter_number in filter_numbers:
                CKFStateFilterTeacherTask,
                filter_number=filter_number,

        Create a path for the recording of the result filter. This file is then used to train the result filter.

        :param result_filter_records_name: Name of the recording file.

        path = basf2.create_path()

        file_list = [fname for fname in self.get_all_input_file_names()
                     if "generated_mc_N" in fname and "training" in fname and fname.endswith(".root")]
        path.add_module("RootInput", inputFileNames=file_list)

        path.add_module("Gearbox")
        path.add_module("Geometry")
        path.add_module("SetupGenfitExtrapolation")

        add_hit_preparation_modules(path, components=["SVD", "PXD"])

        add_track_finding(path, reco_tracks="CDCSVDRecoTracks", components=["CDC", "SVD"], prune_temporary_tracks=False)

        path.add_module('TrackFinderMCTruthRecoTracks',
                        RecoTracksStoreArrayName="MCRecoTracks",

        path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=True, UseCDCHits=True,
                        mcRecoTracksStoreArrayName="MCRecoTracks",
                        prRecoTracksStoreArrayName="CDCSVDRecoTracks")
        path.add_module("DAFRecoFitter", recoTracksStoreArrayName="CDCSVDRecoTracks")

            f"trk_ToPXDStateFilter_1_Parameter{fast_bdt_string}",
            f"trk_ToPXDStateFilter_1{fast_bdt_string}",

            f"trk_ToPXDStateFilter_2_Parameter{fast_bdt_string}",
            f"trk_ToPXDStateFilter_2{fast_bdt_string}",

            f"trk_ToPXDStateFilter_3_Parameter{fast_bdt_string}",
            f"trk_ToPXDStateFilter_3{fast_bdt_string}",

        basf2.conditions.prepend_testing_payloads("localdb/database.txt")
        first_high_filter_parameters = {"DBPayloadName": f"trk_ToPXDStateFilter_1_Parameter{fast_bdt_string}",
                                        "direction": "backward"}
        second_high_filter_parameters = {"DBPayloadName": f"trk_ToPXDStateFilter_2_Parameter{fast_bdt_string}"}
        third_high_filter_parameters = {"DBPayloadName": f"trk_ToPXDStateFilter_3_Parameter{fast_bdt_string}"}

        path.add_module("ToPXDCKF",
                        inputRecoTrackStoreArrayName="CDCSVDRecoTracks",
                        outputRecoTrackStoreArrayName="RecoTracks",
                        outputRelationRecoTrackStoreArrayName="CDCSVDRecoTracks",

                        relationCheckForDirection="backward",

                        writeOutDirection="backward",

                        firstHighFilter="mva",
                        firstHighFilterParameters=first_high_filter_parameters,
                        firstHighUseNStates=10,

                        advanceHighFilter="advance",

                        secondHighFilter="mva",
                        secondHighFilterParameters=second_high_filter_parameters,
                        secondHighUseNStates=10,

                        updateHighFilter="fit",

                        thirdHighFilter="mva",
                        thirdHighFilterParameters=third_high_filter_parameters,
                        thirdHighUseNStates=10,

                        filterParameters={"rootFileName": result_filter_records_name},

                        enableOverlapResolving=True)

        Create basf2 path to process with event generation and simulation.

        Default function from base b2luigi.Task class.

        self._remove_output()
    A teacher task runs the basf2 mva teacher on the training data for the result filter.

    # Experiment number of the conditions database.
    experiment_number = b2luigi.IntParameter()
    # Random basf2 seed.
    random_seed = b2luigi.Parameter()
    # Number of events to generate for the training data set.
    n_events = b2luigi.IntParameter()
    # Hyperparameter option of the FastBDT algorithm.
    fast_bdt_option_state_filter = b2luigi.ListParameter(
        hashed=True, default=[50, 8, 3, 0.1]
    )
    # Hyperparameter option of the FastBDT algorithm.
    fast_bdt_option_result_filter = b2luigi.ListParameter(
        hashed=True, default=[200, 8, 3, 0.1]
    )
    # Name of the input file.
    result_filter_records_name = b2luigi.Parameter()
    # Feature/variable to use as truth label in the quality estimator MVA classifier.
    training_target = b2luigi.Parameter(
    # List of collected variables to not use in the training of the QE MVA classifier.
    exclude_variables = b2luigi.ListParameter(
        hashed=True, default=[]
    )

        Name of weightfile that is created by the teacher task.

        :param fast_bdt_option: FastBDT option that is used to train this MVA

        if fast_bdt_option is None:
        fast_bdt_string = create_fbdt_option_string(fast_bdt_option)
        weightfile_name = "trk_ToPXDResultFilter" + fast_bdt_string
        return weightfile_name

        Generate list of luigi Tasks that this Task depends on.

        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.

        Use basf2_mva teacher to create MVA weightfile from collected training

        This is the main process that is dispatched by the ``run`` method that
        is inherited from ``Basf2Task``.

        tree_name = "records"
        print(f"Processed records files for result filter training: {records_files},\nfeature tree name: {tree_name}")

        my_basf2_mva_teacher(
            records_files=records_files,

        basf2_mva.download(weightfile_identifier, self.get_output_file_name(weightfile_identifier + ".root"))

        Default function from base b2luigi.Task class.

        self._remove_output()
    Validate the performance of the trained filters by trying various combinations of FastBDT options, as well as cut values for
    the states, the number of best candidates kept after each filter, and similar for the result filter.

    # Experiment number of the conditions database.
    experiment_number = b2luigi.IntParameter()
    # Number of events to generate for the training data set.
    n_events_training = b2luigi.IntParameter()
    # FastBDT option to use to train the state filters.
    fast_bdt_option_state_filter = b2luigi.ListParameter(
        hashed=True, default=[200, 8, 3, 0.1]
    )
    # FastBDT option to use to train the result filter.
    fast_bdt_option_result_filter = b2luigi.ListParameter(
        hashed=True, default=[200, 8, 3, 0.1]
    )
    # Number of events to generate for the testing, validation, and optimisation data set.
    n_events_testing = b2luigi.IntParameter()
    # Value of the cut on the MVA classifier output for accepting a state during CKF tracking.
    state_filter_cut = b2luigi.FloatParameter()
    # How many states should be kept at maximum in the combinatorial part of the CKF tree search.
    use_n_best_states = b2luigi.IntParameter()
    # Value of the cut on the MVA classifier output for a result candidate.
    result_filter_cut = b2luigi.FloatParameter()
    # How many results should be kept at maximum to search for overlaps.
    use_n_best_results = b2luigi.IntParameter()

    basf2.conditions.prepend_testing_payloads("localdb/database.txt")

        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.

        yield self.add_to_output(
            f"to_pxd_ckf_validation{fbdt_state_filter_string}{fbdt_result_filter_string}.root")

        This task requires trained result filters, trained state filters, and that an independent data set for validation was
        created using the SplitNMergeSimTask with the random seed "optimisation".

            result_filter_records_name=f"filter_records{fbdt_state_filter_string}.root",
            random_seed='training'
            random_seed="optimisation",
        filter_numbers = [1, 2, 3]
        for filter_number in filter_numbers:
                CKFStateFilterTeacherTask,
                random_seed="training",
                filter_number=filter_number,

        Create a path to validate the trained filters.

        path = basf2.create_path()

        file_list = [fname for fname in self.get_all_input_file_names()
                     if "generated_mc_N" in fname and "optimisation" in fname and fname.endswith(".root")]
        path.add_module("RootInput", inputFileNames=file_list)

        path.add_module("Gearbox")
        path.add_module("Geometry")
        path.add_module("SetupGenfitExtrapolation")

        add_hit_preparation_modules(path, components=["SVD", "PXD"])

        add_track_finding(path, reco_tracks="CDCSVDRecoTracks", components=["CDC", "SVD"], prune_temporary_tracks=False)

        path.add_module("DAFRecoFitter", recoTracksStoreArrayName="CDCSVDRecoTracks")

            f"trk_ToPXDStateFilter_1_Parameter{fbdt_state_filter_string}",
            f"trk_ToPXDStateFilter_1{fbdt_state_filter_string}",

            f"trk_ToPXDStateFilter_2_Parameter{fbdt_state_filter_string}",
            f"trk_ToPXDStateFilter_2{fbdt_state_filter_string}",

            f"trk_ToPXDStateFilter_3_Parameter{fbdt_state_filter_string}",
            f"trk_ToPXDStateFilter_3{fbdt_state_filter_string}",

            f"trk_ToPXDResultFilter_Parameter{fbdt_result_filter_string}",
            f"trk_ToPXDResultFilter{fbdt_result_filter_string}",

        basf2.conditions.prepend_testing_payloads("localdb/database.txt")
        first_high_filter_parameters = {"DBPayloadName": f"trk_ToPXDStateFilter_1_Parameter{fbdt_state_filter_string}",
                                        "direction": "backward"}
        second_high_filter_parameters = {"DBPayloadName": f"trk_ToPXDStateFilter_2_Parameter{fbdt_state_filter_string}"}
        third_high_filter_parameters = {"DBPayloadName": f"trk_ToPXDStateFilter_3_Parameter{fbdt_state_filter_string}"}
        filter_parameters = {"DBPayloadName": f"trk_ToPXDResultFilter_Parameter{fbdt_result_filter_string}"}

        path.add_module("ToPXDCKF",
                        inputRecoTrackStoreArrayName="CDCSVDRecoTracks",
                        outputRecoTrackStoreArrayName="PXDRecoTracks",
                        outputRelationRecoTrackStoreArrayName="CDCSVDRecoTracks",

                        relationCheckForDirection="backward",

                        writeOutDirection="backward",

                        firstHighFilter="mva_with_direction_check",
                        firstHighFilterParameters=first_high_filter_parameters,

                        advanceHighFilter="advance",
                        advanceHighFilterParameters={"direction": "backward"},

                        secondHighFilter="mva",
                        secondHighFilterParameters=second_high_filter_parameters,

                        updateHighFilter="fit",

                        thirdHighFilter="mva",
                        thirdHighFilterParameters=third_high_filter_parameters,

                        filterParameters=filter_parameters,

                        enableOverlapResolving=True)

        path.add_module('RelatedTracksCombiner',
                        VXDRecoTracksStoreArrayName="PXDRecoTracks",
                        CDCRecoTracksStoreArrayName="CDCSVDRecoTracks",
                        recoTracksStoreArrayName="RecoTracks")

        path.add_module('TrackFinderMCTruthRecoTracks',
                        RecoTracksStoreArrayName="MCRecoTracks",

        path.add_module("MCRecoTracksMatcher", UsePXDHits=True, UseSVDHits=True, UseCDCHits=True,
                        mcRecoTracksStoreArrayName="MCRecoTracks",
                        prRecoTracksStoreArrayName="RecoTracks")

                output_file_name=self.get_output_file_name(
                    f"to_pxd_ckf_validation{fbdt_state_filter_string}{fbdt_result_filter_string}.root"),
                reco_tracks_name="RecoTracks",
                mc_reco_tracks_name="MCRecoTracks",

        Create basf2 path to process with event generation and simulation.

        Default function from base b2luigi.Task class.

        self._remove_output()
    Task that collects and summarizes the main figures of merit from all the
    (validation and optimisation) child tasks.

    # Number of events to generate for the training data set.
    n_events_training = b2luigi.get_setting(
        "n_events_training", default=1000
    )
    # Number of events to generate for the test data set.
    n_events_testing = b2luigi.get_setting(
        "n_events_testing", default=500
    )
    n_events_per_task = b2luigi.get_setting(
        "n_events_per_task", default=100
    )
    num_processes = b2luigi.get_setting(
        "basf2_processes_per_worker", default=0
    )

    bkgfiles_by_exp = b2luigi.get_setting("bkgfiles_by_exp")
    # JSON keys are strings, so convert the experiment numbers back to integers.
    bkgfiles_by_exp = {int(key): val for (key, val) in bkgfiles_by_exp.items()}

    batch_system = 'local'
    # Output file name.
    output_file_name = 'summary.json'

        Generate list of tasks that need to be done for luigi to finish running

        fast_bdt_options = [

        experiment_numbers = b2luigi.get_setting("experiment_numbers")

        for experiment_number, fast_bdt_option_state_filter, fast_bdt_option_result_filter in itertools.product(
                experiment_numbers, fast_bdt_options, fast_bdt_options
        ):

            state_filter_cuts = [0.01, 0.02, 0.03, 0.05, 0.1, 0.2]
            n_best_states_list = [3, 5, 10]
            result_filter_cuts = [0.05, 0.1, 0.2]
            n_best_results_list = [2, 3, 5]
            for state_filter_cut, n_best_states, result_filter_cut, n_best_results in \
                    itertools.product(state_filter_cuts, n_best_states_list, result_filter_cuts, n_best_results_list):
                    ValidationAndOptimisationTask,
                    experiment_number=experiment_number,
                    state_filter_cut=state_filter_cut,
                    use_n_best_states=n_best_states,
                    result_filter_cut=result_filter_cut,
                    use_n_best_results=n_best_results,
                    fast_bdt_option_state_filter=fast_bdt_option_state_filter,
                    fast_bdt_option_result_filter=fast_bdt_option_result_filter,

            'MCSideTrackingValidationModule_overview_figures_of_merit',
            'PRSideTrackingValidationModule_overview_figures_of_merit',
            'PRSideTrackingValidationModule_subdetector_figures_of_merit'

        all_files = self.get_all_input_file_names()
        for idx, single_file in enumerate(all_files):
            with ROOT.TFile.Open(single_file, 'READ') as f:
                for ntuple_name in ntuple_names:
                    ntuple = f.Get(ntuple_name)
                    for i in range(min(1, ntuple.GetEntries())):
                        for branch in ntuple.GetListOfBranches():
                            name = branch.GetName()
                            value = getattr(ntuple, name)
                            branch_data[name] = value
                branch_data['file_path'] = single_file
                output_dict[f'{idx}'] = branch_data

            json.dump(output_dict, f, indent=4)

        Default function from base b2luigi.Task class.

        self._remove_output()
if __name__ == "__main__":
    b2luigi.set_setting("env_script", "./setup_basf2.sh")
    b2luigi.set_setting("scratch_dir", tempfile.gettempdir())
    workers = b2luigi.get_setting("workers", default=1)
    b2luigi.process(SummaryTask(), workers=workers, batch=True)