combined_cdc_to_svd_ckf_mva_training
-----------------------------------------

This Python script is used for the training and validation of the classifiers of
the three MVA-based state filters and the one result filter of the CDCToSVDSpacePointCKF.
This CKF extrapolates tracks found in the CDC into the SVD and adds SVD hits using a
combinatorial tree search and a Kalman filter based track fit in each step.

To avoid mistakes, b2luigi is used to create a task chain for a combined training and
validation of all classifiers.

The order of the b2luigi tasks in this script is as follows (top to bottom):

* Two tasks create the input samples for training and testing (``GenerateSimTask`` and
  ``SplitNMergeSimTask``). The ``SplitNMergeSimTask`` takes the total number of events to be
  generated and a number of events per task, which keeps the runtime of each job short. It divides the
  total number of events by the number of events per task and creates as many ``GenerateSimTask`` instances as
  needed, each with its own random seed, so that in the end the requested total number of
  training and testing events is simulated. The ``SplitNMergeSimTask`` then merges the individual files
  into one file each for training and testing.
* The ``StateRecordingTask`` writes out the data required for training the state
  filters.
* The ``CKFStateFilterTeacherTask`` trains the state filter MVAs, using FastBDT by
  default, with a given set of options.
* The ``ResultRecordingTask`` writes out the data used for the training of the result
  filter MVA. This task requires that the state filters have been trained before.
* The ``CKFResultFilterTeacherTask`` trains the result filter MVA, FastBDT by default, with a
  given set of FastBDT options. This requires that the result filter records have
  been created with the ``ResultRecordingTask``.
* The ``ValidationAndOptimisationTask`` uses the trained weight files and the provided cut values
  to run the tracking chain with the weight files under test, and also
  runs the tracking validation.
* Finally, the ``SummaryTask`` is the "brain" of the script. It invokes the
  ``ValidationAndOptimisationTask`` with the different combinations of FastBDT options
  and cut values on the MVA classifier output.
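
The way the ``SplitNMergeSimTask`` splits the requested number of events into simulation jobs can be sketched in plain Python. This is a simplified, standalone illustration and not the task's actual ``requires`` method. The helper name ``split_events`` is hypothetical. The sketch also assumes that a final, smaller chunk is created only when the division leaves a remainder, and that this chunk simulates exactly the remaining events.

.. code-block:: python

    def split_events(n_events, n_events_per_task, random_seed):
        """Return (random_seed, n_events) pairs, one per simulation job.

        Hypothetical helper: full chunks of n_events_per_task get a zero-padded
        index appended to the seed; an assumed extra chunk covers any remainder.
        """
        quotient, remainder = divmod(n_events, n_events_per_task)
        jobs = [(random_seed + "_" + str(i).zfill(3), n_events_per_task) for i in range(quotient)]
        if remainder > 0:  # assumption: no empty job is created when remainder == 0
            jobs.append((random_seed + "_" + str(quotient).zfill(3), remainder))
        return jobs


    print(split_events(250, 100, "optimisation"))

For example, splitting 250 events into chunks of 100 yields three jobs with the seeds ``optimisation_000`` to ``optimisation_002``, and the last job simulates 50 events. Because every chunk gets a distinct seed, the chunks do not repeat the same events before they are merged.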

Due to the dependencies, the tasks are called in reverse order: the ``SummaryTask``
calls the ``ValidationAndOptimisationTask`` with different FastBDT options and cut
values, and the ``ValidationAndOptimisationTask`` itself calls the required teacher,
recording, and simulation tasks.

Each combination of FastBDT options, state filter cut values, and candidate selection
is used to train the result filter. This means that the ``ResultRecordingTask``
is executed multiple times, once for each combination of FastBDT options, cut values,
and candidate selection.
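
The reversed calling order can be illustrated with a toy dependency resolver. The sketch below is not b2luigi's scheduler. It reduces every task to its class name, adding the production mode for the simulation tasks, and encodes the ``requires`` relations described in this document in a hypothetical ``REQUIRES`` dictionary. A depth-first post-order traversal starting from ``SummaryTask`` then yields one valid execution order.

.. code-block:: python

    # Simplified dependency graph: task -> tasks it requires (parameters ignored)
    REQUIRES = {
        "SummaryTask": ["ValidationAndOptimisationTask"],
        "ValidationAndOptimisationTask": [
            "CKFResultFilterTeacherTask",
            "CKFStateFilterTeacherTask",
            "SplitNMergeSimTask(optimisation)",
        ],
        "CKFResultFilterTeacherTask": ["ResultRecordingTask"],
        "ResultRecordingTask": ["SplitNMergeSimTask(training)", "CKFStateFilterTeacherTask"],
        "CKFStateFilterTeacherTask": ["StateRecordingTask"],
        "StateRecordingTask": ["SplitNMergeSimTask(training)"],
        "SplitNMergeSimTask(training)": ["GenerateSimTask(training)"],
        "SplitNMergeSimTask(optimisation)": ["GenerateSimTask(optimisation)"],
        "GenerateSimTask(training)": [],
        "GenerateSimTask(optimisation)": [],
    }


    def execution_order(root, requires):
        """Depth-first post-order: every task is listed after all tasks it requires."""
        order, seen = [], set()

        def visit(task):
            if task in seen:
                return
            seen.add(task)
            for dependency in requires[task]:
                visit(dependency)
            order.append(task)

        visit(root)
        return order


    order = execution_order("SummaryTask", REQUIRES)
    print(order)

The first entry of the resulting list is ``GenerateSimTask(training)`` and the last one is ``SummaryTask``, so the tasks run in the reverse order of the calls. The real scheduler additionally runs independent tasks in parallel and skips tasks whose outputs already exist.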

b2luigi: Understanding the steering file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All trainings and validations are done in the correct order in this steering
file. For the purpose of creating a dependency graph, the `b2luigi
<https://b2luigi.readthedocs.io>`_ Python package is used, which extends the
`luigi <https://luigi.readthedocs.io>`_ package developed by Spotify.

Each task that has to be done is represented by a special class, which defines
its parameters, its output files, and which other tasks (with which
parameters) it depends on. For example, a teacher task, which runs
``basf2_mva_teacher.py`` to train the classifier, depends on a data collection
task which runs a reconstruction and writes out track-wise variables into a ROOT
file for training. An evaluation/validation task for testing the classifier
requires both the teacher task, as it needs the weightfile to be present, and
a data collection task, because it needs a dataset for testing the classifier.

The final task that defines which tasks need to be done for the steering file to
finish is the ``SummaryTask``. When you only want to run parts of the
training/validation pipeline, you can comment out requirements in the ``SummaryTask``
or replace them by lower-level tasks during debugging.

This steering file relies on b2luigi_ for task scheduling. It can be installed
via pip::

    python3 -m pip install [--user] b2luigi

Use the ``--user`` option if you do not have the rights to install Python packages into
your externals (e.g. because you are using cvmfs) and install them in
``$HOME/.local`` instead.

Instead of command line arguments, the b2luigi script is configured via a
``settings.json`` file. Open it in your favorite text editor and modify it to
fit your requirements.
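
The following sketch shows how the settings used by the script could be read from such a file in plain Python. It is an illustration only and not b2luigi's ``get_setting`` implementation. The helper ``load_settings`` and all example values (paths, experiment number) are hypothetical placeholders. The keys and defaults are the ones that appear in the script's ``b2luigi.get_setting`` calls, plus the ``result_path`` described below.

.. code-block:: python

    import json
    import tempfile
    from pathlib import Path

    # Defaults taken from the b2luigi.get_setting(...) calls in the script
    DEFAULTS = {
        "n_events_training": 1000,
        "n_events_testing": 500,
        "n_events_per_task": 100,
        "basf2_processes_per_worker": 0,
        "workers": 500,
    }


    def load_settings(path):
        """Hypothetical helper: read settings.json and fill in missing defaults."""
        with open(path) as settings_file:
            settings = {**DEFAULTS, **json.load(settings_file)}
        # JSON object keys are always strings; the script converts the
        # experiment numbers used as keys of bkgfiles_by_exp back to int.
        settings["bkgfiles_by_exp"] = {int(key): val for key, val in settings.get("bkgfiles_by_exp", {}).items()}
        return settings


    # Placeholder values (hypothetical), adapt them to your setup
    example = {
        "result_path": "results",
        "experiment_numbers": [0],
        "bkgfiles_by_exp": {"0": "/path/to/background/files"},
        "n_events_training": 2000,
    }

    with tempfile.TemporaryDirectory() as tmp:
        settings_path = Path(tmp) / "settings.json"
        settings_path.write_text(json.dumps(example, indent=4))
        settings = load_settings(settings_path)

    print(settings["n_events_training"], settings["n_events_testing"], settings["bkgfiles_by_exp"])

Values given in the file take precedence, and every setting that is missing falls back to the default used in the script.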

You can test the b2luigi steering file without running it via::

    python3 combined_cdc_to_svd_ckf_mva_training.py --dry-run
    python3 combined_cdc_to_svd_ckf_mva_training.py --show-output

This will show the outputs and potential errors in the definitions of the
luigi task dependencies. To run the steering file in normal (local) mode,
run::

    python3 combined_cdc_to_svd_ckf_mva_training.py

One can use the interactive luigi web interface via the central scheduler,
which visualizes the task graph while it is running. For this, the scheduler
daemon ``luigid`` has to run in the background. It is located in
``~/.local/bin/luigid`` in case b2luigi has been installed with ``--user``. For
example, run::

    luigid --port 8886

Then, execute your steering file (e.g. in another terminal) with::

    python3 combined_cdc_to_svd_ckf_mva_training.py --scheduler-port 8886

To view the web interface, open your web browser and enter into the URL bar::

    localhost:8886

If you don't run the steering file on the same machine on which you run your web
browser, you have two options:

1. Run both the steering file and ``luigid`` remotely and use
   ssh port forwarding to your local host. For this, run on your local
   machine::

       ssh -N -f -L 8886:localhost:8886 <remote_user>@<remote_host>

2. Run the ``luigid`` scheduler locally and use the ``--scheduler-host <your
   local host>`` argument when calling the steering file.

Accessing the results / output files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All output files are stored in a directory structure in the ``result_path`` set in
``settings.json``. The directory tree encodes the b2luigi parameters that were used. This
ensures reproducibility and makes parameter searches easy. Sometimes, it is hard to
find the relevant output files. You can view the whole directory structure by
running ``tree <result_path>``. Use the Unix ``find`` command to find the files
that interest you, e.g.::

    find <result_path> -name "*.root" # find all ROOT files
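
If you prefer to search the result directory from Python, you can use a ``pathlib``-based equivalent of the ``find`` call above. The helper name ``find_root_files`` is hypothetical. The ``parameter=value`` directory names in the demo only mimic the parameter-encoding directory tree described above; they are assumed for illustration and are not taken from a real run.

.. code-block:: python

    import tempfile
    from pathlib import Path


    def find_root_files(result_path, pattern="*.root"):
        """Return all files below result_path matching pattern, like ``find <result_path> -name "*.root"``."""
        return sorted(Path(result_path).rglob(pattern))


    # Small self-contained demo on a throw-away directory tree
    with tempfile.TemporaryDirectory() as tmp:
        nested = Path(tmp) / "experiment_number=0" / "n_events=100"
        nested.mkdir(parents=True)
        (nested / "records1.root").touch()
        (nested / "log.txt").touch()
        found = [p.relative_to(tmp).as_posix() for p in find_root_files(tmp)]

    print(found)

Only the ROOT file is reported, and its path relative to the result directory still shows which parameters produced it.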

.. code-block:: python

    from tracking import add_track_finding
    # ...
    from ckf_training import my_basf2_mva_teacher, create_fbdt_option_string
    from tracking_mva_filter_payloads.write_tracking_mva_filter_payloads_to_db import write_tracking_mva_filter_payloads_to_db
    # ...
    install_helpstring_formatter = ("\nCould not find {module} python module. Try installing it via\n"
                                    " python3 -m pip install [--user] {module}\n")
    try:
        # ...
        from b2luigi.core.utils import create_output_dirs
        from b2luigi.basf2_helper import Basf2PathTask, Basf2Task
    except ModuleNotFoundError:
        print(install_helpstring_formatter.format(module="b2luigi"))

.. code-block:: python

    """Simple task that defines the configuration of the LSF batch submission."""
    # ...
    """Same as LSFTask, but for memory-intensive tasks."""
    # ...
    """Generate simulated Monte Carlo with background overlay.

    Make sure to use different ``random_seed`` parameters for the training data
    for the classifier trainings and for the test data for the respective
    evaluation/validation tasks."""
    experiment_number = b2luigi.IntParameter()
    random_seed = b2luigi.Parameter()
    n_events = b2luigi.IntParameter()
    bkgfiles_dir = b2luigi.Parameter(
    # ...
        """Create output file name depending on number of events and production
        mode that is specified in the random_seed string.

        :param n_events: Number of events to simulate.
        :param random_seed: Random seed to use for the simulation to create independent samples."""
        if random_seed is None:
            # ...
        return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"
    # ...
        """Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist."""
    # ...
        """Create basf2 path to process with event generation and simulation."""
        path = basf2.create_path()
        # ...
        path.add_module("EvtGenInput")
    # ...
        """Default function from base b2luigi.Task class."""
        self._remove_output()
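
You can try out the naming scheme of ``output_file_name`` above in isolation. In the sketch below, the function ``generated_file_name`` is a hypothetical standalone copy of the return statement, and the seeds follow the pattern that the ``SplitNMergeSimTask`` uses for its chunks. The example shows that chunks and merged samples of different production modes end up with distinct file names.

.. code-block:: python

    def generated_file_name(n_events, random_seed):
        """Hypothetical standalone copy of the file-name pattern used by GenerateSimTask."""
        return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"


    # Three chunks of 100 events each, with zero-padded indices appended to the seed
    chunk_names = [generated_file_name(100, "training_" + str(i).zfill(3)) for i in range(3)]
    merged_training = generated_file_name(300, "training")
    merged_optimisation = generated_file_name(300, "optimisation")
    print(chunk_names[0], merged_training, merged_optimisation)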

.. code-block:: python

    """Generate simulated Monte Carlo with background overlay.

    Make sure to use different ``random_seed`` parameters for the training data
    for the classifier trainings and for the test data for the respective
    evaluation/validation tasks."""
    experiment_number = b2luigi.IntParameter()
    random_seed = b2luigi.Parameter()
    n_events = b2luigi.IntParameter()
    bkgfiles_dir = b2luigi.Parameter(
    # ...
        """Create output file name depending on number of events and production
        mode that is specified in the random_seed string.

        :param n_events: Number of events to simulate.
        :param random_seed: Random seed to use for the simulation to create independent samples."""
        if random_seed is None:
            # ...
        return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"
    # ...
        """Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist."""
    # ...
        """This task requires several GenerateSimTasks to be finished so that the required number of events is created."""
        n_events_per_task = SummaryTask.n_events_per_task
        quotient, remainder = divmod(self.n_events, n_events_per_task)
        for i in range(quotient):
            # ...
                num_processes=SummaryTask.num_processes,
                random_seed=self.random_seed + '_' + str(i).zfill(3),
                n_events=n_events_per_task,
        # ...
                num_processes=SummaryTask.num_processes,
                random_seed=self.random_seed + '_' + str(quotient).zfill(3),
    # ...
    @b2luigi.on_temporary_files
    # ...
        """When all GenerateSimTasks have finished, merge the output."""
        create_output_dirs(self)

        file_list = [f for f in self.get_all_input_file_names()]
        print("Merge the following files:")
        # ...
        cmd = ["b2file-merge", "-f"]
        args = cmd + [self.get_output_file_name(self.output_file_name())] + file_list
        subprocess.check_call(args)
        # ...
        print("Finished merging. Now remove the input files to save space.")
        file_list = [f for f in self.get_all_input_file_names()]
        for input_file in file_list:
            try:
                os.remove(input_file)
            except FileNotFoundError:
                # ...
    # ...
        """Default function from base b2luigi.Task class."""
        self._remove_output()
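
The merge-and-clean-up step of the ``SplitNMergeSimTask`` above can be sketched without basf2. Both helpers are hypothetical. ``build_merge_command`` only assembles the argument list for ``b2file-merge`` instead of running it, passing ``-f`` exactly as the script does. ``remove_inputs`` deletes the merged inputs and tolerates files that are already gone, mirroring the ``FileNotFoundError`` handling in the task.

.. code-block:: python

    import os
    import tempfile
    from pathlib import Path


    def build_merge_command(output_file, input_files):
        """Assemble (but do not run) the b2file-merge call used by the task."""
        return ["b2file-merge", "-f", output_file] + list(input_files)


    def remove_inputs(input_files):
        """Delete the given files, skipping those that no longer exist."""
        removed = []
        for input_file in input_files:
            try:
                os.remove(input_file)
                removed.append(input_file)
            except FileNotFoundError:
                pass  # already deleted, nothing to do
        return removed


    with tempfile.TemporaryDirectory() as tmp:
        inputs = [str(Path(tmp) / f"generated_mc_N100_training_{i:03d}.root") for i in range(3)]
        for name in inputs[:2]:
            Path(name).touch()  # only two of the three inputs exist
        cmd = build_merge_command(str(Path(tmp) / "generated_mc_N300_training.root"), inputs)
        removed = remove_inputs(inputs)
        leftovers = sorted(os.listdir(tmp))

    print(len(removed), leftovers)

Because a missing input file is skipped instead of raising, the clean-up can safely be repeated.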

.. code-block:: python

    """Record the data for the three state filters for the CDCToSVDSpacePointCKF.

    This task requires that the events used for training have been simulated before, which is done using the
    ``SplitNMergeSimTask``."""
    experiment_number = b2luigi.IntParameter()
    random_seed = b2luigi.Parameter()
    n_events = b2luigi.IntParameter()
    layer = b2luigi.IntParameter()
    # ...
        """Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist."""
        for record_fname in ["records1.root", "records2.root", "records3.root"]:
            yield self.add_to_output(record_fname)
    # ...
        """This task only requires that the input files have been created."""
    # ...
        """Create a path for the recording. To record the data for the SVD state filters, CDC tracks are required, and these must
        be truth matched before. The data have to be recorded for each layer of the SVD, i.e. layers 3 to 6, but also an artificial

        :param layer: The layer for which the data are recorded.
        :param records1_fname: Name of the records1 file.
        :param records2_fname: Name of the records2 file.
        :param records3_fname: Name of the records3 file."""
        path = basf2.create_path()

        file_list = [fname for fname in self.get_all_input_file_names()
                     if "generated_mc_N" in fname and "training" in fname and fname.endswith(".root")]
        path.add_module("RootInput", inputFileNames=file_list)

        path.add_module("Gearbox")
        path.add_module("Geometry")
        path.add_module("SetupGenfitExtrapolation")

        add_hit_preparation_modules(path, components=["SVD"])

        add_track_finding(path, reco_tracks="CDCRecoTracks", components=["CDC"], prune_temporary_tracks=False)

        path.add_module('TrackFinderMCTruthRecoTracks',
                        RecoTracksStoreArrayName="MCRecoTracks",
                        # ...
        path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=False, UseCDCHits=True,
                        mcRecoTracksStoreArrayName="MCRecoTracks",
                        prRecoTracksStoreArrayName="CDCRecoTracks")
        path.add_module("DAFRecoFitter", recoTracksStoreArrayName="CDCRecoTracks")

        path.add_module("CDCToSVDSpacePointCKF",
                        inputRecoTrackStoreArrayName="CDCRecoTracks",
                        outputRecoTrackStoreArrayName="VXDRecoTracks",
                        outputRelationRecoTrackStoreArrayName="CDCRecoTracks",

                        relationCheckForDirection="backward",

                        writeOutDirection="backward",

                        firstHighFilter="truth",
                        firstEqualFilter="recording",
                        firstEqualFilterParameters={"treeName": "records1", "rootFileName": records1_fname, "returnWeight": 1.0},
                        firstLowFilter="none",
                        firstHighUseNStates=0,
                        firstToggleOnLayer=layer,

                        advanceHighFilter="advance",

                        secondHighFilter="truth",
                        secondEqualFilter="recording",
                        secondEqualFilterParameters={"treeName": "records2", "rootFileName": records2_fname, "returnWeight": 1.0},
                        secondLowFilter="none",
                        secondHighUseNStates=0,
                        secondToggleOnLayer=layer,

                        updateHighFilter="fit",

                        thirdHighFilter="truth",
                        thirdEqualFilter="recording",
                        thirdEqualFilterParameters={"treeName": "records3", "rootFileName": records3_fname},
                        thirdLowFilter="none",
                        thirdHighUseNStates=0,
                        thirdToggleOnLayer=layer,
                        # ...
                        enableOverlapResolving=False)
    # ...
        """Create basf2 path to process with event generation and simulation."""
        # ...
            records1_fname=self.get_output_file_name("records1.root"),
            records2_fname=self.get_output_file_name("records2.root"),
            records3_fname=self.get_output_file_name("records3.root"),
    # ...
        """Default function from base b2luigi.Task class."""
        self._remove_output()
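
The list comprehension that selects the input files for ``RootInput`` can be factored into a small function for testing. The sketch below uses a hypothetical helper ``select_input_files`` and made-up file paths. As in the script, the production mode is matched as a plain substring of the full path. A file below a directory whose name contains ``training`` would therefore also be selected if it otherwise matches the pattern.

.. code-block:: python

    def select_input_files(file_names, production_mode):
        """Hypothetical helper mirroring the input-file filter used in the recording paths."""
        return [fname for fname in file_names
                if "generated_mc_N" in fname and production_mode in fname and fname.endswith(".root")]


    # Made-up example paths
    all_files = [
        "results/generated_mc_N1000_training.root",
        "results/generated_mc_N500_optimisation.root",
        "results/generated_mc_N1000_training.log",
        "results/records1.root",
    ]
    training_files = select_input_files(all_files, "training")
    optimisation_files = select_input_files(all_files, "optimisation")
    print(training_files, optimisation_files)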

.. code-block:: python

    """A teacher task runs the basf2 mva teacher on the training data provided by a
    data collection task.

    In this task the three state filters are trained, each with the corresponding recordings from the different layers.
    It will be executed for each FastBDT option defined in the SummaryTask."""
    experiment_number = b2luigi.IntParameter()
    random_seed = b2luigi.Parameter()
    n_events = b2luigi.IntParameter()
    fast_bdt_option_state_filter = b2luigi.ListParameter(
        # ...
        hashed=True, default=[50, 8, 3, 0.1]
    # ...
    filter_number = b2luigi.IntParameter()
    training_target = b2luigi.Parameter(
    # ...
    exclude_variables = b2luigi.ListParameter(
        # ...
        hashed=True, default=[
            # ...
            "seed_lowest_svd_layer",
            "seed_lowest_cdc_layer",
            "quality_index_triplet",
            "quality_index_circle",
            "quality_index_helix",
            # ...
            "mean_rest_cluster_charge",
            "min_rest_cluster_charge",
            "std_rest_cluster_charge",
            "cluster_1_seed_charge",
            "cluster_2_seed_charge",
            "mean_rest_cluster_seed_charge",
            "min_rest_cluster_seed_charge",
            "std_rest_cluster_seed_charge",
            # ...
            "mean_rest_cluster_size",
            "min_rest_cluster_size",
            "std_rest_cluster_size",
            # ...
            "mean_rest_cluster_snr",
            "min_rest_cluster_snr",
            "std_rest_cluster_snr",
            "cluster_1_charge_over_size",
            "cluster_2_charge_over_size",
            "mean_rest_cluster_charge_over_size",
            "min_rest_cluster_charge_over_size",
            "std_rest_cluster_charge_over_size",
            # ...
    # ...
        """Name of weightfile that is created by the teacher task.

        :param fast_bdt_option: FastBDT option that is used to train this MVA
        :param filter_number: Filter number (first=1, second=2, third=3) to be trained"""
        if fast_bdt_option is None:
            # ...
        fast_bdt_string = create_fbdt_option_string(fast_bdt_option)
        if filter_number is None:
            # ...
        weightfile_name = f"trk_CDCToSVDSpacePointStateFilter_{filter_number}" + fast_bdt_string
        return weightfile_name
    # ...
        """This task requires the recordings for the state filters."""
        for layer in [3, 4, 5, 6, 7]:
            # ...
                random_seed="training",
    # ...
        """Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist."""
    # ...
        """Use basf2_mva teacher to create MVA weightfile from collected training
        data.

        This is the main process that is dispatched by the ``run`` method that
        is inherited from ``Basf2Task``."""
        records_files = self.get_input_file_names(f"records{self.filter_number}.root")
        # ...
        tree_name = f"records{self.filter_number}"
        print(f"Processed records files: {records_files},\nfeature tree name: {tree_name}")

        my_basf2_mva_teacher(
            records_files=records_files,
            # ...
            weightfile_identifier=weightfile_identifier,
        # ...
        basf2_mva.download(weightfile_identifier, self.get_output_file_name(weightfile_identifier + ".root"))
    # ...
        """Default function from base b2luigi.Task class."""
        self._remove_output()
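
To see how the weightfile identifiers are composed, the sketch below re-creates them in plain Python. The option-string helper ``fbdt_option_string`` is a hypothetical stand-in for ``create_fbdt_option_string`` from ``ckf_training``. It assumes that a FastBDT option list has the form ``[nTrees, nCuts, nLevels, shrinkage]`` and uses an illustrative string format, so the real identifiers may differ in detail. Only the ``trk_CDCToSVDSpacePointStateFilter_{filter_number}`` and ``trk_CDCToSVDSpacePointResultFilter`` prefixes are taken from the script.

.. code-block:: python

    def fbdt_option_string(fast_bdt_option):
        """Hypothetical stand-in for ckf_training.create_fbdt_option_string.

        Assumes the option list is [nTrees, nCuts, nLevels, shrinkage]; the exact
        string format of the real helper may differ.
        """
        n_trees, n_cuts, n_levels, shrinkage = fast_bdt_option
        return f"_nTrees{n_trees}_nCuts{n_cuts}_nLevels{n_levels}_shrin{int(round(100 * shrinkage))}"


    def state_filter_weightfile_identifier(fast_bdt_option, filter_number):
        # Prefix as in CKFStateFilterTeacherTask.get_weightfile_identifier
        return f"trk_CDCToSVDSpacePointStateFilter_{filter_number}" + fbdt_option_string(fast_bdt_option)


    def result_filter_weightfile_identifier(fast_bdt_option):
        # Prefix as in CKFResultFilterTeacherTask.get_weightfile_identifier
        return "trk_CDCToSVDSpacePointResultFilter" + fbdt_option_string(fast_bdt_option)


    print(state_filter_weightfile_identifier([50, 8, 3, 0.1], 1))
    print(result_filter_weightfile_identifier([200, 8, 3, 0.1]))

Because the option string is part of the identifier, each FastBDT configuration in the parameter scan gets its own weightfile instead of overwriting a previous one.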

.. code-block:: python

    """Task to record data for the final result filter. This requires trained state filters.
    The cuts on the state filter classifiers are set to rather low values to ensure that all signal is contained in the
    recorded file. Also, the values for XXXXXHighUseNStates are chosen conservatively, i.e. rather on the high side."""
    experiment_number = b2luigi.IntParameter()
    random_seed = b2luigi.Parameter()
    n_events = b2luigi.IntParameter()
    fast_bdt_option_state_filter = b2luigi.ListParameter(
        # ...
        hashed=True, default=[50, 8, 3, 0.1]
    # ...
    result_filter_records_name = b2luigi.Parameter()
    # ...
        basf2.conditions.prepend_testing_payloads("localdb/database.txt")
    # ...
        """Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist."""
    # ...
        """This task requires that the training SplitNMergeSimTask is finished, as well as that the state filters are trained
        using the CKFStateFilterTeacherTask."""
        # ...
        filter_numbers = [1, 2, 3]
        for filter_number in filter_numbers:
            # ...
                CKFStateFilterTeacherTask,
                # ...
                filter_number=filter_number,
    # ...
        """Create a path for the recording of the result filter. This file is then used to train the result filter.

        :param result_filter_records_name: Name of the recording file."""
        path = basf2.create_path()

        file_list = [fname for fname in self.get_all_input_file_names()
                     if "generated_mc_N" in fname and "training" in fname and fname.endswith(".root")]
        path.add_module("RootInput", inputFileNames=file_list)

        path.add_module("Gearbox")
        path.add_module("Geometry")
        path.add_module("SetupGenfitExtrapolation")

        add_hit_preparation_modules(path, components=["SVD"])

        add_track_finding(path, reco_tracks="CDCRecoTracks", components=["CDC"], prune_temporary_tracks=False)

        path.add_module('TrackFinderMCTruthRecoTracks',
                        RecoTracksStoreArrayName="MCRecoTracks",
                        # ...
        path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=False, UseCDCHits=True,
                        mcRecoTracksStoreArrayName="MCRecoTracks",
                        prRecoTracksStoreArrayName="CDCRecoTracks")
        path.add_module("DAFRecoFitter", recoTracksStoreArrayName="CDCRecoTracks")
        # ...
            f"trk_CDCToSVDSpacePointStateFilter_1_Parameter{fast_bdt_string}",
            # ...
            f"trk_CDCToSVDSpacePointStateFilter_1{fast_bdt_string}",
        # ...
            f"trk_CDCToSVDSpacePointStateFilter_2_Parameter{fast_bdt_string}",
            # ...
            f"trk_CDCToSVDSpacePointStateFilter_2{fast_bdt_string}",
        # ...
            f"trk_CDCToSVDSpacePointStateFilter_3_Parameter{fast_bdt_string}",
            # ...
            f"trk_CDCToSVDSpacePointStateFilter_3{fast_bdt_string}",
        # ...
        basf2.conditions.prepend_testing_payloads("localdb/database.txt")
        first_high_filter_parameters = {"DBPayloadName": f"trk_CDCToSVDSpacePointStateFilter_1_Parameter{fast_bdt_string}",
                                        "direction": "backward"}
        second_high_filter_parameters = {"DBPayloadName": f"trk_CDCToSVDSpacePointStateFilter_2_Parameter{fast_bdt_string}"}
        third_high_filter_parameters = {"DBPayloadName": f"trk_CDCToSVDSpacePointStateFilter_3_Parameter{fast_bdt_string}"}

        path.add_module("CDCToSVDSpacePointCKF",
                        inputRecoTrackStoreArrayName="CDCRecoTracks",
                        outputRecoTrackStoreArrayName="VXDRecoTracks",
                        outputRelationRecoTrackStoreArrayName="CDCRecoTracks",

                        relationCheckForDirection="backward",

                        writeOutDirection="backward",

                        firstHighFilter="mva_with_direction_check",
                        firstHighFilterParameters=first_high_filter_parameters,
                        firstHighUseNStates=10,

                        advanceHighFilter="advance",
                        advanceHighFilterParameters={"direction": "backward"},

                        secondHighFilter="mva",
                        secondHighFilterParameters=second_high_filter_parameters,
                        secondHighUseNStates=10,

                        updateHighFilter="fit",

                        thirdHighFilter="mva",
                        thirdHighFilterParameters=third_high_filter_parameters,
                        thirdHighUseNStates=10,
                        # ...
                        filterParameters={"rootFileName": result_filter_records_name},
                        # ...
                        enableOverlapResolving=True)
    # ...
        """Create basf2 path to process with event generation and simulation."""
    # ...
        """Default function from base b2luigi.Task class."""
        self._remove_output()
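
The parameter dictionaries passed to the ``CDCToSVDSpacePointCKF`` above differ only in the filter number and, for the first filter, an additional ``direction`` entry. The hypothetical helper below builds these dictionaries for a given FastBDT option string. The example string ``_example`` is a placeholder and not a real output of ``create_fbdt_option_string``.

.. code-block:: python

    def state_filter_parameters(fast_bdt_string):
        """Hypothetical helper: build the parameter dicts for the three MVA state filters."""
        parameters = {
            filter_number: {"DBPayloadName": f"trk_CDCToSVDSpacePointStateFilter_{filter_number}_Parameter{fast_bdt_string}"}
            for filter_number in (1, 2, 3)
        }
        # Only the first filter ("mva_with_direction_check") gets a direction entry
        parameters[1]["direction"] = "backward"
        return parameters


    params = state_filter_parameters("_example")
    first_high_filter_parameters = params[1]
    second_high_filter_parameters = params[2]
    third_high_filter_parameters = params[3]
    print(first_high_filter_parameters)

Generating the names in one place keeps the ``_Parameter`` payload names consistent with the option string used for the corresponding weightfiles.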

.. code-block:: python

    """A teacher task runs the basf2 mva teacher on the training data provided by a
    data collection task.

    Since teacher tasks are needed for all quality estimators covered by this
    steering file and the only thing that changes is the required data
    collection task and some training parameters, I decided to use inheritance
    and have the basic functionality in this base class/interface and have the
    specific teacher tasks inherit from it."""
    experiment_number = b2luigi.IntParameter()
    random_seed = b2luigi.Parameter()
    n_events = b2luigi.IntParameter()
    fast_bdt_option_state_filter = b2luigi.ListParameter(
        # ...
        hashed=True, default=[50, 8, 3, 0.1]
    # ...
    fast_bdt_option_result_filter = b2luigi.ListParameter(
        # ...
        hashed=True, default=[200, 8, 3, 0.1]
    # ...
    result_filter_records_name = b2luigi.Parameter()
    training_target = b2luigi.Parameter(
    # ...
    exclude_variables = b2luigi.ListParameter(
        # ...
        hashed=True, default=[]
    # ...
        """Name of weightfile that is created by the teacher task.

        :param fast_bdt_option: FastBDT option that is used to train this MVA"""
        if fast_bdt_option is None:
            # ...
        fast_bdt_string = create_fbdt_option_string(fast_bdt_option)
        weightfile_name = "trk_CDCToSVDSpacePointResultFilter" + fast_bdt_string
        return weightfile_name
    # ...
        """Generate list of luigi Tasks that this Task depends on."""
    # ...
        """Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist."""
    # ...
        """Use basf2_mva teacher to create MVA weightfile from collected training
        data.

        This is the main process that is dispatched by the ``run`` method that
        is inherited from ``Basf2Task``."""
        # ...
        tree_name = "records"
        print(f"Processed records files for result filter training: {records_files},\nfeature tree name: {tree_name}")

        my_basf2_mva_teacher(
            records_files=records_files,
            # ...
            weightfile_identifier=weightfile_identifier,
        # ...
        basf2_mva.download(weightfile_identifier, self.get_output_file_name(weightfile_identifier + ".root"))
    # ...
        """Default function from base b2luigi.Task class."""
        self._remove_output()

.. code-block:: python

    """Validate the performance of the trained filters by trying various combinations of FastBDT options, as well as cut values
    for the states, the number of best candidates kept after each filter, and similar for the result filter."""
    experiment_number = b2luigi.IntParameter()
    n_events_training = b2luigi.IntParameter()
    fast_bdt_option_state_filter = b2luigi.ListParameter(
        # ...
        hashed=True, default=[50, 8, 3, 0.1]
    # ...
    fast_bdt_option_result_filter = b2luigi.ListParameter(
        # ...
        hashed=True, default=[200, 8, 3, 0.1]
    # ...
    n_events_testing = b2luigi.IntParameter()
    state_filter_cut = b2luigi.FloatParameter()
    use_n_best_states = b2luigi.IntParameter()
    result_filter_cut = b2luigi.FloatParameter()
    use_n_best_results = b2luigi.IntParameter()
    # ...
        basf2.conditions.prepend_testing_payloads("localdb/database.txt")
    # ...
        """Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist."""
        # ...
        yield self.add_to_output(
            f"cdc_to_svd_spacepoint_ckf_validation{fbdt_state_filter_string}{fbdt_result_filter_string}.root")
    # ...
        """This task requires trained result filters, trained state filters, and that an independent data set for validation was
        created using the SplitNMergeSimTask with the random seed optimisation."""
        # ...
            result_filter_records_name=f"filter_records{fbdt_state_filter_string}.root",
            # ...
            random_seed='training'
        # ...
            random_seed="optimisation",
        # ...
        filter_numbers = [1, 2, 3]
        for filter_number in filter_numbers:
            # ...
                CKFStateFilterTeacherTask,
                # ...
                random_seed="training",
                # ...
                filter_number=filter_number,
    # ...
        """Create a path to validate the trained filters."""
        path = basf2.create_path()

        file_list = [fname for fname in self.get_all_input_file_names()
                     if "generated_mc_N" in fname and "optimisation" in fname and fname.endswith(".root")]
        path.add_module("RootInput", inputFileNames=file_list)

        path.add_module("Gearbox")
        path.add_module("Geometry")
        path.add_module("SetupGenfitExtrapolation")

        add_hit_preparation_modules(path, components=["SVD"])

        add_track_finding(path, reco_tracks="CDCRecoTracks", components=["CDC"], prune_temporary_tracks=False)
        # ...
            f"trk_CDCToSVDSpacePointStateFilter_1_Parameter{fbdt_state_filter_string}",
            # ...
            f"trk_CDCToSVDSpacePointStateFilter_1{fbdt_state_filter_string}",
        # ...
            f"trk_CDCToSVDSpacePointStateFilter_2_Parameter{fbdt_state_filter_string}",
            # ...
            f"trk_CDCToSVDSpacePointStateFilter_2{fbdt_state_filter_string}",
        # ...
            f"trk_CDCToSVDSpacePointStateFilter_3_Parameter{fbdt_state_filter_string}",
            # ...
            f"trk_CDCToSVDSpacePointStateFilter_3{fbdt_state_filter_string}",
        # ...
            f"trk_CDCToSVDSpacePointResultFilter_Parameter{fbdt_result_filter_string}",
            # ...
            f"trk_CDCToSVDSpacePointResultFilter{fbdt_result_filter_string}",
        # ...
        basf2.conditions.prepend_testing_payloads("localdb/database.txt")
        first_high_filter_parameters = {"DBPayloadName": f"trk_CDCToSVDSpacePointStateFilter_1_Parameter{fbdt_state_filter_string}",
                                        "direction": "backward"}
        second_high_filter_parameters = {
            "DBPayloadName": f"trk_CDCToSVDSpacePointStateFilter_2_Parameter{fbdt_state_filter_string}"}
        third_high_filter_parameters = {"DBPayloadName": f"trk_CDCToSVDSpacePointStateFilter_3_Parameter{fbdt_state_filter_string}"}
        filter_parameters = {"DBPayloadName": f"trk_CDCToSVDSpacePointResultFilter_Parameter{fbdt_result_filter_string}"}

        path.add_module("CDCToSVDSpacePointCKF",
                        inputRecoTrackStoreArrayName="CDCRecoTracks",
                        outputRecoTrackStoreArrayName="VXDRecoTracks",
                        outputRelationRecoTrackStoreArrayName="CDCRecoTracks",

                        relationCheckForDirection="backward",

                        writeOutDirection="backward",

                        firstHighFilter="mva_with_direction_check",
                        firstHighFilterParameters=first_high_filter_parameters,
                        # ...
                        advanceHighFilter="advance",
                        advanceHighFilterParameters={"direction": "backward"},

                        secondHighFilter="mva",
                        secondHighFilterParameters=second_high_filter_parameters,
                        # ...
                        updateHighFilter="fit",

                        thirdHighFilter="mva",
                        thirdHighFilterParameters=third_high_filter_parameters,
                        # ...
                        filterParameters=filter_parameters,
                        # ...
                        enableOverlapResolving=True)

        path.add_module('RelatedTracksCombiner',
                        VXDRecoTracksStoreArrayName="VXDRecoTracks",
                        CDCRecoTracksStoreArrayName="CDCRecoTracks",
                        recoTracksStoreArrayName="RecoTracks")

        path.add_module('TrackFinderMCTruthRecoTracks',
                        RecoTracksStoreArrayName="MCRecoTracks",
                        # ...
        path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=True, UseCDCHits=True,
                        mcRecoTracksStoreArrayName="MCRecoTracks",
                        prRecoTracksStoreArrayName="RecoTracks")
        # ...
            output_file_name=self.get_output_file_name(
                f"cdc_to_svd_spacepoint_ckf_validation{fbdt_state_filter_string}{fbdt_result_filter_string}.root"),
            reco_tracks_name="RecoTracks",
            mc_reco_tracks_name="MCRecoTracks",
    # ...
        """Create basf2 path to process with event generation and simulation."""
    # ...
        """Default function from base b2luigi.Task class."""
        self._remove_output()

.. code-block:: python

    """Task that collects and summarizes the main figures of merit from all the
    (validation and optimisation) child tasks."""
    n_events_training = b2luigi.get_setting(
        # ...
        "n_events_training", default=1000
    # ...
    n_events_testing = b2luigi.get_setting(
        # ...
        "n_events_testing", default=500
    # ...
    n_events_per_task = b2luigi.get_setting(
        # ...
        "n_events_per_task", default=100
    # ...
    num_processes = b2luigi.get_setting(
        # ...
        "basf2_processes_per_worker", default=0
    # ...
    bkgfiles_by_exp = b2luigi.get_setting("bkgfiles_by_exp")
    # ...
    bkgfiles_by_exp = {int(key): val for (key, val) in bkgfiles_by_exp.items()}
    # ...
    batch_system = 'local'
    # ...
    output_file_name = 'summary.json'
    # ...
        """Generate list of tasks that need to be done for luigi to finish running"""
        # ...
        fast_bdt_options = [
        # ...
        experiment_numbers = b2luigi.get_setting("experiment_numbers")
        # ...
        for experiment_number, fast_bdt_option_state_filter, fast_bdt_option_result_filter in itertools.product(
                experiment_numbers, fast_bdt_options, fast_bdt_options
        # ...
            state_filter_cuts = [0.01, 0.02, 0.03, 0.05, 0.1, 0.2]
            n_best_states_list = [3, 5, 10]
            result_filter_cuts = [0.05, 0.1, 0.2]
            n_best_results_list = [3, 5, 10]
            for state_filter_cut, n_best_states, result_filter_cut, n_best_results in \
                    itertools.product(state_filter_cuts, n_best_states_list, result_filter_cuts, n_best_results_list):
                # ...
                    ValidationAndOptimisationTask,
                    experiment_number=experiment_number,
                    # ...
                    state_filter_cut=state_filter_cut,
                    use_n_best_states=n_best_states,
                    result_filter_cut=result_filter_cut,
                    use_n_best_results=n_best_results,
                    fast_bdt_option_state_filter=fast_bdt_option_state_filter,
                    fast_bdt_option_result_filter=fast_bdt_option_result_filter,
        # ...
            'MCSideTrackingValidationModule_overview_figures_of_merit',
            'PRSideTrackingValidationModule_overview_figures_of_merit',
            'PRSideTrackingValidationModule_subdetector_figures_of_merit'
        # ...
        all_files = self.get_all_input_file_names()
        for idx, single_file in enumerate(all_files):
            with ROOT.TFile.Open(single_file, 'READ') as f:
                # ...
                for ntuple_name in ntuple_names:
                    ntuple = f.Get(ntuple_name)
                    for i in range(min(1, ntuple.GetEntries())):
                        # ...
                        for branch in ntuple.GetListOfBranches():
                            name = branch.GetName()
                            value = getattr(ntuple, name)
                            branch_data[name] = value
                        branch_data['file_path'] = single_file
                        output_dict[f'{idx}'] = branch_data
        # ...
            json.dump(output_dict, f, indent=4)
    # ...
        """Default function from base b2luigi.Task class."""
        self._remove_output()
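
The size of the parameter scan launched by the ``SummaryTask`` follows directly from the nested ``itertools.product`` calls above. The list of FastBDT options is not shown here, so the sketch keeps the number of options and experiments as inputs. The cut lists are copied from the script, and ``n_validation_tasks`` is a hypothetical helper.

.. code-block:: python

    import itertools

    # Cut lists copied from SummaryTask.requires
    state_filter_cuts = [0.01, 0.02, 0.03, 0.05, 0.1, 0.2]
    n_best_states_list = [3, 5, 10]
    result_filter_cuts = [0.05, 0.1, 0.2]
    n_best_results_list = [3, 5, 10]

    cut_grid = list(itertools.product(state_filter_cuts, n_best_states_list, result_filter_cuts, n_best_results_list))


    def n_validation_tasks(n_experiments, n_fast_bdt_options):
        """Hypothetical helper: number of ValidationAndOptimisationTask instances.

        State and result filter FastBDT options are scanned independently,
        hence the square of the number of options.
        """
        return n_experiments * n_fast_bdt_options ** 2 * len(cut_grid)


    print(len(cut_grid), n_validation_tasks(1, 2))

The cut grid alone has 6 x 3 x 3 x 3 = 162 combinations. With a single experiment and two FastBDT options, this amounts to 648 ``ValidationAndOptimisationTask`` instances.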

.. code-block:: python

    if __name__ == "__main__":
        # ...
        b2luigi.set_setting("env_script", "./setup_basf2.sh")
        b2luigi.set_setting("scratch_dir", tempfile.gettempdir())
        workers = b2luigi.get_setting("workers", default=500)
        b2luigi.process(SummaryTask(), workers=workers, batch=True)

``get_background_files(folder=None, output_file_info=True)``

``get_weightfile_identifier(self, fast_bdt_option=None)``

``experiment_number``
    Experiment number of the conditions database, e.g.
``result_filter_records_name``
    Name of the input file.
``n_events``
    Number of events to generate for the training data set.
``fast_bdt_option_state_filter``
    Hyperparameter option of the FastBDT algorithm.
``exclude_variables``
    List of collected variables to not use in the training of the QE MVA classifier.
``random_seed``
    Random basf2 seed.
``training_target``
    Feature/variable to use as truth label in the quality estimator MVA classifier.
``fast_bdt_option_result_filter``
    Hyperparameter option of the FastBDT algorithm.
``experiment_number``
    Experiment number of the conditions database, e.g.
``filter_number``
    Number of the filter for which the records files are to be processed.
``n_events``
    Number of events to generate for the training data set.
``fast_bdt_option_state_filter``
    Hyperparameter option of the FastBDT algorithm.

``get_weightfile_identifier(self, fast_bdt_option=None, filter_number=None)``

``exclude_variables``
    List of collected variables to not use in the training of the QE MVA classifier.
``training_target``
    Feature/variable to use as truth label in the quality estimator MVA classifier.
``experiment_number``
    Experiment number of the conditions database, e.g.
``n_events``
    Number of events to generate.
``bkgfiles_dir``
    Directory with overlay background ROOT files.
``output_file_name(self, n_events=None, random_seed=None)``
    Name of the ROOT output file with generated and simulated events.
``random_seed``
    Random basf2 seed.

``__init__(self, *args, **kwargs)``

``job_name``
    Set the job name (inherited variable).
``experiment_number``
    Experiment number of the conditions database, e.g.
``result_filter_records_name``
    Name of the records file for training the final result filter.
``n_events``
    Number of events to generate for the training data set.
``fast_bdt_option_state_filter``
    Hyperparameter option of the FastBDT algorithm.

``create_result_recording_path(self, result_filter_records_name)``

``random_seed``
    Random basf2 seed.
``experiment_number``
    Experiment number of the conditions database, e.g.
``n_events``
    Number of events to generate.
``bkgfiles_dir``
    Directory with overlay background ROOT files.
``output_file_name(self, n_events=None, random_seed=None)``
    Name of the ROOT output file with generated and simulated events.
``random_seed``
    Random basf2 seed.
``experiment_number``
    Experiment number of the conditions database, e.g.

``create_state_recording_path(self, layer, records1_fname, records2_fname, records3_fname)``

``n_events``
    Number of events to generate for training.
``layer``
    Layer on which to toggle for recording the information for training.
``random_seed``
    Random basf2 seed.
``output_file_name`` : str
    Output file name.
``n_events_training``
    Number of events to generate for the training data set.
``n_events_testing``
    Number of events to generate for the test data set.
``experiment_number``
    Experiment number of the conditions database, e.g.

``create_optimisation_and_validation_path(self)``

``use_n_best_results``
    How many results should be kept at maximum to search for overlaps.
``state_filter_cut``
    Value of the cut on the MVA classifier output for accepting a state during CKF tracking.
``result_filter_cut``
    Value of the cut on the MVA classifier output for a result candidate.
``use_n_best_states``
    How many states should be kept at maximum in the combinatorial part of the CKF tree search.
``n_events_training``
    Number of events to generate for the training data set.
``fast_bdt_option_state_filter``
    FastBDT option to use to train the state filters.
``n_events_testing``
    Number of events to generate for the testing, validation, and optimisation data set.
``fast_bdt_option_result_filter``
    FastBDT option to use to train the result filter.

``add_simulation(path, components=None, bkgfiles=None, bkgOverlay=True, forceSetPXDDataReduction=False, usePXDDataReduction=True, cleanupPXDDataReduction=True, generate_2nd_cdc_hits=False, simulateT0jitter=True, isCosmics=False, FilterEvents=False, usePXDGatedMode=False, skipExperimentCheckForBG=False, save_slow_pions_in_mc=False, save_all_charged_particles_in_mc=False)``