10 combined_cdc_to_svd_ckf_mva_training
11 -----------------------------------------
13 Purpose of this script
14 ~~~~~~~~~~~~~~~~~~~~~~
16 This python script is used for the training and validation of the classifiers of
17 the three MVA-based state filters and one result filter of the CDCToSVDSpacePointCKF.
18 This CKF extrapolates tracks found in the CDC into the SVD and adds SVD hits using a
19 combinatorial tree search and a Kalman filter based track fit in each step.
21 To avoid mistakes, b2luigi is used to create a task chain for a combined training and
22 validation of all classifiers.
24 The order of the b2luigi tasks in this script is as follows (top to bottom):
25 * Two tasks to create input samples for training and testing (``GenerateSimTask`` and
26 ``SplitNMergeSimTask``). The ``SplitNMergeSimTask`` takes a number of events to be
27 generated and a number of events per task to reduce runtime. It then divides the total
28 number of events by the number of events per task and creates as many ``GenerateSimTask`` as
29 needed, each with a specific random seed, so that in the end the total number of
30 training and testing events are simulated. The individual files are then combined
31 by the ``SplitNMergeSimTask`` into one file each for training and testing.
32 * The ``StateRecordingTask`` writes out the data required for training the state filters.
34 * The ``CKFStateFilterTeacherTask`` trains the state filter MVAs, using FastBDT by
35 default, with a given set of options.
36 * The ``ResultRecordingTask`` writes out the data used for the training of the result
37 filter MVA. This task requires that the state filters have been trained before.
38 * The ``CKFResultFilterTeacherTask`` trains the result filter MVA, using FastBDT by default, with a
39 given set of FastBDT options. This requires that the result filter records have
40 been created with the ``ResultRecordingTask``.
41 * The ``ValidationAndOptimisationTask`` uses the trained weight files and cut values
42 provided to run the tracking chain with the weight file under test, and also
43 runs the tracking validation.
44 * Finally, the ``MainTask`` is the "brain" of the script. It invokes the
45 ``ValidationAndOptimisationTask`` with the different combinations of FastBDT options
46 and cut values on the MVA classifier output.
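The splitting described for ``SplitNMergeSimTask`` above can be sketched in plain Python. This is a minimal illustration, not the task itself: the helper name ``split_into_subtasks`` is hypothetical, and the handling of leftover events as one smaller extra chunk is an assumption based on the ``divmod`` call in the task.

```python
def split_into_subtasks(n_events, n_events_per_task, random_seed):
    """Return (seed, n_events) pairs, one per simulation subtask (hypothetical helper)."""
    quotient, remainder = divmod(n_events, n_events_per_task)
    # Full-size chunks, each with its own zero-padded seed suffix.
    chunks = [(f"{random_seed}_{i:03d}", n_events_per_task) for i in range(quotient)]
    # Assumption: leftover events go into one extra, smaller chunk.
    if remainder > 0:
        chunks.append((f"{random_seed}_{quotient:03d}", remainder))
    return chunks


chunks = split_into_subtasks(n_events=1050, n_events_per_task=100, random_seed="training")
print(len(chunks), chunks[0], chunks[-1])
```

Because every chunk gets a distinct seed string, the simulated samples are independent and can be merged afterwards.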
48 Due to the dependencies, the tasks are invoked in reverse order. The ``MainTask``
49 calls the ``ValidationAndOptimisationTask`` with different FastBDT options and cut
50 values, and the ``ValidationAndOptimisationTask`` itself calls the required teacher,
51 training, and simulation tasks.
53 Each combination of FastBDT options, state filter cut values, and candidate selection
54 is used to train the result filter. Consequently, the ``ResultRecordingTask``
55 is executed multiple times, once for each combination of FastBDT options, cut value,
56 and candidate selection.
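To get a feeling for how quickly the number of validation runs grows, the grid can be counted with ``itertools.product``. The cut values and candidate limits below are the ones listed in ``MainTask``; the experiment number and the two FastBDT option lists are placeholders chosen for illustration only.

```python
import itertools

# Cut values and candidate limits as listed in MainTask.
state_filter_cuts = [0.01, 0.02, 0.03, 0.05, 0.1, 0.2]
n_best_states_list = [3, 5, 10]
result_filter_cuts = [0.05, 0.1, 0.2]
n_best_results_list = [3, 5, 10]

# Hypothetical inputs, for illustration only.
experiment_numbers = [0]
fast_bdt_options = [[50, 8, 3, 0.1], [200, 8, 3, 0.1]]

# Outer loop: one entry per (experiment, state filter option, result filter option).
training_grid = list(itertools.product(experiment_numbers, fast_bdt_options, fast_bdt_options))
# Inner loop: one entry per combination of cuts and candidate limits.
cut_grid = list(itertools.product(state_filter_cuts, n_best_states_list,
                                  result_filter_cuts, n_best_results_list))

n_validation_tasks = len(training_grid) * len(cut_grid)
print(len(training_grid), len(cut_grid), n_validation_tasks)
```

Each additional FastBDT option or cut value multiplies the total, so it can pay off to start with a small grid.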
58 b2luigi: Understanding the steering file
59 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
61 All trainings and validations are done in the correct order in this steering
62 file. For the purpose of creating a dependency graph, the `b2luigi
63 <https://b2luigi.readthedocs.io>`_ python package is used, which extends the
64 `luigi <https://luigi.readthedocs.io>`_ package developed by Spotify.
66 Each task that has to be done is represented by a special class, which defines
67 parameters, output files and which other tasks with which
68 parameters it depends on. For example, a teacher task, which runs
69 ``basf2_mva_teacher.py`` to train the classifier, depends on a data collection
70 task which runs a reconstruction and writes out track-wise variables into a root
71 file for training. An evaluation/validation task for testing the classifier
72 requires both the teacher task, as it needs the weightfile to be present, and
73 also a data collection task, because it needs a dataset for testing the classifier.
75 The final task that defines which tasks need to be done for the steering file to
76 finish is the ``MainTask``. When you only want to run parts of the
77 training/validation pipeline, you can comment out requirements in the ``MainTask``
78 or replace them by lower-level tasks during debugging.
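The way a dependency graph turns "calls from the top" into "execution from the bottom" can be shown without b2luigi. The sketch below uses the standard library ``graphlib`` module (Python 3.9 or newer). The edges are an assumption reconstructed from the task descriptions in this text, and the real ``requires`` methods may differ in detail.

```python
from graphlib import TopologicalSorter

# Each key depends on the tasks in its set (edges are illustrative assumptions).
requires = {
    "MainTask": {"ValidationAndOptimisationTask"},
    "ValidationAndOptimisationTask": {
        "CKFResultFilterTeacherTask", "CKFStateFilterTeacherTask", "SplitNMergeSimTask"},
    "CKFResultFilterTeacherTask": {"ResultRecordingTask"},
    "ResultRecordingTask": {"CKFStateFilterTeacherTask", "SplitNMergeSimTask"},
    "CKFStateFilterTeacherTask": {"StateRecordingTask"},
    "StateRecordingTask": {"SplitNMergeSimTask"},
    "SplitNMergeSimTask": {"GenerateSimTask"},
}

# static_order() yields every task after all of its dependencies.
execution_order = list(TopologicalSorter(requires).static_order())
for position, task in enumerate(execution_order, start=1):
    print(position, task)
```

The simulation comes first and ``MainTask`` last, which is the reverse of the order in which the tasks request each other.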
83 This steering file relies on b2luigi_ for task scheduling. It can be installed
86 python3 -m pip install [--user] b2luigi
88 Use the ``--user`` option if you do not have the rights to install python packages into
89 your externals (e.g. because you are using cvmfs) and install them in
90 ``$HOME/.local`` instead.
95 Instead of command line arguments, the b2luigi script is configured via a
96 ``settings.json`` file. Open it in your favorite text editor and modify it to
97 fit your requirements.
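As a rough orientation, a ``settings.json`` could look like the JSON text embedded in the sketch below. The key names are the ones this script reads via ``b2luigi.get_setting`` (plus ``result_path``), and the numeric values are the script's defaults. The paths and the experiment number are placeholders, not recommendations.

```python
import json

# Placeholder settings; paths and experiment numbers must be adapted.
settings_text = """
{
    "result_path": "/path/to/results",
    "experiment_numbers": [0],
    "bkgfiles_by_exp": {"0": "/path/to/bkg/files"},
    "n_events_training": 1000,
    "n_events_testing": 500,
    "n_events_per_task": 100,
    "basf2_processes_per_worker": 0,
    "workers": 1
}
"""

settings = json.loads(settings_text)
# JSON object keys are always strings, so the experiment numbers are converted
# back to int, mirroring what MainTask does with bkgfiles_by_exp.
bkgfiles_by_exp = {int(key): val for key, val in settings["bkgfiles_by_exp"].items()}
print(bkgfiles_by_exp)
```

Every entry in ``experiment_numbers`` should have a matching key in ``bkgfiles_by_exp``, since the background files are looked up per experiment.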
102 You can test the b2luigi task graph without running it via::
104 python3 combined_cdc_to_svd_ckf_mva_training.py --dry-run
105 python3 combined_cdc_to_svd_ckf_mva_training.py --show-output
107 This will show the outputs and show potential errors in the definitions of the
108 luigi task dependencies. To run the steering file in normal (local) mode,
111 python3 combined_cdc_to_svd_ckf_mva_training.py
113 One can use the interactive luigi web interface via the central scheduler,
114 which visualizes the task graph while it is running. For this, the scheduler
115 daemon ``luigid`` has to run in the background; it is located in
116 ``~/.local/bin/luigid`` if b2luigi was installed with ``--user``. For
121 Then, execute your steering (e.g. in another terminal) with::
123 python3 combined_cdc_to_svd_ckf_mva_training.py --scheduler-port 8886
125 To view the web interface, open your web browser and enter into the URL bar::
129 If you don't run the steering file on the same machine on which you run your web
130 browser, you have two options:
132 1. Run both the steering file and ``luigid`` remotely and use
133 ssh-port-forwarding to your local host. To do so, run on your local
136 ssh -N -f -L 8886:localhost:8886 <remote_user>@<remote_host>
138 2. Run the ``luigid`` scheduler locally and use the ``--scheduler-host <your
139 local host>`` argument when calling the steering file
141 Accessing the results / output files
142 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
144 All output files are stored in a directory structure in the ``result_path`` set in
145 ``settings.json``. The directory tree encodes the used b2luigi parameters. This
146 ensures reproducibility and makes parameter searches easy. Sometimes, it is hard to
147 find the relevant output files. You can view the whole directory structure by
148 running ``tree <result_path>``. Use the Unix ``find`` command to find the files
149 that interest you, e.g.::
151 find <result_path> -name "*.root" # find all ROOT files
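The same search can be done from Python, which is convenient when the files are to be processed further. This is a small sketch using ``pathlib``. The helper name ``find_root_files`` and the directory names in the demonstration are made up for illustration.

```python
import tempfile
from pathlib import Path


def find_root_files(result_path):
    """Recursively collect all ROOT files below result_path (hypothetical helper)."""
    return sorted(Path(result_path).rglob("*.root"))


# Demonstration on a temporary directory that mimics a nested result tree.
with tempfile.TemporaryDirectory() as result_path:
    nested = Path(result_path) / "experiment_number=0" / "n_events=1000"
    nested.mkdir(parents=True)
    (nested / "records1.root").touch()
    (nested / "notes.txt").touch()
    found = [path.name for path in find_root_files(result_path)]

print(found)
```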
159 from tracking import add_track_finding
165 from ckf_training import my_basf2_mva_teacher, create_fbdt_option_string
168 install_helpstring_formatter = ("\nCould not find {module} python module. Try installing it via\n"
169 " python3 -m pip install [--user] {module}\n")
172 from b2luigi.core.utils import create_output_dirs
173 from b2luigi.basf2_helper import Basf2PathTask, Basf2Task
174 except ModuleNotFoundError:
175 print(install_helpstring_formatter.format(module="b2luigi"))
181 Generate simulated Monte Carlo with background overlay.
183 Make sure to use different ``random_seed`` parameters for the training data
184 for the classifier trainings and for the test data for the respective
185 evaluation/validation tasks.
189 experiment_number = b2luigi.IntParameter()
192 random_seed = b2luigi.Parameter()
194 n_events = b2luigi.IntParameter()
196 bkgfiles_dir = b2luigi.Parameter(
207 Create output file name depending on number of events and production
208 mode that is specified in the random_seed string.
210 :param n_events: Number of events to simulate.
211 :param random_seed: Random seed to use for the simulation to create independent samples.
215 if random_seed is None:
217 return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"
221 Generate list of output files that the task should produce.
222 The task is considered finished if and only if the outputs all exist.
228 Create basf2 path to process with event generation and simulation.
231 path = basf2.create_path()
235 path.add_module("EvtGenInput")
246 outputFileName=self.get_output_file_name(self.output_file_name()),
255 Generate simulated Monte Carlo with background overlay.
257 Make sure to use different ``random_seed`` parameters for the training data
258 for the classifier trainings and for the test data for the respective
259 evaluation/validation tasks.
262 experiment_number = b2luigi.IntParameter()
265 random_seed = b2luigi.Parameter()
267 n_events = b2luigi.IntParameter()
269 bkgfiles_dir = b2luigi.Parameter(
280 Create output file name depending on number of events and production
281 mode that is specified in the random_seed string.
283 :param n_events: Number of events to simulate.
284 :param random_seed: Random seed to use for the simulation to create independent samples.
288 if random_seed is None:
290 return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"
294 Generate list of output files that the task should produce.
295 The task is considered finished if and only if the outputs all exist.
301 This task requires several GenerateSimTask to be finished so that the required number of events is created.
303 n_events_per_task = MainTask.n_events_per_task
304 quotient, remainder = divmod(self.n_events, n_events_per_task)
305 for i in range(quotient):
308 num_processes=MainTask.num_processes,
309 random_seed=self.random_seed + '_' + str(i).zfill(3),
310 n_events=n_events_per_task,
316 num_processes=MainTask.num_processes,
317 random_seed=self.random_seed + '_' + str(quotient).zfill(3),
322 @b2luigi.on_temporary_files
325 When all GenerateSimTasks finished, merge the output.
327 create_output_dirs(self)
329 file_list = [item for sublist in self.get_input_file_names().values() for item in sublist]
330 print("Merge the following files:")
332 cmd = ["b2file-merge", "-f"]
333 args = cmd + [self.get_output_file_name(self.output_file_name())] + file_list
334 subprocess.check_call(args)
335 print("Finished merging. Now remove the input files to save space.")
336 for input_file in file_list:
338 os.remove(input_file)
339 except FileNotFoundError:
345 Record the data for the three state filters for the CDCToSVDSpacePointCKF.
347 This task requires that the events used for training have been simulated before, which is done using the
348 ``SplitNMergeSimTask``.
351 experiment_number = b2luigi.IntParameter()
354 random_seed = b2luigi.Parameter()
356 n_events = b2luigi.IntParameter()
359 layer = b2luigi.IntParameter()
363 Generate list of output files that the task should produce.
364 The task is considered finished if and only if the outputs all exist.
366 for record_fname in ["records1.root", "records2.root", "records3.root"]:
367 yield self.add_to_output(record_fname)
371 This task only requires that the input files have been created.
382 Create a path for the recording. To record the data for the SVD state filters, CDC tracks are required, and these must
383 be truth matched before. The data have to be recorded for each layer of the SVD, i.e. layers 3 to 6, but also an artificial
386 :param layer: The layer for which the data are recorded.
387 :param records1_fname: Name of the records1 file.
388 :param records2_fname: Name of the records2 file.
389 :param records3_fname: Name of the records3 file.
391 path = basf2.create_path()
394 file_list = [fname for sublist in self.get_input_file_names().values()
395 for fname in sublist if "generated_mc_N" in fname and "training" in fname and fname.endswith(".root")]
396 path.add_module("RootInput", inputFileNames=file_list)
398 path.add_module("Gearbox")
399 path.add_module("Geometry")
400 path.add_module("SetupGenfitExtrapolation")
402 add_hit_preparation_modules(path, components=["SVD"])
404 add_track_finding(path, reco_tracks="CDCRecoTracks", components=["CDC"], prune_temporary_tracks=False)
406 path.add_module('TrackFinderMCTruthRecoTracks',
407 RecoTracksStoreArrayName="MCRecoTracks",
413 path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=False, UseCDCHits=True,
414 mcRecoTracksStoreArrayName="MCRecoTracks",
415 prRecoTracksStoreArrayName="CDCRecoTracks")
416 path.add_module("DAFRecoFitter", recoTracksStoreArrayName="CDCRecoTracks")
418 path.add_module("CDCToSVDSpacePointCKF",
419 inputRecoTrackStoreArrayName="CDCRecoTracks",
420 outputRecoTrackStoreArrayName="VXDRecoTracks",
421 outputRelationRecoTrackStoreArrayName="CDCRecoTracks",
423 relationCheckForDirection="backward",
425 writeOutDirection="backward",
427 firstHighFilter="truth",
428 firstEqualFilter="recording",
429 firstEqualFilterParameters={"treeName": "records1", "rootFileName":
430 records1_fname, "returnWeight": 1.0},
431 firstLowFilter="none",
432 firstHighUseNStates=0,
433 firstToggleOnLayer=layer,
435 advanceHighFilter="advance",
437 secondHighFilter="truth",
438 secondEqualFilter="recording",
439 secondEqualFilterParameters={"treeName": "records2", "rootFileName":
440 records2_fname, "returnWeight": 1.0},
441 secondLowFilter="none",
442 secondHighUseNStates=0,
443 secondToggleOnLayer=layer,
445 updateHighFilter="fit",
447 thirdHighFilter="truth",
448 thirdEqualFilter="recording",
449 thirdEqualFilterParameters={"treeName": "records3", "rootFileName": records3_fname},
450 thirdLowFilter="none",
451 thirdHighUseNStates=0,
452 thirdToggleOnLayer=layer,
457 enableOverlapResolving=False)
463 Create basf2 path to process with event generation and simulation.
466 layer=self.layer,
467 records1_fname=self.get_output_file_name("records1.root"),
468 records2_fname=self.get_output_file_name("records2.root"),
469 records3_fname=self.get_output_file_name("records3.root"),
475 A teacher task runs the basf2 mva teacher on the training data provided by a
476 data collection task.
478 In this task the three state filters are trained, each with the corresponding recordings from the different layers.
479 It will be executed for each FastBDT option defined in the MainTask.
482 experiment_number = b2luigi.IntParameter()
485 random_seed = b2luigi.Parameter()
487 n_events = b2luigi.IntParameter()
489 fast_bdt_option_state_filter = b2luigi.ListParameter(
491 hashed=True, default=[50, 8, 3, 0.1]
495 filter_number = b2luigi.IntParameter()
497 training_target = b2luigi.Parameter(
504 exclude_variables = b2luigi.ListParameter(
507 hashed=True, default=[
515 "seed_lowest_svd_layer",
516 "seed_lowest_cdc_layer",
517 "quality_index_triplet",
518 "quality_index_circle",
519 "quality_index_helix",
522 "mean_rest_cluster_charge",
523 "min_rest_cluster_charge",
524 "std_rest_cluster_charge",
525 "cluster_1_seed_charge",
526 "cluster_2_seed_charge",
527 "mean_rest_cluster_seed_charge",
528 "min_rest_cluster_seed_charge",
529 "std_rest_cluster_seed_charge",
532 "mean_rest_cluster_size",
533 "min_rest_cluster_size",
534 "std_rest_cluster_size",
537 "mean_rest_cluster_snr",
538 "min_rest_cluster_snr",
539 "std_rest_cluster_snr",
540 "cluster_1_charge_over_size",
541 "cluster_2_charge_over_size",
542 "mean_rest_cluster_charge_over_size",
543 "min_rest_cluster_charge_over_size",
544 "std_rest_cluster_charge_over_size",
551 Name of the xml weightfile that is created by the teacher task.
552 It is subsequently used as a local weightfile in the following validation tasks.
554 :param fast_bdt_option: FastBDT option that is used to train this MVA
555 :param filter_number: Filter number (first=1, second=2, third=3) to be trained
557 if fast_bdt_option is None:
559 fast_bdt_string = create_fbdt_option_string(fast_bdt_option)
560 weightfile_name = f"trk_CDCToSVDSpacePointStateFilter_{filter_number}" + fast_bdt_string
561 return weightfile_name + ".xml"
565 This task requires the recordings for the state filters.
567 for layer in [3, 4, 5, 6, 7]:
572 random_seed="training",
578 Generate list of output files that the task should produce.
579 The task is considered finished if and only if the outputs all exist.
585 Use basf2_mva teacher to create MVA weightfile from collected training
588 This is the main process that is dispatched by the ``run`` method that
589 is inherited from ``Basf2Task``.
591 records_files = self.get_input_file_names(f"records{self.filter_number}.root")
592 tree_name = f"records{self.filter_number}"
593 print(f"Processed records files: {records_files=},\nfeature tree name: {tree_name=}")
595 my_basf2_mva_teacher(
596 records_files=records_files,
600 exclude_variables=self.exclude_variables,
607 Task to record data for the final result filter. This requires trained state filters.
608 The cuts on the state filter classifiers are set to rather low values to ensure that all signal is contained in the
609 recorded file. Also, the values for XXXXXHighUseNStates are chosen conservatively, i.e. rather on the high side.
613 experiment_number = b2luigi.IntParameter()
616 random_seed = b2luigi.Parameter()
618 n_events = b2luigi.IntParameter()
620 fast_bdt_option_state_filter = b2luigi.ListParameter(
622 hashed=True, default=[50, 8, 3, 0.1]
626 result_filter_records_name = b2luigi.Parameter()
630 Generate list of output files that the task should produce.
631 The task is considered finished if and only if the outputs all exist.
637 This task requires that the training SplitNMergeSimTask is finished, as well as that the state filters are trained
638 using the CKFStateFilterTeacherTask.
646 filter_numbers = [1, 2, 3]
647 for filter_number in filter_numbers:
649 CKFStateFilterTeacherTask,
653 filter_number=filter_number,
659 Create a path for the recording of the result filter. This file is then used to train the result filter.
661 :param result_filter_records_name: Name of the recording file.
664 path = basf2.create_path()
667 file_list = [fname for sublist in self.get_input_file_names().values()
668 for fname in sublist if "generated_mc_N" in fname and "training" in fname and fname.endswith(".root")]
669 path.add_module("RootInput", inputFileNames=file_list)
671 path.add_module("Gearbox")
672 path.add_module("Geometry")
673 path.add_module("SetupGenfitExtrapolation")
675 add_hit_preparation_modules(path, components=["SVD"])
677 add_track_finding(path, reco_tracks="CDCRecoTracks", components=["CDC"], prune_temporary_tracks=False)
679 path.add_module('TrackFinderMCTruthRecoTracks',
680 RecoTracksStoreArrayName="MCRecoTracks",
686 path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=False, UseCDCHits=True,
687 mcRecoTracksStoreArrayName="MCRecoTracks",
688 prRecoTracksStoreArrayName="CDCRecoTracks")
689 path.add_module("DAFRecoFitter", recoTracksStoreArrayName="CDCRecoTracks")
692 path.add_module("CDCToSVDSpacePointCKF",
693 inputRecoTrackStoreArrayName="CDCRecoTracks",
694 outputRecoTrackStoreArrayName="VXDRecoTracks",
695 outputRelationRecoTrackStoreArrayName="CDCRecoTracks",
697 relationCheckForDirection="backward",
699 writeOutDirection="backward",
701 firstHighFilter="mva_with_direction_check",
702 firstHighFilterParameters={
703 "identifier": self.get_input_file_names(f"trk_CDCToSVDSpacePointStateFilter_1{fast_bdt_string}.xml")[0],
705 "direction": "backward"},
706 firstHighUseNStates=10,
708 advanceHighFilter="advance",
709 advanceHighFilterParameters={"direction": "backward"},
711 secondHighFilter="mva",
712 secondHighFilterParameters={
713 "identifier": self.get_input_file_names(f"trk_CDCToSVDSpacePointStateFilter_2{fast_bdt_string}.xml")[0],
715 secondHighUseNStates=10,
717 updateHighFilter="fit",
719 thirdHighFilter="mva",
720 thirdHighFilterParameters={
721 "identifier": self.get_input_file_names(f"trk_CDCToSVDSpacePointStateFilter_3{fast_bdt_string}.xml")[0],
723 thirdHighUseNStates=10,
726 filterParameters={"rootFileName": result_filter_records_name},
729 enableOverlapResolving=True)
735 Create basf2 path to process with event generation and simulation.
737 return self.create_result_recording_path(
744 A teacher task runs the basf2 mva teacher on the training data provided by a
745 data collection task.
747 Since teacher tasks are needed for all quality estimators covered by this
748 steering file and the only thing that changes is the required data
749 collection task and some training parameters, I decided to use inheritance
750 and have the basic functionality in this base class/interface and have the
751 specific teacher tasks inherit from it.
754 experiment_number = b2luigi.IntParameter()
757 random_seed = b2luigi.Parameter()
759 n_events = b2luigi.IntParameter()
761 fast_bdt_option_state_filter = b2luigi.ListParameter(
763 hashed=True, default=[50, 8, 3, 0.1]
767 fast_bdt_option_result_filter = b2luigi.ListParameter(
769 hashed=True, default=[200, 8, 3, 0.1]
773 result_filter_records_name = b2luigi.Parameter()
775 training_target = b2luigi.Parameter(
782 exclude_variables = b2luigi.ListParameter(
784 hashed=True, default=[]
790 Name of the xml weightfile that is created by the teacher task.
791 It is subsequently used as a local weightfile in the following validation tasks.
793 :param fast_bdt_option: FastBDT option that is used to train this MVA
795 if fast_bdt_option is None:
797 fast_bdt_string = create_fbdt_option_string(fast_bdt_option)
798 weightfile_name = "trk_CDCToSVDSpacePointResultFilter" + fast_bdt_string
799 return weightfile_name + ".xml"
803 Generate list of luigi Tasks that this Task depends on.
815 Generate list of output files that the task should produce.
816 The task is considered finished if and only if the outputs all exist.
822 Use basf2_mva teacher to create MVA weightfile from collected training
825 This is the main process that is dispatched by the ``run`` method that
826 is inherited from ``Basf2Task``.
829 tree_name = "records"
830 print(f"Processed records files for result filter training: {records_files=},\nfeature tree name: {tree_name=}")
832 my_basf2_mva_teacher(
833 records_files=records_files,
835 weightfile_identifier=self.get_output_file_name(self.get_weightfile_xml_identifier()),
836 target_variable=self.training_target,
837 exclude_variables=self.exclude_variables,
844 Validate the performance of the trained filters by trying various combinations of FastBDT options, as well as cut values
845 for the states, the number of best candidates kept after each filter, and similar for the result filter.
848 experiment_number = b2luigi.IntParameter()
850 n_events_training = b2luigi.IntParameter()
852 fast_bdt_option_state_filter = b2luigi.ListParameter(
854 hashed=True, default=[50, 8, 3, 0.1]
858 fast_bdt_option_result_filter = b2luigi.ListParameter(
860 hashed=True, default=[200, 8, 3, 0.1]
864 n_events_testing = b2luigi.IntParameter()
866 state_filter_cut = b2luigi.FloatParameter()
868 use_n_best_states = b2luigi.IntParameter()
870 result_filter_cut = b2luigi.FloatParameter()
872 use_n_best_results = b2luigi.IntParameter()
876 Generate list of output files that the task should produce.
877 The task is considered finished if and only if the outputs all exist.
881 yield self.add_to_output(
882 f"cdc_to_svd_spacepoint_ckf_validation{fbdt_state_filter_string}_{fbdt_result_filter_string}.root")
886 This task requires trained result filters, trained state filters, and that an independent data set for validation was
887 created using the SplitNMergeSimTask with the random seed optimisation.
891 result_filter_records_name=f"filter_records{fbdt_state_filter_string}.root",
896 random_seed='training'
902 random_seed="optimisation",
904 filter_numbers = [1, 2, 3]
905 for filter_number in filter_numbers:
907 CKFStateFilterTeacherTask,
909 random_seed="training",
911 filter_number=filter_number,
917 Create a path to validate the trained filters.
919 path = basf2.create_path()
922 file_list = [fname for sublist in self.get_input_file_names().values()
923 for fname in sublist if "generated_mc_N" in fname and "optimisation" in fname and fname.endswith(".root")]
924 path.add_module("RootInput", inputFileNames=file_list)
926 path.add_module("Gearbox")
927 path.add_module("Geometry")
928 path.add_module("SetupGenfitExtrapolation")
930 add_hit_preparation_modules(path, components=["SVD"])
932 add_track_finding(path, reco_tracks="CDCRecoTracks", components=["CDC"], prune_temporary_tracks=False)
936 path.add_module("CDCToSVDSpacePointCKF",
938 inputRecoTrackStoreArrayName="CDCRecoTracks",
939 outputRecoTrackStoreArrayName="VXDRecoTracks",
940 outputRelationRecoTrackStoreArrayName="CDCRecoTracks",
942 relationCheckForDirection="backward",
944 writeOutDirection="backward",
946 firstHighFilter="mva_with_direction_check",
947 firstHighFilterParameters={
948 "identifier": self.get_input_file_names(
949 f"trk_CDCToSVDSpacePointStateFilter_1{fbdt_state_filter_string}.xml")[0],
951 "direction": "backward"},
954 advanceHighFilter="advance",
955 advanceHighFilterParameters={"direction": "backward"},
957 secondHighFilter="mva",
958 secondHighFilterParameters={
959 "identifier": self.get_input_file_names(
960 f"trk_CDCToSVDSpacePointStateFilter_2{fbdt_state_filter_string}.xml")[0],
964 updateHighFilter="fit",
966 thirdHighFilter="mva",
967 thirdHighFilterParameters={
968 "identifier": self.get_input_file_names(
969 f"trk_CDCToSVDSpacePointStateFilter_3{fbdt_state_filter_string}.xml")[0],
975 "identifier": self.get_input_file_names(
976 f"trk_CDCToSVDSpacePointResultFilter{fbdt_result_filter_string}.xml")[0],
981 enableOverlapResolving=True)
983 path.add_module('RelatedTracksCombiner',
984 VXDRecoTracksStoreArrayName="VXDRecoTracks",
985 CDCRecoTracksStoreArrayName="CDCRecoTracks",
986 recoTracksStoreArrayName="RecoTracks")
988 path.add_module('TrackFinderMCTruthRecoTracks',
989 RecoTracksStoreArrayName="MCRecoTracks",
995 path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=True, UseCDCHits=True,
996 mcRecoTracksStoreArrayName="MCRecoTracks",
997 prRecoTracksStoreArrayName="RecoTracks")
1001 output_file_name=self.get_output_file_name(
1002 f"cdc_to_svd_spacepoint_ckf_validation{fbdt_state_filter_string}_{fbdt_result_filter_string}.root"),
1003 reco_tracks_name="RecoTracks",
1004 mc_reco_tracks_name="MCRecoTracks",
1011 def create_path(self):
1013 Create basf2 path to process with event generation and simulation.
1015 return self.create_optimisation_and_validation_path()
1018 class MainTask(b2luigi.WrapperTask):
1020 Wrapper task that needs to finish for b2luigi to finish running this steering file.
1022 It is done if the outputs of all required subtasks exist. It is thus at the
1023 top of the luigi task graph. Edit the ``requires`` method to steer which
1024 tasks and with which parameters you want to run.
1027 n_events_training = b2luigi.get_setting(
1029 "n_events_training", default=1000
1033 n_events_testing = b2luigi.get_setting(
1035 "n_events_testing", default=500
1039 n_events_per_task = b2luigi.get_setting(
1041 "n_events_per_task", default=100
1045 num_processes = b2luigi.get_setting(
1047 "basf2_processes_per_worker", default=0
1052 bkgfiles_by_exp = b2luigi.get_setting("bkgfiles_by_exp")
1054 bkgfiles_by_exp = {int(key): val for (key, val) in bkgfiles_by_exp.items()}
1058 Generate list of tasks that need to be done for luigi to finish running
1062 fast_bdt_options = [
1068 experiment_numbers = b2luigi.get_setting("experiment_numbers")
1071 for experiment_number, fast_bdt_option_state_filter, fast_bdt_option_result_filter in itertools.product(
1072 experiment_numbers, fast_bdt_options, fast_bdt_options
1075 state_filter_cuts = [0.01, 0.02, 0.03, 0.05, 0.1, 0.2]
1076 n_best_states_list = [3, 5, 10]
1077 result_filter_cuts = [0.05, 0.1, 0.2]
1078 n_best_results_list = [3, 5, 10]
1079 for state_filter_cut, n_best_states, result_filter_cut, n_best_results in \
1080 itertools.product(state_filter_cuts, n_best_states_list, result_filter_cuts, n_best_results_list):
1082 ValidationAndOptimisationTask,
1083 experiment_number=experiment_number,
1086 state_filter_cut=state_filter_cut,
1087 use_n_best_states=n_best_states,
1088 result_filter_cut=result_filter_cut,
1089 use_n_best_results=n_best_results,
1090 fast_bdt_option_state_filter=fast_bdt_option_state_filter,
1091 fast_bdt_option_result_filter=fast_bdt_option_result_filter,
1095 if __name__ == "__main__":
1096 b2luigi.set_setting("env_script", "./setup_basf2.sh")
1097 b2luigi.set_setting("batch_system", "htcondor")
1098 workers = b2luigi.get_setting("workers", default=1)
1099 b2luigi.process(MainTask(), workers=workers, batch=True)