combined_cdc_to_svd_ckf_mva_training
----------------------------------------
This python script is used for the training and validation of the classifiers of
the three MVA-based state filters and the one result filter of the CDCToSVDSpacePointCKF.
This CKF extrapolates tracks found in the CDC into the SVD and adds SVD hits using a
combinatorial tree search and a Kalman-filter-based track fit in each step.
To avoid mistakes, b2luigi is used to create a task chain for a combined training and
validation of all classifiers.
The order of the b2luigi tasks in this script is as follows (top to bottom):

* Two tasks to create input samples for training and testing (``GenerateSimTask`` and
  ``SplitNMergeSimTask``). The ``SplitNMergeSimTask`` takes the total number of events to be
  generated and a number of events per task, which keeps the runtime of each individual job short.
  It then divides the total number of events by the number of events per task and creates as
  many ``GenerateSimTask`` instances as needed, each with a specific random seed, so that in the
  end the total number of training and testing events are simulated. The individual files are
  then combined by the ``SplitNMergeSimTask`` into one file each for training and testing.
* The ``StateRecordingTask`` writes out the data required for training the state
  filters.
* The ``CKFStateFilterTeacherTask`` trains the state filter MVAs, using FastBDT by
  default, with a given set of options.
* The ``ResultRecordingTask`` writes out the data used for the training of the result
  filter MVA. This task requires that the state filters have been trained before.
* The ``CKFResultFilterTeacherTask`` trains the result filter MVA, FastBDT by default, with a
  given set of FastBDT options. This requires that the result filter records have
  been created with the ``ResultRecordingTask``.
* The ``ValidationAndOptimisationTask`` uses the trained weight files and the provided cut
  values to run the tracking chain with the weight files under test, and also
  runs the tracking validation.
* Finally, the ``MainTask`` is the "brain" of the script. It invokes the
  ``ValidationAndOptimisationTask`` with the different combinations of FastBDT options
  and cut values on the MVA classifier output.
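The splitting done by ``SplitNMergeSimTask`` can be sketched in plain Python. The ``divmod`` call and the seed suffix ``'_' + str(i).zfill(3)`` follow the ``requires`` method in the listing further down, and the file name follows ``GenerateSimTask.output_file_name``. The helper names ``split_simulation`` and ``simulated_file_name`` are hypothetical, and the sketch assumes that the final, smaller chunk is only created when the division leaves a remainder.

```python
def split_simulation(n_events, n_events_per_task, random_seed):
    """Return (seed, n_events) pairs, one per simulation job (hypothetical helper)."""
    quotient, remainder = divmod(n_events, n_events_per_task)
    # Full-size chunks get the seed suffixes _000, _001, ...
    chunks = [(random_seed + "_" + str(i).zfill(3), n_events_per_task) for i in range(quotient)]
    # Assumption: one extra, smaller chunk only if the division leaves a remainder
    if remainder > 0:
        chunks.append((random_seed + "_" + str(quotient).zfill(3), remainder))
    return chunks


def simulated_file_name(n_events, random_seed):
    """Same naming scheme as GenerateSimTask.output_file_name."""
    return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"


chunks = split_simulation(1050, 100, "training")
for seed, n in chunks:
    print(simulated_file_name(n, seed))
```

With 1050 events and 100 events per task this yields ten files with 100 events (seeds ``training_000`` to ``training_009``) plus ``generated_mc_N50_training_010.root``; these are the files that ``b2file-merge`` combines into a single training file.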
Because of these dependencies, the tasks are invoked in reverse order: the ``MainTask``
calls the ``ValidationAndOptimisationTask`` with different FastBDT options and cut
values, and the ``ValidationAndOptimisationTask`` itself requires the teacher,
recording, and simulation tasks.
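To get a feeling for how many ``ValidationAndOptimisationTask`` instances the ``MainTask`` fans out into, its nested ``itertools.product`` loops can be reproduced stand-alone. The cut lists are copied from ``MainTask.requires``; the experiment numbers and the FastBDT option list are hypothetical placeholders, since the actual option list is not shown here.

```python
import itertools

# Cut values and candidate counts as listed in MainTask.requires
state_filter_cuts = [0.01, 0.02, 0.03, 0.05, 0.1, 0.2]
n_best_states_list = [3, 5, 10]
result_filter_cuts = [0.05, 0.1, 0.2]
n_best_results_list = [3, 5, 10]

# Hypothetical inputs: the real values come from MainTask and settings.json
experiment_numbers = [0]
fast_bdt_options = [[50, 8, 3, 0.1], [200, 8, 3, 0.1]]


def validation_parameter_sets(experiment_numbers, fast_bdt_options):
    """Yield one parameter dict per ValidationAndOptimisationTask that MainTask would request."""
    for experiment_number, fbdt_state, fbdt_result in itertools.product(
            experiment_numbers, fast_bdt_options, fast_bdt_options):
        for state_cut, n_states, result_cut, n_results in itertools.product(
                state_filter_cuts, n_best_states_list, result_filter_cuts, n_best_results_list):
            yield {
                "experiment_number": experiment_number,
                "fast_bdt_option_state_filter": fbdt_state,
                "fast_bdt_option_result_filter": fbdt_result,
                "state_filter_cut": state_cut,
                "use_n_best_states": n_states,
                "result_filter_cut": result_cut,
                "use_n_best_results": n_results,
            }


parameter_sets = list(validation_parameter_sets(experiment_numbers, fast_bdt_options))
print(len(parameter_sets))  # 1 * 2 * 2 * (6 * 3 * 3 * 3) = 648
```

Because the FastBDT options enter twice (once for the state filters and once for the result filter), the number of validation tasks grows quadratically with the length of that list, on top of the 162 cut combinations.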
Each combination of FastBDT options, state filter cut values, and candidate selection
is used to train the result filter. As a consequence, the ``ResultRecordingTask``
is executed multiple times, once for each combination of FastBDT options, cut values,
and candidate selection.
b2luigi: Understanding the steering file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All trainings and validations are done in the correct order in this steering
file. For the purpose of creating a dependency graph, the `b2luigi
<https://b2luigi.readthedocs.io>`_ Python package is used, which extends the
`luigi <https://luigi.readthedocs.io>`_ package developed by Spotify.
Each task that has to be done is represented by a special class, which defines
parameters, output files, and which other tasks (with which parameters) it
depends on. For example, a teacher task, which runs ``basf2_mva_teacher.py`` to
train the classifier, depends on a data collection task which runs a
reconstruction and writes out track-wise variables into a ROOT file for
training. An evaluation/validation task for testing the classifier requires both
the teacher task, as it needs the weightfile to be present, and a data
collection task, because it needs a dataset for testing the classifier.
The final task that defines which tasks need to be done for the steering file to
finish is the ``MainTask``. When you only want to run parts of the
training/validation pipeline, you can comment out requirements in the ``MainTask``
or replace them by lower-level tasks during debugging.
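The reversed order of calls can be illustrated by modelling the dependency chain as a plain dictionary and walking it depth first, so that every requirement is scheduled before the task that needs it. This is only a simplified sketch: the task names are taken from this script, but task parameters and the multiple instances per task (e.g. one ``StateRecordingTask`` per layer) are ignored, and luigi's real scheduler works differently in detail.

```python
# Simplified, hypothetical dependency graph: task name -> tasks it requires
REQUIRES = {
    "MainTask": ["ValidationAndOptimisationTask"],
    "ValidationAndOptimisationTask": [
        "CKFResultFilterTeacherTask", "CKFStateFilterTeacherTask", "SplitNMergeSimTask(optimisation)"],
    "CKFResultFilterTeacherTask": ["ResultRecordingTask"],
    "ResultRecordingTask": ["CKFStateFilterTeacherTask", "SplitNMergeSimTask(training)"],
    "CKFStateFilterTeacherTask": ["StateRecordingTask"],
    "StateRecordingTask": ["SplitNMergeSimTask(training)"],
    "SplitNMergeSimTask(training)": ["GenerateSimTask(training)"],
    "SplitNMergeSimTask(optimisation)": ["GenerateSimTask(optimisation)"],
    "GenerateSimTask(training)": [],
    "GenerateSimTask(optimisation)": [],
}


def execution_order(task, requires, visited=None, order=None):
    """Depth-first post-order: all requirements are appended before the task itself."""
    visited = set() if visited is None else visited
    order = [] if order is None else order
    if task in visited:
        # Shared requirements (e.g. the state filter teacher) are scheduled only once
        return order
    visited.add(task)
    for dependency in requires[task]:
        execution_order(dependency, requires, visited, order)
    order.append(task)
    return order


order = execution_order("MainTask", REQUIRES)
print(order)
```

The ``MainTask`` is requested first but finishes last, while the simulation tasks at the bottom of the graph run first.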
This steering file relies on b2luigi_ for task scheduling. It can be installed
via pip::

    python3 -m pip install [--user] b2luigi
Use the ``--user`` option if you do not have the rights to install Python packages into
your externals (e.g. because you are using cvmfs); the packages are then installed in
``$HOME/.local`` instead.
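To check whether the installation succeeded (for example after a ``--user`` install), a short check along the lines of the ``try``/``except ModuleNotFoundError`` guard in the steering file can be run. ``missing_modules`` is a hypothetical helper, not part of basf2 or b2luigi; the message format is copied from ``install_helpstring_formatter``.

```python
import importlib.util

# Same message format as install_helpstring_formatter in the steering file
install_helpstring_formatter = ("\nCould not find {module} python module. Try installing it via\n"
                                " python3 -m pip install [--user] {module}\n")


def missing_modules(names):
    """Return the top-level module names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


for module in missing_modules(["b2luigi"]):
    print(install_helpstring_formatter.format(module=module))
```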
Instead of command line arguments, the b2luigi script is configured via a
``settings.json`` file. Open it in your favorite text editor and modify it to
fit your requirements.
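A minimal sketch of such a ``settings.json``, written from Python for illustration. Only the key names are taken from the steering file (``result_path`` and the keys passed to ``b2luigi.get_setting``); all values are hypothetical placeholders that you have to adapt, and the file is written to a temporary directory here, whereas in practice it belongs next to the steering file.

```python
import json
import os
import tempfile

# Hypothetical example values; only the key names come from the steering file
settings = {
    "result_path": "results",
    "n_events_training": 1000,
    "n_events_testing": 500,
    "n_events_per_task": 100,
    "basf2_processes_per_worker": 0,
    "experiment_numbers": [0],
    # JSON object keys are always strings, so experiment numbers are stored as "0", "1", ...
    "bkgfiles_by_exp": {"0": "/path/to/bkg/exp0"},
    "workers": 1,
    "batch_system": "lsf",
}

settings_path = os.path.join(tempfile.mkdtemp(), "settings.json")
with open(settings_path, "w") as settings_file:
    json.dump(settings, settings_file, indent=4)

# Reading it back reproduces the key conversion done in MainTask
with open(settings_path) as settings_file:
    loaded = json.load(settings_file)
bkgfiles_by_exp = {int(key): val for (key, val) in loaded["bkgfiles_by_exp"].items()}
print(bkgfiles_by_exp)
```

The ``int(key)`` conversion is the reason why ``MainTask`` can look up the background directory with an integer ``experiment_number``.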
You can test the b2luigi steering file without running it via::

    python3 combined_cdc_to_svd_ckf_mva_training.py --dry-run
    python3 combined_cdc_to_svd_ckf_mva_training.py --show-output
This will show the outputs and potential errors in the definitions of the
luigi task dependencies. To run the steering file in normal (local) mode, run::

    python3 combined_cdc_to_svd_ckf_mva_training.py
One can use the interactive luigi web interface via the central scheduler,
which visualizes the task graph while it is running. For this, the scheduler
daemon ``luigid`` has to run in the background. It is located in
``~/.local/bin/luigid`` if b2luigi has been installed with ``--user``. For
example, start it with::

    luigid --port 8886
Then, execute your steering file (e.g. in another terminal) with::

    python3 combined_cdc_to_svd_ckf_mva_training.py --scheduler-port 8886
To view the web interface, open your web browser and enter into the URL bar::

    localhost:8886
If you don't run the steering file on the same machine on which you run your web browser, you have two options:

 1. Run both the steering file and ``luigid`` remotely and use
    ssh-port-forwarding to your local host. For this, run on your local
    machine::

        ssh -N -f -L 8886:localhost:8886 <remote_user>@<remote_host>

 2. Run the ``luigid`` scheduler locally and use the ``--scheduler-host <your
    local host>`` argument when calling the steering file.
Accessing the results / output files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All output files are stored in a directory structure in the ``result_path`` set in
``settings.json``. The directory tree encodes the used b2luigi parameters. This
ensures reproducibility and makes parameter searches easy. Sometimes, it is hard to
find the relevant output files. You can view the whole directory structure by
running ``tree <result_path>``. Use the unix ``find`` command to find the files
that interest you, e.g.::

    find <result_path> -name "*.root"
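If ``tree`` or ``find`` are not at hand, the same search can be sketched with ``pathlib``. ``find_output_files`` is a hypothetical helper; the demo builds a throw-away directory whose ``parameter=value`` folder names only imitate a parameter-encoded layout and are not the actual b2luigi output of this script.

```python
import tempfile
from pathlib import Path


def find_output_files(result_path, pattern="*.root"):
    """Recursively list files below result_path matching pattern, similar to find -name."""
    return sorted(str(path) for path in Path(result_path).rglob(pattern) if path.is_file())


# Demo on a throw-away directory tree (hypothetical folder and file names)
demo_root = Path(tempfile.mkdtemp())
(demo_root / "experiment_number=0" / "n_events=100").mkdir(parents=True)
(demo_root / "experiment_number=0" / "n_events=100" / "records1.root").touch()
(demo_root / "experiment_number=0" / "notes.txt").touch()

found = find_output_files(demo_root)
print(found)
```

Passing a different ``pattern`` (e.g. ``"cdc_to_svd_spacepoint_ckf_validation*.root"``) narrows the search to the validation outputs.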
from tracking import add_track_finding
from tracking.path_utils import add_hit_preparation_modules
from tracking.harvesting_validation.combined_module import CombinedTrackingValidationModule
from ckf_training import my_basf2_mva_teacher, create_fbdt_option_string
from tracking_mva_filter_payloads.write_tracking_mva_filter_payloads_to_db import write_tracking_mva_filter_payloads_to_db
# wrap python modules that are used here but not in the externals into a try except block
install_helpstring_formatter = ("\nCould not find {module} python module. Try installing it via\n"
                                " python3 -m pip install [--user] {module}\n")
try:
    from b2luigi.core.utils import create_output_dirs
    from b2luigi.basf2_helper import Basf2PathTask, Basf2Task
except ModuleNotFoundError:
    print(install_helpstring_formatter.format(module="b2luigi"))
class GenerateSimTask(Basf2PathTask):
    Generate simulated Monte Carlo with background overlay.
    Make sure to use different ``random_seed`` parameters for the training data
    for the classifier trainings and for the test data for the respective
    evaluation/validation tasks.
    experiment_number = b2luigi.IntParameter()
    random_seed = b2luigi.Parameter()
    n_events = b2luigi.IntParameter()
    bkgfiles_dir = b2luigi.Parameter(
        Create output file name depending on number of events and production
        mode that is specified in the random_seed string.
        :param n_events: Number of events to simulate.
        :param random_seed: Random seed to use for the simulation to create independent samples.
        if random_seed is None:
        return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
    def create_path(self):
        Create basf2 path to process with event generation and simulation.
        path = basf2.create_path()
        path.add_module("EvtGenInput")
    Generate simulated Monte Carlo with background overlay.
    Make sure to use different ``random_seed`` parameters for the training data
    for the classifier trainings and for the test data for the respective
    evaluation/validation tasks.
    experiment_number = b2luigi.IntParameter()
    random_seed = b2luigi.Parameter()
    n_events = b2luigi.IntParameter()
    bkgfiles_dir = b2luigi.Parameter(
        Create output file name depending on number of events and production
        mode that is specified in the random_seed string.
        :param n_events: Number of events to simulate.
        :param random_seed: Random seed to use for the simulation to create independent samples.
        if random_seed is None:
        return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        This task requires several GenerateSimTasks to be finished so that the required number of events is created.
        n_events_per_task = MainTask.n_events_per_task
        quotient, remainder = divmod(self.n_events, n_events_per_task)
        for i in range(quotient):
                num_processes=MainTask.num_processes,
                random_seed=self.random_seed + '_' + str(i).zfill(3),
                n_events=n_events_per_task,
                num_processes=MainTask.num_processes,
                random_seed=self.random_seed + '_' + str(quotient).zfill(3),
    @b2luigi.on_temporary_files
        When all GenerateSimTasks have finished, merge the output.
        create_output_dirs(self)
        file_list = [item for sublist in self.get_input_file_names().values() for item in sublist]
        print("Merge the following files:")
        cmd = ["b2file-merge", "-f"]
        args = cmd + [self.get_output_file_name(self.output_file_name())] + file_list
        subprocess.check_call(args)
        print("Finished merging. Now remove the input files to save space.")
        for input_file in file_list:
                os.remove(input_file)
            except FileNotFoundError:
    Record the data for the three state filters for the CDCToSVDSpacePointCKF.
    This task requires that the events used for training have been simulated before, which is done using the
    ``SplitNMergeSimTask``.
    experiment_number = b2luigi.IntParameter()
    random_seed = b2luigi.Parameter()
    n_events = b2luigi.IntParameter()
    layer = b2luigi.IntParameter()
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        for record_fname in ["records1.root", "records2.root", "records3.root"]:
            yield self.add_to_output(record_fname)
        This task only requires that the input files have been created.
        Create a path for the recording. To record the data for the SVD state filters, CDC tracks are required, and these must
        be truth matched before. The data have to be recorded for each layer of the SVD, i.e. layers 3 to 6, but also an artificial
        :param layer: The layer for which the data are recorded.
        :param records1_fname: Name of the records1 file.
        :param records2_fname: Name of the records2 file.
        :param records3_fname: Name of the records3 file.
        path = basf2.create_path()
        file_list = [fname for sublist in self.get_input_file_names().values()
                     for fname in sublist if "generated_mc_N" in fname and "training" in fname and fname.endswith(".root")]
        path.add_module("RootInput", inputFileNames=file_list)
        path.add_module("Gearbox")
        path.add_module("Geometry")
        path.add_module("SetupGenfitExtrapolation")
        add_hit_preparation_modules(path, components=["SVD"])
        add_track_finding(path, reco_tracks="CDCRecoTracks", components=["CDC"], prune_temporary_tracks=False)
        path.add_module('TrackFinderMCTruthRecoTracks',
                        RecoTracksStoreArrayName="MCRecoTracks",
        path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=False, UseCDCHits=True,
                        mcRecoTracksStoreArrayName="MCRecoTracks",
                        prRecoTracksStoreArrayName="CDCRecoTracks")
        path.add_module("DAFRecoFitter", recoTracksStoreArrayName="CDCRecoTracks")
        path.add_module("CDCToSVDSpacePointCKF",
                        inputRecoTrackStoreArrayName="CDCRecoTracks",
                        outputRecoTrackStoreArrayName="VXDRecoTracks",
                        outputRelationRecoTrackStoreArrayName="CDCRecoTracks",
                        relationCheckForDirection="backward",
                        writeOutDirection="backward",
                        firstHighFilter="truth",
                        firstEqualFilter="recording",
                        firstEqualFilterParameters={"treeName": "records1", "rootFileName": records1_fname, "returnWeight": 1.0},
                        firstLowFilter="none",
                        firstHighUseNStates=0,
                        firstToggleOnLayer=layer,
                        advanceHighFilter="advance",
                        secondHighFilter="truth",
                        secondEqualFilter="recording",
                        secondEqualFilterParameters={"treeName": "records2", "rootFileName": records2_fname, "returnWeight": 1.0},
                        secondLowFilter="none",
                        secondHighUseNStates=0,
                        secondToggleOnLayer=layer,
                        updateHighFilter="fit",
                        thirdHighFilter="truth",
                        thirdEqualFilter="recording",
                        thirdEqualFilterParameters={"treeName": "records3", "rootFileName": records3_fname},
                        thirdLowFilter="none",
                        thirdHighUseNStates=0,
                        thirdToggleOnLayer=layer,
                        enableOverlapResolving=False)
    def create_path(self):
        Create basf2 path for recording the state filter training data.
            records1_fname=self.get_output_file_name("records1.root"),
            records2_fname=self.get_output_file_name("records2.root"),
            records3_fname=self.get_output_file_name("records3.root"),
    A teacher task runs the basf2 mva teacher on the training data provided by a
    data collection task.
    In this task the three state filters are trained, each with the corresponding recordings from the different layers.
    It will be executed for each FastBDT option defined in the MainTask.
    experiment_number = b2luigi.IntParameter()
    random_seed = b2luigi.Parameter()
    n_events = b2luigi.IntParameter()
    fast_bdt_option_state_filter = b2luigi.ListParameter(
        hashed=True, default=[50, 8, 3, 0.1]
    filter_number = b2luigi.IntParameter()
    training_target = b2luigi.Parameter(
    exclude_variables = b2luigi.ListParameter(
        hashed=True, default=[
            "seed_lowest_svd_layer",
            "seed_lowest_cdc_layer",
            "quality_index_triplet",
            "quality_index_circle",
            "quality_index_helix",
            "mean_rest_cluster_charge",
            "min_rest_cluster_charge",
            "std_rest_cluster_charge",
            "cluster_1_seed_charge",
            "cluster_2_seed_charge",
            "mean_rest_cluster_seed_charge",
            "min_rest_cluster_seed_charge",
            "std_rest_cluster_seed_charge",
            "mean_rest_cluster_size",
            "min_rest_cluster_size",
            "std_rest_cluster_size",
            "mean_rest_cluster_snr",
            "min_rest_cluster_snr",
            "std_rest_cluster_snr",
            "cluster_1_charge_over_size",
            "cluster_2_charge_over_size",
            "mean_rest_cluster_charge_over_size",
            "min_rest_cluster_charge_over_size",
            "std_rest_cluster_charge_over_size",
        Name of weightfile that is created by the teacher task.
        :param fast_bdt_option: FastBDT option that is used to train this MVA
        :param filter_number: Filter number (first=1, second=2, third=3) to be trained
        if fast_bdt_option is None:
        fast_bdt_string = create_fbdt_option_string(fast_bdt_option)
        if filter_number is None:
        weightfile_name = f"trk_CDCToSVDSpacePointStateFilter_{filter_number}" + fast_bdt_string
        return weightfile_name
        This task requires that the recordings for the state filters have been created.
        for layer in [3, 4, 5, 6, 7]:
                random_seed="training",
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        Use basf2_mva teacher to create MVA weightfile from collected training
        This is the main process that is dispatched by the ``run`` method that
        is inherited from ``Basf2Task``.
        records_files = self.get_input_file_names(f"records{self.filter_number}.root")
        tree_name = f"records{self.filter_number}"
        print(f"Processed records files: {records_files},\nfeature tree name: {tree_name}")
        my_basf2_mva_teacher(
            records_files=records_files,
            weightfile_identifier=weightfile_identifier,
        basf2_mva.download(weightfile_identifier, self.get_output_file_name(weightfile_identifier + ".root"))
    Task to record data for the final result filter. This requires trained state filters.
    The cuts on the state filter classifiers are set to rather low values to ensure that all signal is contained in the
    recorded file. Also, the values for ``firstHighUseNStates``, ``secondHighUseNStates`` and ``thirdHighUseNStates``
    are chosen conservatively, i.e. rather on the high side.
    experiment_number = b2luigi.IntParameter()
    random_seed = b2luigi.Parameter()
    n_events = b2luigi.IntParameter()
    fast_bdt_option_state_filter = b2luigi.ListParameter(
        hashed=True, default=[50, 8, 3, 0.1]
    result_filter_records_name = b2luigi.Parameter()
        basf2.conditions.prepend_testing_payloads("localdb/database.txt")
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        This task requires that the training SplitNMergeSimTask is finished, as well as that the state filters are trained
        using the CKFStateFilterTeacherTask.
        filter_numbers = [1, 2, 3]
        for filter_number in filter_numbers:
                CKFStateFilterTeacherTask,
                filter_number=filter_number,
        Create a path for the recording of the result filter. This file is then used to train the result filter.
        :param result_filter_records_name: Name of the recording file.
        path = basf2.create_path()
        file_list = [fname for sublist in self.get_input_file_names().values()
                     for fname in sublist if "generated_mc_N" in fname and "training" in fname and fname.endswith(".root")]
        path.add_module("RootInput", inputFileNames=file_list)
        path.add_module("Gearbox")
        path.add_module("Geometry")
        path.add_module("SetupGenfitExtrapolation")
        add_hit_preparation_modules(path, components=["SVD"])
        add_track_finding(path, reco_tracks="CDCRecoTracks", components=["CDC"], prune_temporary_tracks=False)
        path.add_module('TrackFinderMCTruthRecoTracks',
                        RecoTracksStoreArrayName="MCRecoTracks",
        path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=False, UseCDCHits=True,
                        mcRecoTracksStoreArrayName="MCRecoTracks",
                        prRecoTracksStoreArrayName="CDCRecoTracks")
        path.add_module("DAFRecoFitter", recoTracksStoreArrayName="CDCRecoTracks")
            f"trk_CDCToSVDSpacePointStateFilter_1_Parameter{fast_bdt_string}",
            f"trk_CDCToSVDSpacePointStateFilter_1{fast_bdt_string}",
            f"trk_CDCToSVDSpacePointStateFilter_2_Parameter{fast_bdt_string}",
            f"trk_CDCToSVDSpacePointStateFilter_2{fast_bdt_string}",
            f"trk_CDCToSVDSpacePointStateFilter_3_Parameter{fast_bdt_string}",
            f"trk_CDCToSVDSpacePointStateFilter_3{fast_bdt_string}",
        basf2.conditions.prepend_testing_payloads("localdb/database.txt")
        first_high_filter_parameters = {"DBPayloadName": f"trk_CDCToSVDSpacePointStateFilter_1_Parameter{fast_bdt_string}",
                                        "direction": "backward"}
        second_high_filter_parameters = {"DBPayloadName": f"trk_CDCToSVDSpacePointStateFilter_2_Parameter{fast_bdt_string}"}
        third_high_filter_parameters = {"DBPayloadName": f"trk_CDCToSVDSpacePointStateFilter_3_Parameter{fast_bdt_string}"}
        path.add_module("CDCToSVDSpacePointCKF",
                        inputRecoTrackStoreArrayName="CDCRecoTracks",
                        outputRecoTrackStoreArrayName="VXDRecoTracks",
                        outputRelationRecoTrackStoreArrayName="CDCRecoTracks",
                        relationCheckForDirection="backward",
                        writeOutDirection="backward",
                        firstHighFilter="mva_with_direction_check",
                        firstHighFilterParameters=first_high_filter_parameters,
                        firstHighUseNStates=10,
                        advanceHighFilter="advance",
                        advanceHighFilterParameters={"direction": "backward"},
                        secondHighFilter="mva",
                        secondHighFilterParameters=second_high_filter_parameters,
                        secondHighUseNStates=10,
                        updateHighFilter="fit",
                        thirdHighFilter="mva",
                        thirdHighFilterParameters=third_high_filter_parameters,
                        thirdHighUseNStates=10,
                        filterParameters={"rootFileName": result_filter_records_name},
                        enableOverlapResolving=True)
    def create_path(self):
        Create basf2 path for recording the result filter training data.
    A teacher task runs the basf2 mva teacher on the training data provided by a
    data collection task.
    Since teacher tasks are needed for all quality estimators covered by this
    steering file and the only thing that changes is the required data
    collection task and some training parameters, I decided to use inheritance
    and have the basic functionality in this base class/interface and have the
    specific teacher tasks inherit from it.
    experiment_number = b2luigi.IntParameter()
    random_seed = b2luigi.Parameter()
    n_events = b2luigi.IntParameter()
    fast_bdt_option_state_filter = b2luigi.ListParameter(
        hashed=True, default=[50, 8, 3, 0.1]
    fast_bdt_option_result_filter = b2luigi.ListParameter(
        hashed=True, default=[200, 8, 3, 0.1]
    result_filter_records_name = b2luigi.Parameter()
    training_target = b2luigi.Parameter(
    exclude_variables = b2luigi.ListParameter(
        hashed=True, default=[]
        Name of weightfile that is created by the teacher task.
        :param fast_bdt_option: FastBDT option that is used to train this MVA
        if fast_bdt_option is None:
        fast_bdt_string = create_fbdt_option_string(fast_bdt_option)
        weightfile_name = "trk_CDCToSVDSpacePointResultFilter" + fast_bdt_string
        return weightfile_name
        Generate list of luigi Tasks that this Task depends on.
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        Use basf2_mva teacher to create MVA weightfile from collected training
        This is the main process that is dispatched by the ``run`` method that
        is inherited from ``Basf2Task``.
        tree_name = "records"
        print(f"Processed records files for result filter training: {records_files},\nfeature tree name: {tree_name}")
        my_basf2_mva_teacher(
            records_files=records_files,
            weightfile_identifier=weightfile_identifier,
        basf2_mva.download(weightfile_identifier, self.get_output_file_name(weightfile_identifier + ".root"))
    Validate the performance of the trained filters by trying various combinations of FastBDT options, as well as cut values
    for the states, the number of best candidates kept after each filter, and similar for the result filter.
    experiment_number = b2luigi.IntParameter()
    n_events_training = b2luigi.IntParameter()
    fast_bdt_option_state_filter = b2luigi.ListParameter(
        hashed=True, default=[50, 8, 3, 0.1]
    fast_bdt_option_result_filter = b2luigi.ListParameter(
        hashed=True, default=[200, 8, 3, 0.1]
    n_events_testing = b2luigi.IntParameter()
    state_filter_cut = b2luigi.FloatParameter()
    use_n_best_states = b2luigi.IntParameter()
    result_filter_cut = b2luigi.FloatParameter()
    use_n_best_results = b2luigi.IntParameter()
        basf2.conditions.prepend_testing_payloads("localdb/database.txt")
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        yield self.add_to_output(
            f"cdc_to_svd_spacepoint_ckf_validation{fbdt_state_filter_string}_{fbdt_result_filter_string}.root")
        This task requires trained result filters, trained state filters, and that an independent data set for validation was
        created using the SplitNMergeSimTask with the random seed "optimisation".
            result_filter_records_name=f"filter_records{fbdt_state_filter_string}.root",
            random_seed='training'
            random_seed="optimisation",
        filter_numbers = [1, 2, 3]
        for filter_number in filter_numbers:
                CKFStateFilterTeacherTask,
                random_seed="training",
                filter_number=filter_number,
        Create a path to validate the trained filters.
        path = basf2.create_path()
        file_list = [fname for sublist in self.get_input_file_names().values()
                     for fname in sublist if "generated_mc_N" in fname and "optimisation" in fname and fname.endswith(".root")]
        path.add_module("RootInput", inputFileNames=file_list)
        path.add_module("Gearbox")
        path.add_module("Geometry")
        path.add_module("SetupGenfitExtrapolation")
        add_hit_preparation_modules(path, components=["SVD"])
        add_track_finding(path, reco_tracks="CDCRecoTracks", components=["CDC"], prune_temporary_tracks=False)
            f"trk_CDCToSVDSpacePointStateFilter_1_Parameter{fbdt_state_filter_string}",
            f"trk_CDCToSVDSpacePointStateFilter_1{fbdt_state_filter_string}",
            f"trk_CDCToSVDSpacePointStateFilter_2_Parameter{fbdt_state_filter_string}",
            f"trk_CDCToSVDSpacePointStateFilter_2{fbdt_state_filter_string}",
            f"trk_CDCToSVDSpacePointStateFilter_3_Parameter{fbdt_state_filter_string}",
            f"trk_CDCToSVDSpacePointStateFilter_3{fbdt_state_filter_string}",
            f"trk_CDCToSVDSpacePointResultFilter_Parameter{fbdt_result_filter_string}",
            f"trk_CDCToSVDSpacePointResultFilter{fbdt_result_filter_string}",
        basf2.conditions.prepend_testing_payloads("localdb/database.txt")
        first_high_filter_parameters = {"DBPayloadName": f"trk_CDCToSVDSpacePointStateFilter_1_Parameter{fbdt_state_filter_string}",
                                        "direction": "backward"}
        second_high_filter_parameters = {
            "DBPayloadName": f"trk_CDCToSVDSpacePointStateFilter_2_Parameter{fbdt_state_filter_string}"}
        third_high_filter_parameters = {"DBPayloadName": f"trk_CDCToSVDSpacePointStateFilter_3_Parameter{fbdt_state_filter_string}"}
        filter_parameters = {"DBPayloadName": f"trk_CDCToSVDSpacePointResultFilter_Parameter{fbdt_result_filter_string}"}
        path.add_module("CDCToSVDSpacePointCKF",
                        inputRecoTrackStoreArrayName="CDCRecoTracks",
                        outputRecoTrackStoreArrayName="VXDRecoTracks",
                        outputRelationRecoTrackStoreArrayName="CDCRecoTracks",
                        relationCheckForDirection="backward",
                        writeOutDirection="backward",
                        firstHighFilter="mva_with_direction_check",
                        firstHighFilterParameters=first_high_filter_parameters,
                        advanceHighFilter="advance",
                        advanceHighFilterParameters={"direction": "backward"},
                        secondHighFilter="mva",
                        secondHighFilterParameters=second_high_filter_parameters,
                        updateHighFilter="fit",
                        thirdHighFilter="mva",
                        thirdHighFilterParameters=third_high_filter_parameters,
                        filterParameters=filter_parameters,
                        enableOverlapResolving=True)
        path.add_module('RelatedTracksCombiner',
                        VXDRecoTracksStoreArrayName="VXDRecoTracks",
                        CDCRecoTracksStoreArrayName="CDCRecoTracks",
                        recoTracksStoreArrayName="RecoTracks")
        path.add_module('TrackFinderMCTruthRecoTracks',
                        RecoTracksStoreArrayName="MCRecoTracks",
        path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=True, UseCDCHits=True,
                        mcRecoTracksStoreArrayName="MCRecoTracks",
                        prRecoTracksStoreArrayName="RecoTracks")
            output_file_name=self.get_output_file_name(
                f"cdc_to_svd_spacepoint_ckf_validation{fbdt_state_filter_string}_{fbdt_result_filter_string}.root"),
            reco_tracks_name="RecoTracks",
            mc_reco_tracks_name="MCRecoTracks",
    def create_path(self):
        Create basf2 path for the validation and optimisation of the trained filters.
class MainTask(b2luigi.WrapperTask):
    Wrapper task that needs to finish for b2luigi to finish running this steering file.
    It is done if the outputs of all required subtasks exist. It is thus at the
    top of the luigi task graph. Edit the ``requires`` method to steer which
    tasks and with which parameters you want to run.
    n_events_training = b2luigi.get_setting(
        "n_events_training", default=1000
    n_events_testing = b2luigi.get_setting(
        "n_events_testing", default=500
    n_events_per_task = b2luigi.get_setting(
        "n_events_per_task", default=100
    num_processes = b2luigi.get_setting(
        "basf2_processes_per_worker", default=0
    bkgfiles_by_exp = b2luigi.get_setting("bkgfiles_by_exp")
    bkgfiles_by_exp = {int(key): val for (key, val) in bkgfiles_by_exp.items()}
        Generate list of tasks that need to be done for luigi to finish running
        fast_bdt_options = [
        experiment_numbers = b2luigi.get_setting("experiment_numbers")
        for experiment_number, fast_bdt_option_state_filter, fast_bdt_option_result_filter in itertools.product(
                experiment_numbers, fast_bdt_options, fast_bdt_options
            state_filter_cuts = [0.01, 0.02, 0.03, 0.05, 0.1, 0.2]
            n_best_states_list = [3, 5, 10]
            result_filter_cuts = [0.05, 0.1, 0.2]
            n_best_results_list = [3, 5, 10]
            for state_filter_cut, n_best_states, result_filter_cut, n_best_results in \
                    itertools.product(state_filter_cuts, n_best_states_list, result_filter_cuts, n_best_results_list):
                    ValidationAndOptimisationTask,
                    experiment_number=experiment_number,
                    state_filter_cut=state_filter_cut,
                    use_n_best_states=n_best_states,
                    result_filter_cut=result_filter_cut,
                    use_n_best_results=n_best_results,
                    fast_bdt_option_state_filter=fast_bdt_option_state_filter,
                    fast_bdt_option_result_filter=fast_bdt_option_result_filter,
if __name__ == "__main__":
    b2luigi.set_setting("env_script", "./setup_basf2.sh")
    b2luigi.get_setting("batch_system", "lsf")
    workers = b2luigi.get_setting("workers", default=1)
    b2luigi.process(MainTask(), workers=workers, batch=True)
def get_background_files(folder=None, output_file_info=True)
b2luigi random_seed
Random basf2 seed.
b2luigi training_target
Feature/variable to use as truth label in the quality estimator MVA classifier.
b2luigi fast_bdt_option_result_filter
Hyperparameter option of the FastBDT algorithm.
b2luigi n_events
Number of events to generate for the training data set.
b2luigi fast_bdt_option_state_filter
Hyperparameter option of the FastBDT algorithm.
b2luigi experiment_number
Experiment number of the conditions database, e.g.
def get_weightfile_identifier(self, fast_bdt_option=None)
b2luigi exclude_variables
List of collected variables to not use in the training of the QE MVA classifier.
b2luigi result_filter_records_name
Name of the input file name.
b2luigi training_target
Feature/variable to use as truth label in the quality estimator MVA classifier.
def get_weightfile_identifier(self, fast_bdt_option=None, filter_number=None)
b2luigi filter_number
Number of the filter for which the records files are to be processed.
b2luigi n_events
Number of events to generate for the training data set.
b2luigi fast_bdt_option_state_filter
Hyperparameter option of the FastBDT algorithm.
b2luigi experiment_number
Experiment number of the conditions database, e.g.
b2luigi exclude_variables
List of collected variables to not use in the training of the QE MVA classifier.
b2luigi random_seed
Random basf2 seed.
def output_file_name(self, n_events=None, random_seed=None)
Name of the ROOT output file with generated and simulated events.
b2luigi bkgfiles_dir
Directory with overlay background root files.
b2luigi n_events
Number of events to generate.
b2luigi experiment_number
Experiment number of the conditions database, e.g.
b2luigi n_events_training
Number of events to generate for the training data set.
b2luigi n_events_testing
Number of events to generate for the test data set.
def create_result_recording_path(self, result_filter_records_name)
b2luigi random_seed
Random basf2 seed.
b2luigi n_events
Number of events to generate for the training data set.
b2luigi fast_bdt_option_state_filter
Hyperparameter option of the FastBDT algorithm.
b2luigi experiment_number
Experiment number of the conditions database, e.g.
b2luigi result_filter_records_name
Name of the records file for training the final result filter.
b2luigi random_seed
Random basf2 seed.
def output_file_name(self, n_events=None, random_seed=None)
Name of the ROOT output file with generated and simulated events.
b2luigi bkgfiles_dir
Directory with overlay background root files.
b2luigi n_events
Number of events to generate.
b2luigi experiment_number
Experiment number of the conditions database, e.g.
b2luigi random_seed
Random basf2 seed.
def create_state_recording_path(self, layer, records1_fname, records2_fname, records3_fname)
b2luigi n_events
Number of events to generate for training.
b2luigi experiment_number
Experiment number of the conditions database, e.g.
b2luigi layer
Layer on which to toggle for recording the information for training.
b2luigi n_events_training
Number of events to generate for the training data set.
b2luigi n_events_testing
Number of events to generate for the testing, validation, and optimisation data set.
b2luigi use_n_best_states
How many states should be kept at maximum in the combinatorial part of the CKF tree search.
b2luigi use_n_best_results
How many results should be kept at maximum to search for overlaps.
b2luigi state_filter_cut
Value of the cut on the MVA classifier output for accepting a state during CKF tracking.
b2luigi fast_bdt_option_result_filter
FastBDT option to use to train the Result Filter.
def create_optimisation_and_validation_path(self)
b2luigi result_filter_cut
Value of the cut on the MVA classifier output for a result candidate.
b2luigi fast_bdt_option_state_filter
FastBDT option to use to train the StateFilters.
b2luigi experiment_number
Experiment number of the conditions database, e.g.
def add_simulation(path, components=None, bkgfiles=None, bkgOverlay=True, forceSetPXDDataReduction=False, usePXDDataReduction=True, cleanupPXDDataReduction=True, generate_2nd_cdc_hits=False, simulateT0jitter=True, isCosmics=False, FilterEvents=False, usePXDGatedMode=False, skipExperimentCheckForBG=False, save_slow_pions_in_mc=False)