Belle II Software development
cdc_and_svd_ckf_merger_mva_training.py
"""
cdc_and_svd_ckf_merger_mva_training
-----------------------------------------

Purpose of this script
~~~~~~~~~~~~~~~~~~~~~~

This python script is used for the training and validation of the classifier of
the MVA-based result filter of the CDCToSVDSeedCKF, which combines tracks that
were found by the CDC and SVD standalone tracking algorithms.

To avoid mistakes, b2luigi is used to create a task chain for a combined training and
validation of all classifiers.

The order of the b2luigi tasks in this script is as follows (top to bottom):

* Two tasks to create input samples for training and testing (``GenerateSimTask`` and
  ``SplitNMergeSimTask``). The ``SplitNMergeSimTask`` takes the total number of events
  to be generated and a number of events per task, which keeps the runtime of each
  job short. It divides the total number of events by the number of events per task
  and creates as many ``GenerateSimTask`` instances as needed, each with its own
  random seed, so that in the end the requested total number of training and testing
  events is simulated. The ``SplitNMergeSimTask`` then merges the individual files
  into one file each for training and testing.
* The ``ResultRecordingTask`` writes out the data used for training of the MVA.
* The ``CKFResultFilterTeacherTask`` trains the MVA (FastBDT by default) with a
  given set of FastBDT options.
* The ``ValidationAndOptimisationTask`` uses the trained weight files and the provided
  cut values to run the tracking chain with the weight file under test, and also
  runs the tracking validation.
* Finally, the ``MainTask`` is the "brain" of the script. It invokes the
  ``ValidationAndOptimisationTask`` with the different combinations of FastBDT options
  and cut values on the MVA classifier output.
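
The event splitting done by ``SplitNMergeSimTask`` can be sketched in plain Python as follows. This is a minimal stand-alone version of the arithmetic in its ``requires`` method: first come the full chunks, then one chunk with the remainder, and each seed gets a zero-padded index as a suffix. The function name ``split_into_jobs`` is hypothetical and not part of the script.

```python
def split_into_jobs(n_events, n_events_per_task, random_seed):
    # Same arithmetic as SplitNMergeSimTask.requires: as many full chunks as fit,
    # then one extra chunk for the remainder (if any).
    quotient, remainder = divmod(n_events, n_events_per_task)
    jobs = [(random_seed + "_" + str(i).zfill(3), n_events_per_task) for i in range(quotient)]
    if remainder > 0:
        jobs.append((random_seed + "_" + str(quotient).zfill(3), remainder))
    return jobs


if __name__ == "__main__":
    for seed, n in split_into_jobs(1000, 300, "training"):
        # File name pattern as produced by GenerateSimTask.output_file_name
        print(f"generated_mc_N{n}_{seed}.root")
```

With 1000 training events and 300 events per task, this yields four jobs with the seeds ``training_000`` to ``training_003``. The last job simulates the remaining 100 events.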

Because of these dependencies, the tasks are invoked in reverse order: the ``MainTask``
calls the ``ValidationAndOptimisationTask`` with different FastBDT options and cut
values, and the ``ValidationAndOptimisationTask`` itself calls the required teacher,
training, and simulation tasks.
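
The toy model below shows why the calls are reversed. It resolves the requirements depth-first, so every task is scheduled after the tasks it needs. The ``ToyTask`` class and the ``execution_order`` function are hypothetical simplifications and not the luigi scheduler. Among other things, luigi also skips tasks whose outputs already exist and can run independent tasks in parallel.

```python
class ToyTask:
    # Hypothetical stand-in for a b2luigi task: only a name and its requirements.
    def __init__(self, name, requires=()):
        self.name = name
        self._requires = list(requires)

    def requires(self):
        return self._requires


def execution_order(task, done=None):
    # Depth-first walk: every requirement is scheduled before the task that needs it.
    if done is None:
        done = []
    for dependency in task.requires():
        execution_order(dependency, done)
    if task.name not in done:
        done.append(task.name)
    return done


sim_training = ToyTask("SplitNMergeSimTask(training)")
sim_optimisation = ToyTask("SplitNMergeSimTask(optimisation)")
recording = ToyTask("ResultRecordingTask", [sim_training])
teacher = ToyTask("CKFResultFilterTeacherTask", [recording])
validation = ToyTask("ValidationAndOptimisationTask", [teacher, sim_optimisation])
main = ToyTask("MainTask", [validation])

if __name__ == "__main__":
    for name in execution_order(main):
        print(name)
```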

b2luigi: Understanding the steering file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All trainings and validations are done in the correct order in this steering
file. For the purpose of creating a dependency graph, the `b2luigi
<https://b2luigi.readthedocs.io>`_ python package is used, which extends the
`luigi <https://luigi.readthedocs.io>`_ package developed by Spotify.

Each task that has to be done is represented by a special class, which defines
its parameters, its output files, and which other tasks (with which
parameters) it depends on. For example, a teacher task, which runs
``basf2_mva_teacher.py`` to train the classifier, depends on a data collection
task which runs a reconstruction and writes out track-wise variables into a ROOT
file for training. An evaluation/validation task for testing the classifier
requires both the teacher task, as it needs the weightfile to be present, and
also a data collection task, because it needs a dataset for testing the classifier.
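
You can mimic two rules with a few lines of standard-library Python: a task counts as finished if and only if all of its outputs exist, and a task can only start once all of its requirements are finished. ``FileTask`` and the file names below are hypothetical placeholders and not the b2luigi API.

```python
import os
import tempfile


class FileTask:
    # Hypothetical stand-in for a b2luigi task: it is complete if and only if
    # every one of its output files exists.
    def __init__(self, output_paths, requirements=()):
        self.output_paths = list(output_paths)
        self.requirements = list(requirements)

    def complete(self):
        return all(os.path.exists(path) for path in self.output_paths)

    def ready_to_run(self):
        # A task may only start once all of its requirements are complete.
        return all(req.complete() for req in self.requirements)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        teacher = FileTask([os.path.join(tmp, "weightfile.root")])
        test_data = FileTask([os.path.join(tmp, "test_data.root")])
        validation = FileTask([os.path.join(tmp, "validation.root")], [teacher, test_data])

        open(teacher.output_paths[0], "w").close()
        print(validation.ready_to_run())  # False: the test data does not exist yet
        open(test_data.output_paths[0], "w").close()
        print(validation.ready_to_run())  # True
```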

The final task that defines which tasks need to be done for the steering file to
finish is the ``MainTask``. When you only want to run parts of the
training/validation pipeline, you can comment out requirements in the ``MainTask``
or replace them by lower-level tasks during debugging.
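
For orientation, this sketch reproduces the parameter grid that ``MainTask.requires`` iterates over with ``itertools.product``. Each combination becomes one ``ValidationAndOptimisationTask``. The experiment number list is a placeholder for the ``experiment_numbers`` setting.

```python
import itertools

# Placeholder for b2luigi.get_setting("experiment_numbers")
experiment_numbers = [0]
# Same FastBDT options and cut values as in MainTask.requires
fast_bdt_options = [
    [50, 8, 3, 0.1],
    [100, 8, 3, 0.1],
    [200, 8, 3, 0.1],
]
cut_values = [(i + 1) * 0.2 for i in range(4)]

combinations = list(itertools.product(experiment_numbers, fast_bdt_options, cut_values))

if __name__ == "__main__":
    print(len(combinations), "ValidationAndOptimisationTask instances")
    print(cut_values)
```

A single experiment gives 12 validation tasks (3 FastBDT options times 4 cut values). For ``i = 2``, ``(i + 1) * 0.2`` evaluates to ``0.6000000000000001``. The cut value is formatted into the validation output file name, so rounding it, e.g. with ``round((i + 1) * 0.2, 1)``, would give cleaner names if that is desired.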

Requirements
~~~~~~~~~~~~

This steering file relies on b2luigi_ for task scheduling. It can be installed
via pip::

    python3 -m pip install [--user] b2luigi

Use the ``--user`` option if you do not have the rights to install Python packages
into your externals (e.g. because you are using cvmfs). The packages are then
installed in ``$HOME/.local`` instead.
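
A short standard-library check can help before you start the steering file. It tells you whether b2luigi is importable and where ``pip install --user`` puts packages. This is a sketch, and ``module_available`` is a hypothetical helper name.

```python
import importlib.util
import site


def module_available(name):
    # find_spec returns None if the top-level module cannot be found;
    # the module itself is not imported.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("b2luigi available:", module_available("b2luigi"))
    # pip install --user places packages below this directory (often ~/.local on Linux)
    print("user base:", site.getuserbase())
```

With ``--user`` installations, console scripts such as ``luigid`` typically end up in the ``bin`` subdirectory of this user base.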

Configuration
~~~~~~~~~~~~~

Instead of command line arguments, the b2luigi script is configured via a
``settings.json`` file. Open it in your favorite text editor and modify it to
fit your requirements.
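
A ``settings.json`` for this script could be generated as sketched below. The keys are those read via ``b2luigi.get_setting`` in this file. The event counts, ``basf2_processes_per_worker``, ``workers`` and ``batch_system`` use the script's defaults. The experiment number and the background directory are placeholders. The script gives ``experiment_numbers`` and ``bkgfiles_by_exp`` no default, so you need to provide them.

```python
import json

settings = {
    # defaults used in this steering file
    "n_events_training": 1000,
    "n_events_testing": 500,
    "n_events_per_task": 100,
    "basf2_processes_per_worker": 0,
    "workers": 1,
    "batch_system": "lsf",
    # no default in the script; placeholder values
    "experiment_numbers": [0],
    # JSON object keys are strings; MainTask converts them back with int(key)
    "bkgfiles_by_exp": {"0": "/path/to/background/files"},
}

if __name__ == "__main__":
    print(json.dumps(settings, indent=4))
```

Save the printed JSON as ``settings.json`` and replace the placeholders with your own values.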

Usage
~~~~~

You can test the b2luigi steering file without running it via::

    python3 cdc_and_svd_ckf_merger_mva_training.py --dry-run
    python3 cdc_and_svd_ckf_merger_mva_training.py --show-output

This will show the outputs and reveal potential errors in the definitions of the
luigi task dependencies. To run the steering file (note that the ``__main__``
block calls ``b2luigi.process`` with ``batch=True``), run::

    python3 cdc_and_svd_ckf_merger_mva_training.py

One can use the interactive luigi web interface via the central scheduler,
which visualizes the task graph while it is running. For this, the scheduler
daemon ``luigid`` has to run in the background. It is located in
``~/.local/bin/luigid`` in case b2luigi has been installed with ``--user``. For
example, run::

    luigid --port 8886

Then, execute your steering file (e.g. in another terminal) with::

    python3 cdc_and_svd_ckf_merger_mva_training.py --scheduler-port 8886

To view the web interface, open your web browser and enter into the URL bar::

    localhost:8886
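
Before passing ``--scheduler-port``, you can check whether anything is listening on that port with a small standard-library helper. This sketch only tests that a TCP connection can be opened. It does not verify that the listening process is actually ``luigid``. ``scheduler_reachable`` is a hypothetical name.

```python
import socket


def scheduler_reachable(host="localhost", port=8886, timeout=1.0):
    # Returns True if something accepts TCP connections on host:port.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("scheduler reachable:", scheduler_reachable())
```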

If you don't run the steering file on the same machine on which you run your web
browser, you have two options:

1. Run both the steering file and ``luigid`` remotely and use
   ssh-port-forwarding to your local host. For this, run on your local
   machine::

       ssh -N -f -L 8886:localhost:8886 <remote_user>@<remote_host>

2. Run the ``luigid`` scheduler locally and use the ``--scheduler-host <your
   local host>`` argument when calling the steering file.

Accessing the results / output files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All output files are stored in a directory structure in the ``result_path``. The
directory tree encodes the used b2luigi parameters. This ensures reproducibility
and makes parameter searches easy. Sometimes, it is hard to find the relevant
output files. You can view the whole directory structure by running ``tree
<result_path>``. Use the Unix ``find`` command to find the files that interest
you, e.g.::

    find <result_path> -name "*.root" # find all ROOT files
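
You can run the same search from Python with ``pathlib``. The sketch also splits the directory names into parameters. It assumes that each level below the result path is named ``<parameter>=<value>``, which is how b2luigi typically encodes the parameters in the directory tree. Check your own result tree before relying on that. Both helper functions are hypothetical.

```python
from pathlib import Path
import sys


def find_root_files(result_path):
    # Python counterpart of: find <result_path> -name "*.root"
    return sorted(Path(result_path).rglob("*.root"))


def parameters_from_path(file_path, result_path):
    # Assumption: every directory level below result_path is named
    # "<parameter>=<value>"; levels without "=" are skipped.
    relative_dirs = Path(file_path).relative_to(result_path).parent.parts
    return dict(part.split("=", 1) for part in relative_dirs if "=" in part)


if __name__ == "__main__":
    result_path = sys.argv[1] if len(sys.argv) > 1 else "."
    for root_file in find_root_files(result_path):
        print(root_file, parameters_from_path(root_file, result_path))
```
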
"""

import itertools
import subprocess
import basf2_mva
import basf2
# from tracking import add_track_finding
from tracking.path_utils import add_hit_preparation_modules, add_cdc_track_finding, add_svd_standalone_tracking
from tracking.harvesting_validation.combined_module import CombinedTrackingValidationModule
import background
import simulation

from ckf_training import my_basf2_mva_teacher, create_fbdt_option_string
from tracking_mva_filter_payloads.write_tracking_mva_filter_payloads_to_db import write_tracking_mva_filter_payloads_to_db

# wrap python modules that are used here but not in the externals into a try except block
install_helpstring_formatter = ("\nCould not find {module} python module. Try installing it via\n"
                                "  python3 -m pip install [--user] {module}\n")
try:
    import b2luigi
    from b2luigi.core.utils import create_output_dirs
    from b2luigi.basf2_helper import Basf2PathTask, Basf2Task
except ModuleNotFoundError:
    print(install_helpstring_formatter.format(module="b2luigi"))
    raise


class GenerateSimTask(Basf2PathTask):
    """
    Generate simulated Monte Carlo with background overlay.

    Make sure to use different ``random_seed`` parameters for the training data
    for the classifier trainings and for the test data for the respective
    evaluation/validation tasks.
    """

    #: Experiment number of the conditions database.
    experiment_number = b2luigi.IntParameter()
    n_events = b2luigi.IntParameter()
    random_seed = b2luigi.Parameter()
    #: Directory with overlay background root files.
    bkgfiles_dir = b2luigi.Parameter(
        hashed=True
    )

    queue = 'l'

    def output_file_name(self, n_events=None, random_seed=None):
        """
        Create output file name depending on number of events and production
        mode that is specified in the random_seed string.

        :param n_events: Number of events to simulate.
        :param random_seed: Random seed to use for the simulation to create independent samples.
        """
        if n_events is None:
            n_events = self.n_events
        if random_seed is None:
            random_seed = self.random_seed
        return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"

    def output(self):
        """
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        """
        yield self.add_to_output(self.output_file_name())

    def create_path(self):
        """
        Create basf2 path to process with event generation and simulation.
        """
        basf2.set_random_seed(self.random_seed)
        path = basf2.create_path()
        path.add_module(
            "EventInfoSetter", evtNumList=[self.n_events], runList=[0], expList=[self.experiment_number]
        )
        path.add_module("EvtGenInput")
        bkg_files = ""
        # \cond suppress doxygen warning
        if self.experiment_number == 0:
            bkg_files = background.get_background_files()
        else:
            bkg_files = background.get_background_files(self.bkgfiles_dir)
        # \endcond

        simulation.add_simulation(path, bkgfiles=bkg_files, bkgOverlay=True, usePXDDataReduction=False)

        path.add_module(
            "RootOutput",
            outputFileName=self.get_output_file_name(self.output_file_name()),
        )
        return path


# I don't use the default MergeTask or similar because they only work if every input file is called the same.
# Additionally, I want to add more features like deleting the original input to save storage space.
class SplitNMergeSimTask(Basf2Task):
    """
    Generate simulated Monte Carlo with background overlay by splitting the requested
    number of events into several ``GenerateSimTask`` jobs and merging their outputs
    into a single file.

    Make sure to use different ``random_seed`` parameters for the training data
    for the classifier trainings and for the test data for the respective
    evaluation/validation tasks.
    """

    #: Experiment number of the conditions database.
    experiment_number = b2luigi.IntParameter()
    n_events = b2luigi.IntParameter()
    random_seed = b2luigi.Parameter()
    #: Directory with overlay background root files.
    bkgfiles_dir = b2luigi.Parameter(
        hashed=True
    )

    queue = 'sx'

    def output_file_name(self, n_events=None, random_seed=None):
        """
        Create output file name depending on number of events and production
        mode that is specified in the random_seed string.

        :param n_events: Number of events to simulate.
        :param random_seed: Random seed to use for the simulation to create independent samples.
        """
        if n_events is None:
            n_events = self.n_events
        if random_seed is None:
            random_seed = self.random_seed
        return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"

    def output(self):
        """
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        """
        yield self.add_to_output(self.output_file_name())

    def requires(self):
        """
        This task requires several GenerateSimTask to be finished so that the required number of events is created.
        """
        n_events_per_task = MainTask.n_events_per_task
        quotient, remainder = divmod(self.n_events, n_events_per_task)
        for i in range(quotient):
            yield GenerateSimTask(
                bkgfiles_dir=self.bkgfiles_dir,
                num_processes=MainTask.num_processes,
                random_seed=self.random_seed + '_' + str(i).zfill(3),
                n_events=n_events_per_task,
                experiment_number=self.experiment_number,
            )
        if remainder > 0:
            yield GenerateSimTask(
                bkgfiles_dir=self.bkgfiles_dir,
                num_processes=MainTask.num_processes,
                random_seed=self.random_seed + '_' + str(quotient).zfill(3),
                n_events=remainder,
                experiment_number=self.experiment_number,
            )

    @b2luigi.on_temporary_files
    def process(self):
        """
        When all GenerateSimTasks have finished, merge the output.
        """
        create_output_dirs(self)

        file_list = [item for sublist in self.get_input_file_names().values() for item in sublist]
        print("Merge the following files:")
        print(file_list)
        cmd = ["b2file-merge", "-f"]
        args = cmd + [self.get_output_file_name(self.output_file_name())] + file_list
        subprocess.check_call(args)
        print("Finished merging. Now remove the input files to save space.")
        cmd2 = ["rm", "-f"]
        for tempfile in file_list:
            args = cmd2 + [tempfile]
            subprocess.check_call(args)


class ResultRecordingTask(Basf2PathTask):
    """
    Task to record data for the final result filter. This only requires found and MC-matched SVD and CDC tracks that need to be
    merged; all state filters are set to "all".
    """

    #: Experiment number of the conditions database.
    experiment_number = b2luigi.IntParameter()
    #: Number of events to generate for the training data set.
    n_events_training = b2luigi.IntParameter()
    random_seed = b2luigi.Parameter()

    #: Name of the records file for training the final result filter.
    result_filter_records_name = b2luigi.Parameter()

    def output(self):
        """
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        """
        yield self.add_to_output(self.result_filter_records_name)

    def requires(self):
        """
        This task requires that the training ``SplitNMergeSimTask`` is finished.
        """
        yield SplitNMergeSimTask(
            bkgfiles_dir=MainTask.bkgfiles_by_exp[self.experiment_number],
            random_seed=self.random_seed,
            n_events=self.n_events_training,
            experiment_number=self.experiment_number,
        )

    def create_result_recording_path(self, result_filter_records_name):
        """
        Create a path for the recording of the result filter. This file is then used to train the result filter.

        :param result_filter_records_name: Name of the recording file.
        """

        path = basf2.create_path()

        # get all the file names from the list of input files that are meant for training
        file_list = [fname for sublist in self.get_input_file_names().values()
                     for fname in sublist if "generated_mc_N" in fname and "training" in fname and fname.endswith(".root")]
        path.add_module("RootInput", inputFileNames=file_list)

        path.add_module("Gearbox")
        path.add_module("Geometry")
        path.add_module("SetupGenfitExtrapolation")

        add_hit_preparation_modules(path, components=["SVD"])

        # MCTrackFinding
        mc_reco_tracks = "MCRecoTracks"
        path.add_module('TrackFinderMCTruthRecoTracks',
                        RecoTracksStoreArrayName=mc_reco_tracks)

        # CDC track finding and MC matching
        cdc_reco_tracks = "CDCRecoTracks"
        add_cdc_track_finding(path, output_reco_tracks=cdc_reco_tracks)
        path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=False, UseCDCHits=True,
                        mcRecoTracksStoreArrayName=mc_reco_tracks,
                        prRecoTracksStoreArrayName=cdc_reco_tracks)

        path.add_module("DAFRecoFitter", recoTracksStoreArrayName=cdc_reco_tracks)

        # SVD track finding and MC matching
        svd_reco_tracks = "SVDRecoTracks"
        add_svd_standalone_tracking(path, reco_tracks=svd_reco_tracks)
        path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=True, UseCDCHits=False,
                        mcRecoTracksStoreArrayName=mc_reco_tracks,
                        prRecoTracksStoreArrayName=svd_reco_tracks)

        direction = "backward"
        path.add_module("CDCToSVDSeedCKF",
                        inputRecoTrackStoreArrayName=cdc_reco_tracks,

                        fromRelationStoreArrayName=cdc_reco_tracks,
                        toRelationStoreArrayName=svd_reco_tracks,

                        relatedRecoTrackStoreArrayName=svd_reco_tracks,
                        cdcTracksStoreArrayName=cdc_reco_tracks,
                        vxdTracksStoreArrayName=svd_reco_tracks,

                        relationCheckForDirection=direction,
                        reverseSeed=False,
                        firstHighFilterParameters={"direction": direction},
                        advanceHighFilterParameters={"direction": direction},

                        writeOutDirection=direction,
                        endEarly=False,

                        filter="recording_with_relations",
                        filterParameters={"rootFileName": result_filter_records_name})

        return path

    def create_path(self):
        """
        Create basf2 path to record the training data for the result filter.
        """
        return self.create_result_recording_path(
            result_filter_records_name=self.get_output_file_name(self.result_filter_records_name),
        )


class CKFResultFilterTeacherTask(Basf2Task):
    """
    A teacher task runs the basf2 mva teacher on the training data provided by a
    data collection task.

    Since teacher tasks are needed for all quality estimators covered by this
    steering file and the only thing that changes is the required data
    collection task and some training parameters, I decided to use inheritance
    and have the basic functionality in this base class/interface and have the
    specific teacher tasks inherit from it.
    """

    #: Experiment number of the conditions database.
    experiment_number = b2luigi.IntParameter()
    #: Number of events to generate for the training data set.
    n_events_training = b2luigi.IntParameter()
    random_seed = b2luigi.Parameter()
    result_filter_records_name = b2luigi.Parameter()
    #: Feature/variable to use as truth label in the quality estimator MVA classifier.
    training_target = b2luigi.Parameter(
        default="truth"
    )

    #: List of collected variables to not use in the training of the QE MVA classifier.
    exclude_variables = b2luigi.ListParameter(
        hashed=True, default=[]
    )

    #: Hyperparameter option of the FastBDT algorithm.
    fast_bdt_option = b2luigi.ListParameter(
        hashed=True, default=[200, 8, 3, 0.1]
    )

    def get_weightfile_identifier(self, fast_bdt_option=None):
        """
        Name of weightfile that is created by the teacher task.

        :param fast_bdt_option: FastBDT option that is used to train this MVA
        """
        if fast_bdt_option is None:
            fast_bdt_option = self.fast_bdt_option
        fast_bdt_string = create_fbdt_option_string(fast_bdt_option)
        weightfile_name = "trk_CDCToSVDSeedResultFilter" + fast_bdt_string
        return weightfile_name

    def requires(self):
        """
        Generate list of luigi Tasks that this Task depends on.
        """
        yield ResultRecordingTask(
            experiment_number=self.experiment_number,
            n_events_training=self.n_events_training,
            result_filter_records_name=self.result_filter_records_name,
            random_seed=self.random_seed
        )

    def output(self):
        """
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        """
        yield self.add_to_output(self.get_weightfile_identifier() + ".root")

    def process(self):
        """
        Use basf2_mva teacher to create MVA weightfile from collected training
        data variables.

        This is the main process that is dispatched by the ``run`` method that
        is inherited from ``Basf2Task``.
        """
        records_files = self.get_input_file_names(self.result_filter_records_name)
        weightfile_identifier = self.get_weightfile_identifier()
        my_basf2_mva_teacher(
            records_files=records_files,
            tree_name="records",
            weightfile_identifier=weightfile_identifier,
            target_variable=self.training_target,
            exclude_variables=self.exclude_variables,
            fast_bdt_option=self.fast_bdt_option,
        )
        basf2_mva.download(weightfile_identifier, self.get_output_file_name(weightfile_identifier + ".root"))


class ValidationAndOptimisationTask(Basf2PathTask):
    """
    Validate the performance of the trained filters by trying various combinations of FastBDT options, as well as cut values for
    the states, the number of best candidates kept after each filter, and similar for the result filter.
    """

    #: Experiment number of the conditions database.
    experiment_number = b2luigi.IntParameter()
    #: Number of events to generate for the training data set.
    n_events_training = b2luigi.IntParameter()
    #: FastBDT option to use to train the StateFilters.
    fast_bdt_option = b2luigi.ListParameter(
        # ## \cond
        hashed=True, default=[200, 8, 3, 0.1]
        # ## \endcond
    )

    #: Number of events to generate for the testing, validation, and optimisation data set.
    n_events_testing = b2luigi.IntParameter()
    #: Value of the cut on the MVA classifier output for a result candidate.
    result_filter_cut = b2luigi.FloatParameter()
    # prepend the testing payloads
    basf2.conditions.prepend_testing_payloads("localdb/database.txt")

    def output(self):
        """
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        """
        fbdt_string = create_fbdt_option_string(self.fast_bdt_option)
        yield self.add_to_output(
            f"cdc_svd_merger_ckf_validation{fbdt_string}_{self.result_filter_cut}.root")

    def requires(self):
        """
        This task requires trained result filters, and that an independent data set for validation was created using the
        ``SplitNMergeSimTask`` with the random seed optimisation.
        """
        yield CKFResultFilterTeacherTask(
            result_filter_records_name="filter_records.root",
            experiment_number=self.experiment_number,
            n_events_training=self.n_events_training,
            fast_bdt_option=self.fast_bdt_option,
            random_seed='training'
        )
        yield SplitNMergeSimTask(
            bkgfiles_dir=MainTask.bkgfiles_by_exp[self.experiment_number],
            experiment_number=self.experiment_number,
            n_events=self.n_events_testing,
            random_seed="optimisation",
        )

        """
        Create a path to validate the trained filters.
        """
        path = basf2.create_path()

        # get all the file names from the list of input files that are meant for optimisation / validation
        file_list = [fname for sublist in self.get_input_file_names().values()
                     for fname in sublist if "generated_mc_N" in fname and "optimisation" in fname and fname.endswith(".root")]
        path.add_module("RootInput", inputFileNames=file_list)

        path.add_module("Gearbox")
        path.add_module("Geometry")
        path.add_module("SetupGenfitExtrapolation")

        add_hit_preparation_modules(path, components=["SVD"])

        cdc_reco_tracks = "CDCRecoTracks"
        svd_reco_tracks = "SVDRecoTracks"
        reco_tracks = "RecoTracks"
        mc_reco_tracks = "MCRecoTracks"

        # CDC track finding and fitting
        add_cdc_track_finding(path, output_reco_tracks=cdc_reco_tracks)

        path.add_module("DAFRecoFitter", recoTracksStoreArrayName=cdc_reco_tracks)

        # SVD standalone track finding
        add_svd_standalone_tracking(path, reco_tracks=svd_reco_tracks)

        direction = "backward"
        fbdt_string = create_fbdt_option_string(self.fast_bdt_option)

        # write the tracking MVA filter parameters and the cut on MVA classifier to be applied on a local db
        iov = [0, 0, 0, -1]
        write_tracking_mva_filter_payloads_to_db(
            f"trk_CDCToSVDSeedResultFilterParameter{fbdt_string}",
            iov,
            f"trk_CDCToSVDSeedResultFilter{fbdt_string}",
            self.result_filter_cut)

        basf2.conditions.prepend_testing_payloads("localdb/database.txt")
        result_filter_parameters = {"DBPayloadName": f"trk_CDCToSVDSeedResultFilterParameter{fbdt_string}"}

        path.add_module(
            "CDCToSVDSeedCKF",
            inputRecoTrackStoreArrayName=cdc_reco_tracks,
            fromRelationStoreArrayName=cdc_reco_tracks,
            toRelationStoreArrayName=svd_reco_tracks,
            relatedRecoTrackStoreArrayName=svd_reco_tracks,
            cdcTracksStoreArrayName=cdc_reco_tracks,
            vxdTracksStoreArrayName=svd_reco_tracks,
            relationCheckForDirection=direction,
            reverseSeed=False,
            firstHighFilterParameters={
                "direction": direction},
            advanceHighFilterParameters={
                "direction": direction},
            writeOutDirection=direction,
            endEarly=False,
            filter='mva_with_relations',
            filterParameters=result_filter_parameters
        )

        path.add_module('RelatedTracksCombiner',
                        VXDRecoTracksStoreArrayName=svd_reco_tracks,
                        CDCRecoTracksStoreArrayName=cdc_reco_tracks,
                        recoTracksStoreArrayName=reco_tracks)

        path.add_module('TrackFinderMCTruthRecoTracks',
                        RecoTracksStoreArrayName=mc_reco_tracks,
                        WhichParticles=[],
                        UsePXDHits=True,
                        UseSVDHits=True,
                        UseCDCHits=True)

        path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=True, UseCDCHits=True,
                        mcRecoTracksStoreArrayName=mc_reco_tracks,
                        prRecoTracksStoreArrayName=reco_tracks)

        path.add_module(
            CombinedTrackingValidationModule(
                output_file_name=self.get_output_file_name(
                    f"cdc_svd_merger_ckf_validation{fbdt_string}_{self.result_filter_cut}.root"),
                reco_tracks_name=reco_tracks,
                mc_reco_tracks_name=mc_reco_tracks,
                name="",
                contact="",
                expert_level=200))

        return path

    def create_path(self):
        """
        Create basf2 path to validate the trained result filter.
        """


class MainTask(b2luigi.WrapperTask):
    """
    Wrapper task that needs to finish for b2luigi to finish running this steering file.

    It is done if the outputs of all required subtasks exist. It is thus at the
    top of the luigi task graph. Edit the ``requires`` method to steer which
    tasks and with which parameters you want to run.
    """

    n_events_training = b2luigi.get_setting(
        "n_events_training", default=1000
    )

    #: Number of events to generate for the test data set.
    n_events_testing = b2luigi.get_setting(
        "n_events_testing", default=500
    )

    n_events_per_task = b2luigi.get_setting(
        "n_events_per_task", default=100
    )

    num_processes = b2luigi.get_setting(
        "basf2_processes_per_worker", default=0
    )

    bkgfiles_by_exp = b2luigi.get_setting("bkgfiles_by_exp")
    bkgfiles_by_exp = {int(key): val for (key, val) in bkgfiles_by_exp.items()}

    def requires(self):
        """
        Generate list of tasks that need to be done for luigi to finish running
        this steering file.
        """

        fast_bdt_options = [
            [50, 8, 3, 0.1],
            [100, 8, 3, 0.1],
            [200, 8, 3, 0.1],
        ]
        cut_values = []
        for i in range(4):
            cut_values.append((i+1) * 0.2)

        experiment_numbers = b2luigi.get_setting("experiment_numbers")

        # iterate over all possible combinations of parameters from the above defined parameter lists
        for experiment_number, fast_bdt_option, cut_value in itertools.product(
            experiment_numbers, fast_bdt_options, cut_values
        ):
            yield ValidationAndOptimisationTask(
                experiment_number=experiment_number,
                n_events_training=self.n_events_training,
                fast_bdt_option=fast_bdt_option,
                n_events_testing=self.n_events_testing,
                result_filter_cut=cut_value,
            )


if __name__ == "__main__":

    b2luigi.set_setting("env_script", "./setup_basf2.sh")
    b2luigi.get_setting("batch_system", default="lsf")
    workers = b2luigi.get_setting("workers", default=1)
    b2luigi.process(MainTask(), workers=workers, batch=True)