Belle II Software release-09-00-00
cdc_and_svd_ckf_merger_mva_training.py
"""
cdc_and_svd_ckf_merger_mva_training
-----------------------------------------

Purpose of this script
~~~~~~~~~~~~~~~~~~~~~~

This Python script trains and validates the classifier of the MVA-based result
filter of the CDCToSVDSeedCKF, which combines tracks that were found by the CDC
and SVD standalone tracking algorithms.

To avoid mistakes, b2luigi is used to create a task chain for a combined training and
validation of all classifiers.

The order of the b2luigi tasks in this script is as follows (top to bottom):

* Two tasks to create input samples for training and testing (``GenerateSimTask`` and
  ``SplitNMergeSimTask``). The ``SplitNMergeSimTask`` takes a number of events to be
  generated and a number of events per task to reduce runtime. It then divides the total
  number of events by the number of events per task and creates as many ``GenerateSimTask``
  instances as needed, each with a specific random seed, so that in the end the total
  number of training and testing events is simulated. The individual files are then
  combined by the ``SplitNMergeSimTask`` into one file each for training and testing.
* The ``ResultRecordingTask`` writes out the data used for training of the MVA.
* The ``CKFResultFilterTeacherTask`` trains the MVA, FastBDT by default, with a
  given set of FastBDT options.
* The ``ValidationAndOptimisationTask`` uses the trained weight files and cut values
  provided to run the tracking chain with the weight file under test, and also
  runs the tracking validation.
* Finally, the ``MainTask`` is the "brain" of the script. It invokes the
  ``ValidationAndOptimisationTask`` with the different combinations of FastBDT options
  and cut values on the MVA classifier output.
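
The event splitting described in the first item can be sketched in plain
Python. This is a minimal illustration that mirrors the ``divmod`` logic of
``SplitNMergeSimTask.requires``; the helper name ``split_events`` and the
example numbers are hypothetical and not part of the script.

```python
def split_events(n_events, n_events_per_task, random_seed):
    # Return one (seed, n_events) pair per simulation sub-task.
    quotient, remainder = divmod(n_events, n_events_per_task)
    # Full-size chunks, each with its own zero-padded seed suffix.
    chunks = [(random_seed + "_" + str(i).zfill(3), n_events_per_task)
              for i in range(quotient)]
    # One extra, smaller chunk if the division leaves a remainder.
    if remainder > 0:
        chunks.append((random_seed + "_" + str(quotient).zfill(3), remainder))
    return chunks


# 11 sub-tasks: ten with 100 events and one with the remaining 50.
chunks = split_events(1050, 100, "training")
print(len(chunks), chunks[0], chunks[-1])
```

Because every chunk gets its own seed string, the sub-samples should be
statistically independent of each other.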

Due to the dependencies, the tasks are called in reverse order. The ``MainTask``
calls the ``ValidationAndOptimisationTask`` with different FastBDT options and cut
values, and the ``ValidationAndOptimisationTask`` itself calls the required teacher,
training, and simulation tasks.
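
As a rough sketch of the parameter scan, the snippet below builds the same kind
of grid that ``MainTask.requires`` iterates over. The FastBDT option lists and
cut values are the ones listed in this script; the single experiment number is
a placeholder, since the script reads that list from ``settings.json``.

```python
import itertools

# FastBDT option lists as listed in MainTask.requires.
fast_bdt_options = [[50, 8, 3, 0.1], [100, 8, 3, 0.1], [200, 8, 3, 0.1]]
# Cut values 0.2, 0.4, 0.6, 0.8 (up to floating-point rounding).
cut_values = [(i + 1) * 0.2 for i in range(4)]
# Placeholder: the script takes this list from the settings file.
experiment_numbers = [0]

# One ValidationAndOptimisationTask would be requested per combination.
combinations = list(itertools.product(experiment_numbers, fast_bdt_options, cut_values))
print(len(combinations))
```

With one experiment number this gives 3 x 4 = 12 validation tasks, while only
three distinct trainings are needed because the cut value does not enter the
teacher task.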

b2luigi: Understanding the steering file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All trainings and validations are done in the correct order in this steering
file. To create the dependency graph, the `b2luigi
<https://b2luigi.readthedocs.io>`_ Python package is used, which extends the
`luigi <https://luigi.readthedocs.io>`_ package developed by Spotify.

Each task that has to be done is represented by a special class, which defines
its parameters, its output files, and which other tasks with which
parameters it depends on. For example, a teacher task, which runs
``basf2_mva_teacher.py`` to train the classifier, depends on a data collection
task which runs a reconstruction and writes out track-wise variables into a ROOT
file for training. An evaluation/validation task for testing the classifier
requires both the teacher task, as it needs the weightfile to be present, and
also a data collection task, because it needs a dataset for testing the classifier.

The final task that defines which tasks need to be done for the steering file to
finish is the ``MainTask``. When you only want to run parts of the
training/validation pipeline, you can comment out requirements in the ``MainTask``
or replace them by lower-level tasks during debugging.
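
The resulting execution order can be illustrated with a toy dependency walk.
This is a simplified sketch and not how luigi is implemented: it uses only the
task names of this script, ignores task parameters (so the training and the
optimisation samples are not distinguished), and the function
``execution_order`` is hypothetical.

```python
# Each task name maps to the tasks it requires (names taken from this script).
requires = {
    "MainTask": ["ValidationAndOptimisationTask"],
    "ValidationAndOptimisationTask": ["CKFResultFilterTeacherTask", "SplitNMergeSimTask"],
    "CKFResultFilterTeacherTask": ["ResultRecordingTask"],
    "ResultRecordingTask": ["SplitNMergeSimTask"],
    "SplitNMergeSimTask": ["GenerateSimTask"],
    "GenerateSimTask": [],
}


def execution_order(task, done=None):
    # Depth-first walk: a task is appended only after all its requirements.
    if done is None:
        done = []
    for dependency in requires[task]:
        execution_order(dependency, done)
    if task not in done:
        done.append(task)
    return done


order = execution_order("MainTask")
print(order)
```

The walk starts at ``MainTask`` but the simulation ends up first in the list,
which is the reversal described above.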

Requirements
~~~~~~~~~~~~

This steering file relies on b2luigi_ for task scheduling. It can be installed
via pip::

    python3 -m pip install [--user] b2luigi

Use the ``--user`` option if you do not have the rights to install Python packages
into your externals (e.g. because you are using cvmfs) and install them in
``$HOME/.local`` instead.

Configuration
~~~~~~~~~~~~~

Instead of command line arguments, the b2luigi script is configured via a
``settings.json`` file. Open it in your favorite text editor and modify it to
fit your requirements.
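
As a starting point, the snippet below prints a possible ``settings.json``. The
key names are the settings this script reads via ``b2luigi.get_setting`` (plus
the ``result_path`` mentioned further down); all values, in particular the
background directory, are placeholders and have to be adapted.

```python
import json

# All values are placeholders; only the key names are taken from the script.
settings = {
    "result_path": "results",
    "workers": 1,
    "basf2_processes_per_worker": 0,
    "n_events_training": 1000,
    "n_events_testing": 500,
    "n_events_per_task": 100,
    "experiment_numbers": [0],
    # JSON keys are strings; the script converts them to integers.
    "bkgfiles_by_exp": {"0": "/path/to/overlay/background/files"},
}

settings_text = json.dumps(settings, indent=4)
print(settings_text)
```

Save the printed text as ``settings.json`` next to the steering file.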

Usage
~~~~~

You can test the b2luigi task graph without running it via::

    python3 cdc_and_svd_ckf_merger_mva_training.py --dry-run
    python3 cdc_and_svd_ckf_merger_mva_training.py --show-output

This will show the outputs and reveal potential errors in the definitions of the
luigi task dependencies. To run the steering file in normal (local) mode,
run::

    python3 cdc_and_svd_ckf_merger_mva_training.py

You can use the interactive luigi web interface via the central scheduler,
which visualizes the task graph while it is running. For this, the scheduler
daemon ``luigid`` has to run in the background. It is located in
``~/.local/bin/luigid`` if b2luigi was installed with ``--user``. For
example, run::

    luigid --port 8886

Then, execute your steering file (e.g. in another terminal) with::

    python3 cdc_and_svd_ckf_merger_mva_training.py --scheduler-port 8886

To view the web interface, open your web browser and enter the following into the URL bar::

    localhost:8886

If you don't run the steering file on the same machine on which you run your web browser, you have two options:

    1. Run both the steering file and ``luigid`` remotely and use
       ssh-port-forwarding to your local host. To do so, run on your local
       machine::

           ssh -N -f -L 8886:localhost:8886 <remote_user>@<remote_host>

    2. Run the ``luigid`` scheduler locally and use the ``--scheduler-host <your
       local host>`` argument when calling the steering file.

Accessing the results / output files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All output files are stored in a directory structure in the ``result_path``. The
directory tree encodes the used b2luigi parameters. This ensures reproducibility
and makes parameter searches easy. Sometimes, it is hard to find the relevant
output files. You can view the whole directory structure by running ``tree
<result_path>``. Use the unix ``find`` command to find the files that interest
you, e.g.::

    find <result_path> -name "*.root" # find all ROOT files
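
The same search can be done from Python, for example in a notebook. This is a
small sketch using only the standard library; the helper ``find_root_files``
and the directory layout created for the demonstration are made up.

```python
import tempfile
from pathlib import Path


def find_root_files(result_path):
    # Recursively collect all files ending in .root below result_path.
    return sorted(str(path) for path in Path(result_path).rglob("*.root"))


# Demonstration on a throwaway directory tree (the layout is made up).
with tempfile.TemporaryDirectory() as result_path:
    nested = Path(result_path) / "experiment_number=0" / "random_seed=training"
    nested.mkdir(parents=True)
    (nested / "filter_records.root").touch()
    (nested / "notes.txt").touch()
    found = find_root_files(result_path)

print([Path(name).name for name in found])
```

In practice, pass the ``result_path`` from your ``settings.json`` instead of the
temporary directory.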
"""

import itertools
import subprocess

import basf2
# from tracking import add_track_finding
from tracking.path_utils import add_hit_preparation_modules, add_cdc_track_finding, add_svd_standalone_tracking
from tracking.harvesting_validation.combined_module import CombinedTrackingValidationModule
import background
import simulation

from ckf_training import my_basf2_mva_teacher, create_fbdt_option_string

# wrap python modules that are used here but not in the externals into a try except block
install_helpstring_formatter = ("\nCould not find {module} python module. Try installing it via\n"
                                "  python3 -m pip install [--user] {module}\n")
try:
    import b2luigi
    from b2luigi.core.utils import create_output_dirs
    from b2luigi.basf2_helper import Basf2PathTask, Basf2Task
except ModuleNotFoundError:
    print(install_helpstring_formatter.format(module="b2luigi"))
    raise


class GenerateSimTask(Basf2PathTask):
    """
    Generate simulated Monte Carlo with background overlay.

    Make sure to use different ``random_seed`` parameters for the training data
    for the classifier trainings and for the test data for the respective
    evaluation/validation tasks.
    """

    # Experiment number of the conditions database.
    experiment_number = b2luigi.IntParameter()
    # Number of events to simulate.
    n_events = b2luigi.IntParameter()
    # Random seed to use for the simulation to create independent samples.
    random_seed = b2luigi.Parameter()
    # Directory with overlay background root files.
    bkgfiles_dir = b2luigi.Parameter(
        hashed=True
    )

    queue = 'l'

    def output_file_name(self, n_events=None, random_seed=None):
        """
        Create output file name depending on number of events and production
        mode that is specified in the random_seed string.

        :param n_events: Number of events to simulate.
        :param random_seed: Random seed to use for the simulation to create independent samples.
        """
        if n_events is None:
            n_events = self.n_events
        if random_seed is None:
            random_seed = self.random_seed
        return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"

    def output(self):
        """
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        """
        yield self.add_to_output(self.output_file_name())

    def create_path(self):
        """
        Create basf2 path to process with event generation and simulation.
        """
        basf2.set_random_seed(self.random_seed)
        path = basf2.create_path()
        path.add_module(
            "EventInfoSetter", evtNumList=[self.n_events], runList=[0], expList=[self.experiment_number]
        )
        path.add_module("EvtGenInput")
        bkg_files = ""
        # \cond suppress doxygen warning
        if self.experiment_number == 0:
            bkg_files = background.get_background_files()
        else:
            bkg_files = background.get_background_files(self.bkgfiles_dir)
        # \endcond

        simulation.add_simulation(path, bkgfiles=bkg_files, bkgOverlay=True, usePXDDataReduction=False)

        path.add_module(
            "RootOutput",
            outputFileName=self.get_output_file_name(self.output_file_name()),
        )
        return path


# I don't use the default MergeTask or similar because they only work if every input file is called the same.
# Additionally, I want to add more features like deleting the original input to save storage space.
class SplitNMergeSimTask(Basf2Task):
    """
    Split the requested number of events into several ``GenerateSimTask`` jobs
    and merge their output files into one file.

    Make sure to use different ``random_seed`` parameters for the training data
    for the classifier trainings and for the test data for the respective
    evaluation/validation tasks.
    """

    # Experiment number of the conditions database.
    experiment_number = b2luigi.IntParameter()
    # Total number of events to simulate.
    n_events = b2luigi.IntParameter()
    # Random seed to use for the simulation to create independent samples.
    random_seed = b2luigi.Parameter()
    # Directory with overlay background root files.
    bkgfiles_dir = b2luigi.Parameter(
        hashed=True
    )

    queue = 'sx'

    def output_file_name(self, n_events=None, random_seed=None):
        """
        Create output file name depending on number of events and production
        mode that is specified in the random_seed string.

        :param n_events: Number of events to simulate.
        :param random_seed: Random seed to use for the simulation to create independent samples.
        """
        if n_events is None:
            n_events = self.n_events
        if random_seed is None:
            random_seed = self.random_seed
        return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"

    def output(self):
        """
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        """
        yield self.add_to_output(self.output_file_name())

    def requires(self):
        """
        This task requires several GenerateSimTask to be finished so that the required number of events is created.
        """
        n_events_per_task = MainTask.n_events_per_task
        quotient, remainder = divmod(self.n_events, n_events_per_task)
        for i in range(quotient):
            yield GenerateSimTask(
                bkgfiles_dir=self.bkgfiles_dir,
                num_processes=MainTask.num_processes,
                random_seed=self.random_seed + '_' + str(i).zfill(3),
                n_events=n_events_per_task,
                experiment_number=self.experiment_number,
            )
        if remainder > 0:
            yield GenerateSimTask(
                bkgfiles_dir=self.bkgfiles_dir,
                num_processes=MainTask.num_processes,
                random_seed=self.random_seed + '_' + str(quotient).zfill(3),
                n_events=remainder,
                experiment_number=self.experiment_number,
            )

    @b2luigi.on_temporary_files
    def process(self):
        """
        When all GenerateSimTasks finished, merge the output.
        """
        create_output_dirs(self)

        file_list = [item for sublist in self.get_input_file_names().values() for item in sublist]
        print("Merge the following files:")
        print(file_list)
        cmd = ["b2file-merge", "-f"]
        args = cmd + [self.get_output_file_name(self.output_file_name())] + file_list
        subprocess.check_call(args)
        print("Finished merging. Now remove the input files to save space.")
        cmd2 = ["rm", "-f"]
        for tempfile in file_list:
            args = cmd2 + [tempfile]
            subprocess.check_call(args)


class ResultRecordingTask(Basf2PathTask):
    """
    Task to record data for the final result filter. This only requires found and MC-matched SVD and CDC tracks that need to be
    merged, all state filters are set to "all"
    """

    # Experiment number of the conditions database.
    experiment_number = b2luigi.IntParameter()
    # Number of events to generate for the training data set.
    n_events_training = b2luigi.IntParameter()
    # Random seed to use for the simulation to create independent samples.
    random_seed = b2luigi.Parameter()

    # Name of the records file for training the final result filter.
    result_filter_records_name = b2luigi.Parameter()

    def output(self):
        """
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        """
        yield self.add_to_output(self.result_filter_records_name)

    def requires(self):
        """
        This task requires that the training SplitNMergeSimTask is finished.
        """
        yield SplitNMergeSimTask(
            bkgfiles_dir=MainTask.bkgfiles_by_exp[self.experiment_number],
            random_seed=self.random_seed,
            n_events=self.n_events_training,
            experiment_number=self.experiment_number,
        )

    def create_result_recording_path(self, result_filter_records_name):
        """
        Create a path for the recording of the result filter. This file is then used to train the result filter.

        :param result_filter_records_name: Name of the recording file.
        """

        path = basf2.create_path()

        # get all the file names from the list of input files that are meant for training
        file_list = [fname for sublist in self.get_input_file_names().values()
                     for fname in sublist if "generated_mc_N" in fname and "training" in fname and fname.endswith(".root")]
        path.add_module("RootInput", inputFileNames=file_list)

        path.add_module("Gearbox")
        path.add_module("Geometry")
        path.add_module("SetupGenfitExtrapolation")

        add_hit_preparation_modules(path, components=["SVD"])

        # MCTrackFinding
        mc_reco_tracks = "MCRecoTracks"
        path.add_module('TrackFinderMCTruthRecoTracks',
                        RecoTracksStoreArrayName=mc_reco_tracks)

        # CDC track finding and MC matching
        cdc_reco_tracks = "CDCRecoTracks"
        add_cdc_track_finding(path, output_reco_tracks=cdc_reco_tracks)
        path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=False, UseCDCHits=True,
                        mcRecoTracksStoreArrayName=mc_reco_tracks,
                        prRecoTracksStoreArrayName=cdc_reco_tracks)

        path.add_module("DAFRecoFitter", recoTracksStoreArrayName=cdc_reco_tracks)

        # SVD track finding and MC matching
        svd_reco_tracks = "SVDRecoTracks"
        add_svd_standalone_tracking(path, reco_tracks=svd_reco_tracks)
        path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=True, UseCDCHits=False,
                        mcRecoTracksStoreArrayName=mc_reco_tracks,
                        prRecoTracksStoreArrayName=svd_reco_tracks)

        direction = "backward"
        path.add_module("CDCToSVDSeedCKF",
                        inputRecoTrackStoreArrayName=cdc_reco_tracks,

                        fromRelationStoreArrayName=cdc_reco_tracks,
                        toRelationStoreArrayName=svd_reco_tracks,

                        relatedRecoTrackStoreArrayName=svd_reco_tracks,
                        cdcTracksStoreArrayName=cdc_reco_tracks,
                        vxdTracksStoreArrayName=svd_reco_tracks,

                        relationCheckForDirection=direction,
                        reverseSeed=False,
                        firstHighFilterParameters={"direction": direction},
                        advanceHighFilterParameters={"direction": direction},

                        writeOutDirection=direction,
                        endEarly=False,

                        filter="recording_with_relations",
                        filterParameters={"rootFileName": result_filter_records_name})

        return path

    def create_path(self):
        """
        Create the basf2 path that records the training data for the result filter.
        """
        return self.create_result_recording_path(
            result_filter_records_name=self.get_output_file_name(self.result_filter_records_name),
        )


class CKFResultFilterTeacherTask(Basf2Task):
    """
    A teacher task runs the basf2 mva teacher on the training data provided by a
    data collection task.

    Since teacher tasks are needed for all quality estimators covered by this
    steering file and the only thing that changes is the required data
    collection task and some training parameters, I decided to use inheritance
    and have the basic functionality in this base class/interface and have the
    specific teacher tasks inherit from it.
    """

    # Experiment number of the conditions database.
    experiment_number = b2luigi.IntParameter()
    # Number of events to generate for the training data set.
    n_events_training = b2luigi.IntParameter()
    # Random seed to use for the simulation to create independent samples.
    random_seed = b2luigi.Parameter()
    # Name of the records file for training the final result filter.
    result_filter_records_name = b2luigi.Parameter()
    # Feature/variable to use as truth label in the quality estimator MVA classifier.
    training_target = b2luigi.Parameter(
        default="truth"
    )

    # List of collected variables to not use in the training of the QE MVA classifier.
    exclude_variables = b2luigi.ListParameter(
        hashed=True, default=[]
    )

    # Hyperparameter option of the FastBDT algorithm.
    fast_bdt_option = b2luigi.ListParameter(
        hashed=True, default=[200, 8, 3, 0.1]
    )

    def get_weightfile_xml_identifier(self, fast_bdt_option=None):
        """
        Name of the xml weightfile that is created by the teacher task.
        It is subsequently used as a local weightfile in the following validation tasks.

        :param fast_bdt_option: FastBDT option that is used to train this MVA
        """
        if fast_bdt_option is None:
            fast_bdt_option = self.fast_bdt_option
        fast_bdt_string = create_fbdt_option_string(fast_bdt_option)
        weightfile_name = "trk_CDCToSVDSeedResultFilter" + fast_bdt_string
        return weightfile_name + ".xml"

    def requires(self):
        """
        Generate list of luigi Tasks that this Task depends on.
        """
        yield ResultRecordingTask(
            experiment_number=self.experiment_number,
            n_events_training=self.n_events_training,
            result_filter_records_name=self.result_filter_records_name,
            random_seed=self.random_seed
        )

    def output(self):
        """
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        """
        yield self.add_to_output(self.get_weightfile_xml_identifier())

    def process(self):
        """
        Use basf2_mva teacher to create MVA weightfile from collected training
        data variables.

        This is the main process that is dispatched by the ``run`` method that
        is inherited from ``Basf2Task``.
        """
        records_files = self.get_input_file_names(self.result_filter_records_name)

        my_basf2_mva_teacher(
            records_files=records_files,
            tree_name="records",
            weightfile_identifier=self.get_output_file_name(self.get_weightfile_xml_identifier()),
            target_variable=self.training_target,
            exclude_variables=self.exclude_variables,
            fast_bdt_option=self.fast_bdt_option,
        )


class ValidationAndOptimisationTask(Basf2PathTask):
    """
    Validate the performance of the trained filters by trying various combinations of FastBDT options, as well as cut values for
    the states, the number of best candidates kept after each filter, and similar for the result filter.
    """

    # Experiment number of the conditions database.
    experiment_number = b2luigi.IntParameter()
    # Number of events to generate for the training data set.
    n_events_training = b2luigi.IntParameter()
    # FastBDT option used to train the result filter.
    fast_bdt_option = b2luigi.ListParameter(
        # ## \cond
        hashed=True, default=[200, 8, 3, 0.1]
        # ## \endcond
    )

    # Number of events to generate for the testing, validation, and optimisation data set.
    n_events_testing = b2luigi.IntParameter()
    # Value of the cut on the MVA classifier output for a result candidate.
    result_filter_cut = b2luigi.FloatParameter()

    def output(self):
        """
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        """
        fbdt_string = create_fbdt_option_string(self.fast_bdt_option)
        yield self.add_to_output(
            f"cdc_svd_merger_ckf_validation{fbdt_string}_{self.result_filter_cut}.root")

    def requires(self):
        """
        This task requires trained result filters, and that an independent data set for validation was created using the
        ``SplitNMergeSimTask`` with the random seed optimisation.
        """
        yield CKFResultFilterTeacherTask(
            result_filter_records_name="filter_records.root",
            experiment_number=self.experiment_number,
            n_events_training=self.n_events_training,
            fast_bdt_option=self.fast_bdt_option,
            random_seed='training'
        )
        yield SplitNMergeSimTask(
            bkgfiles_dir=MainTask.bkgfiles_by_exp[self.experiment_number],
            experiment_number=self.experiment_number,
            n_events=self.n_events_testing,
            random_seed="optimisation",
        )

    def create_optimisation_and_validation_path(self):
        """
        Create a path to validate the trained filters.
        """
        path = basf2.create_path()

        # get all the file names from the list of input files that are meant for optimisation / validation
        file_list = [fname for sublist in self.get_input_file_names().values()
                     for fname in sublist if "generated_mc_N" in fname and "optimisation" in fname and fname.endswith(".root")]
        path.add_module("RootInput", inputFileNames=file_list)

        path.add_module("Gearbox")
        path.add_module("Geometry")
        path.add_module("SetupGenfitExtrapolation")

        add_hit_preparation_modules(path, components=["SVD"])

        cdc_reco_tracks = "CDCRecoTracks"
        svd_reco_tracks = "SVDRecoTracks"
        reco_tracks = "RecoTracks"
        mc_reco_tracks = "MCRecoTracks"

        # CDC track finding
        add_cdc_track_finding(path, output_reco_tracks=cdc_reco_tracks)

        path.add_module("DAFRecoFitter", recoTracksStoreArrayName=cdc_reco_tracks)

        # SVD track finding
        add_svd_standalone_tracking(path, reco_tracks=svd_reco_tracks)

        direction = "backward"
        fbdt_string = create_fbdt_option_string(self.fast_bdt_option)
        path.add_module(
            "CDCToSVDSeedCKF",
            inputRecoTrackStoreArrayName=cdc_reco_tracks,
            fromRelationStoreArrayName=cdc_reco_tracks,
            toRelationStoreArrayName=svd_reco_tracks,
            relatedRecoTrackStoreArrayName=svd_reco_tracks,
            cdcTracksStoreArrayName=cdc_reco_tracks,
            vxdTracksStoreArrayName=svd_reco_tracks,
            relationCheckForDirection=direction,
            reverseSeed=False,
            firstHighFilterParameters={
                "direction": direction},
            advanceHighFilterParameters={
                "direction": direction},
            writeOutDirection=direction,
            endEarly=False,
            filter='mva_with_relations',
            filterParameters={
                "identifier": self.get_input_file_names(f"trk_CDCToSVDSeedResultFilter{fbdt_string}.xml")[0],
                "cut": self.result_filter_cut})

        path.add_module('RelatedTracksCombiner',
                        VXDRecoTracksStoreArrayName=svd_reco_tracks,
                        CDCRecoTracksStoreArrayName=cdc_reco_tracks,
                        recoTracksStoreArrayName=reco_tracks)

        # MC track finding and MC matching of the merged tracks
        path.add_module('TrackFinderMCTruthRecoTracks',
                        RecoTracksStoreArrayName=mc_reco_tracks,
                        WhichParticles=[],
                        UsePXDHits=True,
                        UseSVDHits=True,
                        UseCDCHits=True)

        path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=True, UseCDCHits=True,
                        mcRecoTracksStoreArrayName=mc_reco_tracks,
                        prRecoTracksStoreArrayName=reco_tracks)

        path.add_module(
            CombinedTrackingValidationModule(
                output_file_name=self.get_output_file_name(
                    f"cdc_svd_merger_ckf_validation{fbdt_string}_{self.result_filter_cut}.root"),
                reco_tracks_name=reco_tracks,
                mc_reco_tracks_name=mc_reco_tracks,
                name="",
                contact="",
                expert_level=200))

        return path

    def create_path(self):
        """
        Create the basf2 path that runs the tracking chain and its validation.
        """
        return self.create_optimisation_and_validation_path()


class MainTask(b2luigi.WrapperTask):
    """
    Wrapper task that needs to finish for b2luigi to finish running this steering file.

    It is done if the outputs of all required subtasks exist. It is thus at the
    top of the luigi task graph. Edit the ``requires`` method to steer which
    tasks and with which parameters you want to run.
    """

    # Number of events to generate for the training data set.
    n_events_training = b2luigi.get_setting(
        "n_events_training", default=1000
    )

    # Number of events to generate for the test data set.
    n_events_testing = b2luigi.get_setting(
        "n_events_testing", default=500
    )

    # Number of events per simulation sub-task.
    n_events_per_task = b2luigi.get_setting(
        "n_events_per_task", default=100
    )

    # Number of basf2 processes per worker.
    num_processes = b2luigi.get_setting(
        "basf2_processes_per_worker", default=0
    )

    # Directories with overlay background files, keyed by experiment number.
    bkgfiles_by_exp = b2luigi.get_setting("bkgfiles_by_exp")
    # JSON keys are strings, so convert the experiment numbers to integers.
    bkgfiles_by_exp = {int(key): val for (key, val) in bkgfiles_by_exp.items()}

    def requires(self):
        """
        Generate list of tasks that needs to be done for luigi to finish running
        this steering file.
        """

        fast_bdt_options = [
            [50, 8, 3, 0.1],
            [100, 8, 3, 0.1],
            [200, 8, 3, 0.1],
        ]
        cut_values = []
        for i in range(4):
            cut_values.append((i+1) * 0.2)

        experiment_numbers = b2luigi.get_setting("experiment_numbers")

        # iterate over all possible combinations of parameters from the above defined parameter lists
        for experiment_number, fast_bdt_option, cut_value in itertools.product(
                experiment_numbers, fast_bdt_options, cut_values
        ):
            yield ValidationAndOptimisationTask(
                experiment_number=experiment_number,
                n_events_training=self.n_events_training,
                fast_bdt_option=fast_bdt_option,
                n_events_testing=self.n_events_testing,
                result_filter_cut=cut_value,
            )


if __name__ == "__main__":
    b2luigi.set_setting("env_script", "./setup_basf2.sh")
    b2luigi.set_setting("batch_system", "htcondor")
    workers = b2luigi.get_setting("workers", default=1)
    b2luigi.process(MainTask(), workers=workers, batch=True)