Belle II Software development
cdc_and_svd_ckf_merger_mva_training.py
"""
cdc_and_svd_ckf_merger_mva_training
-----------------------------------------

Purpose of this script
~~~~~~~~~~~~~~~~~~~~~~

This python script is used for the training and validation of the classifier of
the MVA-based result filter of the CDCToSVDSeedCKF, which combines tracks that
were found by the CDC and SVD standalone tracking algorithms.

To avoid mistakes, b2luigi is used to create a task chain for a combined training and
validation of all classifiers.

The order of the b2luigi tasks in this script is as follows (top to bottom):

* Two tasks to create input samples for training and testing (``GenerateSimTask`` and
  ``SplitNMergeSimTask``). The ``SplitNMergeSimTask`` takes the total number of events to
  be generated and a number of events per task, which limits the runtime of each task. It
  divides the total number of events by the number of events per task and creates as many
  ``GenerateSimTask`` instances as needed, each with its own random seed, so that in the
  end the total number of training and testing events is simulated. The individual files
  are then combined by the ``SplitNMergeSimTask`` into one file each for training and
  testing.
* The ``ResultRecordingTask`` writes out the data used for training of the MVA.
* The ``CKFResultFilterTeacherTask`` trains the MVA, FastBDT by default, with a
  given set of FastBDT options.
* The ``ValidationAndOptimisationTask`` uses the trained weight files and the cut values
  provided to run the tracking chain with the weight file under test, and also
  runs the tracking validation.
* Finally, the ``SummaryTask`` is the "brain" of the script. It invokes the
  ``ValidationAndOptimisationTask`` with the different combinations of FastBDT options
  and cut values on the MVA classifier output.
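
The event splitting described for ``SplitNMergeSimTask`` can be sketched in plain
Python. This is only an illustration of the arithmetic, not the task itself: the
helper name ``split_events`` and the example numbers are assumptions made for this
sketch, while the ``divmod`` logic and the zero-padded seed suffix follow the
``requires`` method further down in this file.

```python
def split_events(n_events, n_events_per_task, random_seed):
    # Hypothetical helper: return one (seed, n_events) pair per simulation chunk.
    quotient, remainder = divmod(n_events, n_events_per_task)
    chunks = [
        (random_seed + "_" + str(i).zfill(3), n_events_per_task)
        for i in range(quotient)
    ]
    # A last, smaller chunk picks up the events that do not fill a whole task.
    if remainder > 0:
        chunks.append((random_seed + "_" + str(quotient).zfill(3), remainder))
    return chunks


chunks = split_events(n_events=250, n_events_per_task=100, random_seed="training")
for seed, n in chunks:
    print(seed, n)
```

With these example numbers, two full chunks of 100 events and one chunk of 50
events are requested, each with a distinct seed string.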

Due to the dependencies, the tasks are called in reverse order. The ``SummaryTask``
calls the ``ValidationAndOptimisationTask`` with different FastBDT options and cut
values, and the ``ValidationAndOptimisationTask`` itself calls the required teacher,
training, and simulation tasks.
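
The parameter combinations that the ``SummaryTask`` requests can be sketched as
follows. The FastBDT options and the cut values are copied from the ``requires``
method of the ``SummaryTask`` in this file. The experiment numbers are read from
``settings.json`` in the real script, so the single value used here is only a
placeholder.

```python
import itertools

experiment_numbers = [0]  # placeholder, the script reads this from settings.json
fast_bdt_options = [
    [50, 8, 3, 0.1],
    [100, 8, 3, 0.1],
    [200, 8, 3, 0.1],
]
cut_values = [(i + 1) * 0.2 for i in range(4)]

# One validation task is requested per element of this grid.
grid = list(itertools.product(experiment_numbers, fast_bdt_options, cut_values))
print(len(grid))
for experiment_number, fast_bdt_option, cut_value in grid[:4]:
    print(experiment_number, fast_bdt_option, round(cut_value, 1))
```

Note that the cut values are computed in floating point, so a value such as
``3 * 0.2`` is not exactly ``0.6`` and may show extra digits when it is formatted
into an output file name.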

b2luigi: Understanding the steering file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All trainings and validations are done in the correct order in this steering
file. To create the dependency graph, the `b2luigi
<https://b2luigi.readthedocs.io>`_ python package is used, which extends the
`luigi <https://luigi.readthedocs.io>`_ package developed by Spotify.

Each task that has to be done is represented by a special class, which defines
its parameters, its output files, and which other tasks with which
parameters it depends on. For example, a teacher task, which runs
``basf2_mva_teacher.py`` to train the classifier, depends on a data collection
task which runs a reconstruction and writes out track-wise variables into a ROOT
file for training. An evaluation/validation task for testing the classifier
requires both the teacher task, as it needs the weightfile to be present, and
also a data collection task, because it needs a dataset for testing the classifier.

The final task that defines which tasks need to be done for the steering file to
finish is the ``SummaryTask``. When you only want to run parts of the
training/validation pipeline, you can comment out requirements in the
``SummaryTask`` or replace them by lower-level tasks during debugging.
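
How such a dependency graph leads to an execution order can be illustrated with a
toy model. The ``ToyTask`` class and the ``execution_order`` function below are
assumptions made for this sketch and are not part of b2luigi or luigi. Only the
task names and their dependencies are taken from this steering file.

```python
class ToyTask:
    # Toy stand-in for a task class, NOT a b2luigi class.
    def __init__(self, name, requires=()):
        self.name = name
        self.requires = list(requires)


def execution_order(task, done=None):
    # Depth-first walk: every requirement is scheduled before the task itself.
    if done is None:
        done = []
    for requirement in task.requires:
        execution_order(requirement, done)
    if task.name not in done:
        done.append(task.name)
    return done


training_sim = ToyTask("SplitNMergeSimTask(training)")
testing_sim = ToyTask("SplitNMergeSimTask(optimisation)")
recording = ToyTask("ResultRecordingTask", [training_sim])
teacher = ToyTask("CKFResultFilterTeacherTask", [recording])
validation = ToyTask("ValidationAndOptimisationTask", [teacher, testing_sim])
summary = ToyTask("SummaryTask", [validation])

order = execution_order(summary)
print(order)
```

Requesting only the final task is enough: the walk reaches the simulation tasks
first and the ``SummaryTask`` last, which mirrors the reversed call order of the
real tasks.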

Requirements
~~~~~~~~~~~~

This steering file relies on b2luigi_ for task scheduling. It can be installed
via pip::

    python3 -m pip install [--user] b2luigi

Use the ``--user`` option if you do not have the rights to install python packages into
your externals (e.g. because you are using cvmfs) and install them in
``$HOME/.local`` instead.

Configuration
~~~~~~~~~~~~~

Instead of command line arguments, the b2luigi script is configured via a
``settings.json`` file. Open it in your favorite text editor and modify it to
fit your requirements.
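
A possible ``settings.json`` can be sketched as follows. The keys are the ones
this script reads via ``b2luigi.get_setting``, plus ``result_path``, and the
numeric values are the defaults used in this file. The result path, the
experiment number and the background file directory are placeholders that have to
be replaced. The sketch writes the file to a temporary directory so that it does
not overwrite an existing configuration.

```python
import json
import os
import tempfile

settings = {
    "result_path": "results",  # placeholder
    "experiment_numbers": [0],  # placeholder
    "bkgfiles_by_exp": {"0": "/path/to/bkg/files"},  # placeholder path
    "n_events_training": 1000,
    "n_events_testing": 500,
    "n_events_per_task": 100,
    "basf2_processes_per_worker": 0,
    "workers": 1,
}

settings_file = os.path.join(tempfile.mkdtemp(), "settings.json")
with open(settings_file, "w") as f:
    json.dump(settings, f, indent=4)

# JSON object keys are always strings, which is why the script converts
# the experiment numbers back to integers after loading.
with open(settings_file) as f:
    loaded = json.load(f)
bkgfiles_by_exp = {int(key): val for key, val in loaded["bkgfiles_by_exp"].items()}
print(bkgfiles_by_exp)
```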

Usage
~~~~~

You can test the b2luigi script without running it via::

    python3 cdc_and_svd_ckf_merger_mva_training.py --dry-run
    python3 cdc_and_svd_ckf_merger_mva_training.py --show-output

This will show the outputs and potential errors in the definitions of the
luigi task dependencies. To run the steering file in normal (local) mode,
run::

    python3 cdc_and_svd_ckf_merger_mva_training.py

You can use the interactive luigi web interface via the central scheduler,
which visualizes the task graph while it is running. For this, the scheduler
daemon ``luigid`` has to run in the background. It is located in
``~/.local/bin/luigid`` if b2luigi was installed with ``--user``. For
example, run::

    luigid --port 8886

Then, execute your steering file (e.g. in another terminal) with::

    python3 cdc_and_svd_ckf_merger_mva_training.py --scheduler-port 8886

To view the web interface, open your web browser and enter the following into the URL bar::

    localhost:8886

If you don't run the steering file on the same machine on which you run your web
browser, you have two options:

    1. Run both the steering file and ``luigid`` remotely and use
       ssh-port-forwarding to your local host. To do so, run on your local
       machine::

           ssh -N -f -L 8886:localhost:8886 <remote_user>@<remote_host>

    2. Run the ``luigid`` scheduler locally and use the ``--scheduler-host <your
       local host>`` argument when calling the steering file.

Accessing the results / output files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All output files are stored in a directory structure in the ``result_path``. The
directory tree encodes the used b2luigi parameters. This ensures reproducibility
and makes parameter searches easy. Sometimes, it is hard to find the relevant
output files. You can view the whole directory structure by running ``tree
<result_path>``. Use the Unix ``find`` command to find the files that interest
you, e.g.::

    find <result_path> -name "*.root" # find all ROOT files
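
The same search can be done from Python with ``pathlib``. The sketch below builds
a small throwaway directory tree so that it runs anywhere. The directory and file
names in it are only illustrative, and in practice ``result_path`` would point to
the directory configured in ``settings.json``.

```python
import tempfile
from pathlib import Path

# Throwaway directory tree standing in for a real result_path.
result_path = Path(tempfile.mkdtemp())
task_dir = result_path / "experiment_number=0"  # illustrative directory name
task_dir.mkdir()
(task_dir / "summary.json").write_text("{}")
(task_dir / "filter_records.root").write_text("")

# Recursively collect all files ending in .root below result_path.
root_files = sorted(p.name for p in result_path.rglob("*.root"))
print(root_files)
```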
"""
141
142import itertools
143import json
144import os
145import subprocess
146import tempfile
147
148import basf2_mva
149import basf2
150# from tracking import add_track_finding
151from tracking.path_utils import add_hit_preparation_modules, add_cdc_track_finding, add_svd_standalone_tracking
152from tracking.harvesting_validation.combined_module import CombinedTrackingValidationModule
153import background
154import simulation
155
156from ckf_training import my_basf2_mva_teacher, create_fbdt_option_string
157from tracking_mva_filter_payloads.write_tracking_mva_filter_payloads_to_db import write_tracking_mva_filter_payloads_to_db
158
159# wrap python modules that are used here but not in the externals into a try except block
install_helpstring_formatter = ("\nCould not find {module} python module. Try installing it via\n"
                                " python3 -m pip install [--user] {module}\n")
162try:
163 import b2luigi
164 from b2luigi.core.utils import create_output_dirs
165 from b2luigi.basf2_helper import Basf2PathTask, Basf2Task
166except ModuleNotFoundError:
167 print(install_helpstring_formatter.format(module="b2luigi"))
168 raise
169
170
171class LSFTask(b2luigi.Task):
172 """
173 Simple task that defines the configuration of the LSF batch submission.
174 """
175
176
177 batch_system = 'lsf'
178
179 queue = 's'
180
181 def __init__(self, *args, **kwargs):
182 """Constructor."""
183 super().__init__(*args, **kwargs)
184
185 self.job_name = self.task_id
186
187
189 """
190 Same as LSFTask, but for memory-intensive tasks.
191 """
192
193
194 job_slots = '4'
195
196
197class GenerateSimTask(Basf2PathTask, LSFTask):
198 """
199 Generate simulated Monte Carlo with background overlay.
200
    Make sure to use different ``random_seed`` parameters for the training data
    for the classifier trainings and for the test data for the respective
    evaluation/validation tasks.
204 """
205
206
207 experiment_number = b2luigi.IntParameter()
208
209 n_events = b2luigi.IntParameter()
210
212 random_seed = b2luigi.Parameter()
213
214 bkgfiles_dir = b2luigi.Parameter(
215
216 hashed=True
217
218 )
219
220
221 def output_file_name(self, n_events=None, random_seed=None):
222 """
223 Create output file name depending on number of events and production
224 mode that is specified in the random_seed string.
225
226 :param n_events: Number of events to simulate.
227 :param random_seed: Random seed to use for the simulation to create independent samples.
228 """
229 if n_events is None:
230 n_events = self.n_events
231 if random_seed is None:
232 random_seed = self.random_seed
233 return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"
234
235 def output(self):
236 """
237 Generate list of output files that the task should produce.
238 The task is considered finished if and only if the outputs all exist.
239 """
240 yield self.add_to_output(self.output_file_name())
241
242 def create_path(self):
243 """
244 Create basf2 path to process with event generation and simulation.
245 """
246 basf2.set_random_seed(self.random_seed)
247 path = basf2.create_path()
248 path.add_module(
249 "EventInfoSetter", evtNumList=[self.n_events], runList=[0], expList=[self.experiment_number]
250 )
251 path.add_module("EvtGenInput")
252 bkg_files = ""
253 # \cond suppress doxygen warning
        if self.experiment_number == 0:
            bkg_files = background.get_background_files()
        else:
            bkg_files = background.get_background_files(self.bkgfiles_dir)
        # \endcond
259
260 simulation.add_simulation(path, bkgfiles=bkg_files, bkgOverlay=True, usePXDDataReduction=False)
261
262 path.add_module(
263 "RootOutput",
264 outputFileName=self.get_output_file_name(self.output_file_name()),
265 )
266 return path
267
268 def remove_output(self):
269 """
270 Default function from base b2luigi.Task class.
271 """
272 self._remove_output()
273
274
275# I don't use the default MergeTask or similar because they only work if every input file is called the same.
276# Additionally, I want to add more features like deleting the original input to save storage space.
277class SplitNMergeSimTask(Basf2Task, LSFTask):
278 """
279 Generate simulated Monte Carlo with background overlay.
280
    Make sure to use different ``random_seed`` parameters for the training data
    for the classifier trainings and for the test data for the respective
    evaluation/validation tasks.
284 """
285
286
287 experiment_number = b2luigi.IntParameter()
288
289 n_events = b2luigi.IntParameter()
290
292 random_seed = b2luigi.Parameter()
293
294 bkgfiles_dir = b2luigi.Parameter(
295
296 hashed=True
297
298 )
299
300
301 def output_file_name(self, n_events=None, random_seed=None):
302 """
303 Create output file name depending on number of events and production
304 mode that is specified in the random_seed string.
305
306 :param n_events: Number of events to simulate.
307 :param random_seed: Random seed to use for the simulation to create independent samples.
308 """
309 if n_events is None:
310 n_events = self.n_events
311 if random_seed is None:
312 random_seed = self.random_seed
313 return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"
314
315 def output(self):
316 """
317 Generate list of output files that the task should produce.
318 The task is considered finished if and only if the outputs all exist.
319 """
320 yield self.add_to_output(self.output_file_name())
321
322 def requires(self):
323 """
        This task requires several GenerateSimTasks to be finished so that the required number of events is created.
325 """
326 n_events_per_task = SummaryTask.n_events_per_task
327 quotient, remainder = divmod(self.n_events, n_events_per_task)
328 for i in range(quotient):
329 yield GenerateSimTask(
330 bkgfiles_dir=self.bkgfiles_dir,
331 num_processes=SummaryTask.num_processes,
332 random_seed=self.random_seed + '_' + str(i).zfill(3),
333 n_events=n_events_per_task,
334 experiment_number=self.experiment_number,
335 )
336 if remainder > 0:
337 yield GenerateSimTask(
338 bkgfiles_dir=self.bkgfiles_dir,
339 num_processes=SummaryTask.num_processes,
340 random_seed=self.random_seed + '_' + str(quotient).zfill(3),
341 n_events=remainder,
342 experiment_number=self.experiment_number,
343 )
344
345 @b2luigi.on_temporary_files
346 def process(self):
347 """
348 When all GenerateSimTasks finished, merge the output.
349 """
350 create_output_dirs(self)
351
352 file_list = [f for f in self.get_all_input_file_names()]
353 print("Merge the following files:")
354 print(file_list)
355 cmd = ["b2file-merge", "-f"]
356 args = cmd + [self.get_output_file_name(self.output_file_name())] + file_list
357 subprocess.check_call(args)
358
359 def on_success(self):
360 """
361 On success method.
362 """
363 print("Finished merging. Now remove the input files to save space.")
364 file_list = [f for f in self.get_all_input_file_names()]
365 for input_file in file_list:
366 try:
367 os.remove(input_file)
368 except FileNotFoundError:
369 pass
370
371 def remove_output(self):
372 """
373 Default function from base b2luigi.Task class.
374 """
375 self._remove_output()
376
377
378class ResultRecordingTask(Basf2PathTask, LSFTask):
379 """
380 Task to record data for the final result filter. This only requires found and MC-matched SVD and CDC tracks that need to be
381 merged, all state filters are set to "all"
382 """
383
384
385 experiment_number = b2luigi.IntParameter()
386
387 n_events_training = b2luigi.IntParameter()
388
390 random_seed = b2luigi.Parameter()
391
392
393 result_filter_records_name = b2luigi.Parameter()
394
395 def output(self):
396 """
397 Generate list of output files that the task should produce.
398 The task is considered finished if and only if the outputs all exist.
399 """
400 yield self.add_to_output(self.result_filter_records_name)
401
402 def requires(self):
403 """
        This task requires that the training SplitNMergeSimTask is finished.
405 """
406 yield SplitNMergeSimTask(
407 bkgfiles_dir=SummaryTask.bkgfiles_by_exp[self.experiment_number],
408 random_seed=self.random_seed,
409 n_events=self.n_events_training,
410 experiment_number=self.experiment_number,
411 )
412
413 def create_result_recording_path(self, result_filter_records_name):
414 """
415 Create a path for the recording of the result filter. This file is then used to train the result filter.
416
417 :param result_filter_records_name: Name of the recording file.
418 """
419
420 path = basf2.create_path()
421
422 # get all the file names from the list of input files that are meant for training
423 file_list = [fname for fname in self.get_all_input_file_names()
424 if "generated_mc_N" in fname and "training" in fname and fname.endswith(".root")]
425 path.add_module("RootInput", inputFileNames=file_list)
426
427 path.add_module("Gearbox")
428 path.add_module("Geometry")
429 path.add_module("SetupGenfitExtrapolation")
430
431 add_hit_preparation_modules(path, components=["SVD"])
432
433 # MCTrackFinding
434 mc_reco_tracks = "MCRecoTracks"
435 path.add_module('TrackFinderMCTruthRecoTracks',
436 RecoTracksStoreArrayName=mc_reco_tracks)
437
438 # CDC track finding and MC matching
439 cdc_reco_tracks = "CDCRecoTracks"
440 add_cdc_track_finding(path, output_reco_tracks=cdc_reco_tracks)
441 path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=False, UseCDCHits=True,
442 mcRecoTracksStoreArrayName=mc_reco_tracks,
443 prRecoTracksStoreArrayName=cdc_reco_tracks)
444
445 path.add_module("DAFRecoFitter", recoTracksStoreArrayName=cdc_reco_tracks)
446
447 # SVD track finding and MC matching
448 svd_reco_tracks = "SVDRecoTracks"
449 add_svd_standalone_tracking(path, reco_tracks=svd_reco_tracks)
450 path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=True, UseCDCHits=False,
451 mcRecoTracksStoreArrayName=mc_reco_tracks,
452 prRecoTracksStoreArrayName=svd_reco_tracks)
453
454 direction = "backward"
455 path.add_module("CDCToSVDSeedCKF",
456 inputRecoTrackStoreArrayName=cdc_reco_tracks,
457
458 fromRelationStoreArrayName=cdc_reco_tracks,
459 toRelationStoreArrayName=svd_reco_tracks,
460
461 relatedRecoTrackStoreArrayName=svd_reco_tracks,
462 cdcTracksStoreArrayName=cdc_reco_tracks,
463 vxdTracksStoreArrayName=svd_reco_tracks,
464
465 relationCheckForDirection=direction,
466 reverseSeed=False,
467 firstHighFilterParameters={"direction": direction},
468 advanceHighFilterParameters={"direction": direction},
469
470 writeOutDirection=direction,
471 endEarly=False,
472
473 filter="recording_with_relations",
474 filterParameters={"rootFileName": result_filter_records_name})
475
476 return path
477
478 def create_path(self):
479 """
        Create the basf2 path that records the training data for the result filter.
        """
        return self.create_result_recording_path(
            result_filter_records_name=self.get_output_file_name(self.result_filter_records_name),
        )
485
486 def remove_output(self):
487 """
488 Default function from base b2luigi.Task class.
489 """
490 self._remove_output()
491
492
class CKFResultFilterTeacherTask(Basf2Task, LSFTask):
    """
495 A teacher task runs the basf2 mva teacher on the training data provided by a
496 data collection task.
497
498 Since teacher tasks are needed for all quality estimators covered by this
499 steering file and the only thing that changes is the required data
500 collection task and some training parameters, I decided to use inheritance
501 and have the basic functionality in this base class/interface and have the
502 specific teacher tasks inherit from it.
503 """
504
505 experiment_number = b2luigi.IntParameter()
506
507 n_events_training = b2luigi.IntParameter()
508
510 random_seed = b2luigi.Parameter()
511
512 result_filter_records_name = b2luigi.Parameter()
513
514 training_target = b2luigi.Parameter(
515
516 default="truth"
517
518 )
519
521 exclude_variables = b2luigi.ListParameter(
522
523 hashed=True, default=[]
524
525 )
526
527 fast_bdt_option = b2luigi.ListParameter(
528
529 hashed=True, default=[200, 8, 3, 0.1]
530
531 )
532
533 def get_weightfile_identifier(self, fast_bdt_option=None):
534 """
535 Name of weightfile that is created by the teacher task.
536
537 :param fast_bdt_option: FastBDT option that is used to train this MVA
538 """
539 if fast_bdt_option is None:
540 fast_bdt_option = self.fast_bdt_option
541 fast_bdt_string = create_fbdt_option_string(fast_bdt_option)
542 weightfile_name = "trk_CDCToSVDSeedResultFilter" + fast_bdt_string
543 return weightfile_name
544
545 def requires(self):
546 """
547 Generate list of luigi Tasks that this Task depends on.
548 """
        yield ResultRecordingTask(
            experiment_number=self.experiment_number,
            n_events_training=self.n_events_training,
            result_filter_records_name=self.result_filter_records_name,
            random_seed=self.random_seed
        )
555
556 def output(self):
557 """
558 Generate list of output files that the task should produce.
559 The task is considered finished if and only if the outputs all exist.
560 """
561 yield self.add_to_output(self.get_weightfile_identifier() + ".root")
562
563 def process(self):
564 """
565 Use basf2_mva teacher to create MVA weightfile from collected training
566 data variables.
567
568 This is the main process that is dispatched by the ``run`` method that
569 is inherited from ``Basf2Task``.
570 """
571 records_files = self.get_input_file_names(self.result_filter_records_name)
572 weightfile_identifier = self.get_weightfile_identifier()
573 my_basf2_mva_teacher(
574 records_files=records_files,
575 tree_name="records",
576 weightfile_identifier=weightfile_identifier,
577 target_variable=self.training_target,
578 exclude_variables=self.exclude_variables,
579 fast_bdt_option=self.fast_bdt_option,
580 )
581 basf2_mva.download(weightfile_identifier, self.get_output_file_name(weightfile_identifier + ".root"))
582
583 def remove_output(self):
584 """
585 Default function from base b2luigi.Task class.
586 """
587 self._remove_output()
588
589
class ValidationAndOptimisationTask(Basf2PathTask, LSFTask):
    """
593 Validate the performance of the trained filters by trying various combinations of FastBDT options, as well as cut values for
594 the states, the number of best candidates kept after each filter, and similar for the result filter.
595 """
596
597 experiment_number = b2luigi.IntParameter()
598
599 n_events_training = b2luigi.IntParameter()
600
601 fast_bdt_option = b2luigi.ListParameter(
602 # ## \cond
603 hashed=True, default=[200, 8, 3, 0.1]
604 # ## \endcond
605 )
606
607 n_events_testing = b2luigi.IntParameter()
608
609 result_filter_cut = b2luigi.FloatParameter()
610
611 # prepend the testing payloads
612 basf2.conditions.prepend_testing_payloads("localdb/database.txt")
613
614 def output(self):
615 """
616 Generate list of output files that the task should produce.
617 The task is considered finished if and only if the outputs all exist.
618 """
619 fbdt_string = create_fbdt_option_string(self.fast_bdt_option)
620 yield self.add_to_output(
621 f"cdc_svd_merger_ckf_validation{fbdt_string}{self.result_filter_cut}.root")
622
623 def requires(self):
624 """
        This task requires trained result filters, and that an independent data set for validation was created using the
        ``SplitNMergeSimTask`` with the random seed ``optimisation``.
627 """
        yield CKFResultFilterTeacherTask(
            result_filter_records_name="filter_records.root",
            experiment_number=self.experiment_number,
            n_events_training=self.n_events_training,
            fast_bdt_option=self.fast_bdt_option,
            random_seed='training'
        )
635 yield SplitNMergeSimTask(
636 bkgfiles_dir=SummaryTask.bkgfiles_by_exp[self.experiment_number],
637 experiment_number=self.experiment_number,
638 n_events=self.n_events_testing,
639 random_seed="optimisation",
640 )
641
643 """
644 Create a path to validate the trained filters.
645 """
646 path = basf2.create_path()
647
648 # get all the file names from the list of input files that are meant for optimisation / validation
649 file_list = [fname for fname in self.get_all_input_file_names()
650 if "generated_mc_N" in fname and "optimisation" in fname and fname.endswith(".root")]
651 path.add_module("RootInput", inputFileNames=file_list)
652
653 path.add_module("Gearbox")
654 path.add_module("Geometry")
655 path.add_module("SetupGenfitExtrapolation")
656
657 add_hit_preparation_modules(path, components=["SVD"])
658
659 cdc_reco_tracks = "CDCRecoTracks"
660 svd_reco_tracks = "SVDRecoTracks"
661 reco_tracks = "RecoTracks"
662 mc_reco_tracks = "MCRecoTracks"
663
664 # CDC track finding and MC matching
665 add_cdc_track_finding(path, output_reco_tracks=cdc_reco_tracks)
666
667 path.add_module("DAFRecoFitter", recoTracksStoreArrayName=cdc_reco_tracks)
668
669 # SVD track finding and MC matching
670 add_svd_standalone_tracking(path, reco_tracks=svd_reco_tracks)
671
672 direction = "backward"
673 fbdt_string = create_fbdt_option_string(self.fast_bdt_option)
674
675 # write the tracking MVA filter parameters and the cut on MVA classifier to be applied on a local db
676 iov = [0, 0, 0, -1]
        write_tracking_mva_filter_payloads_to_db(
            f"trk_CDCToSVDSeedResultFilterParameter{fbdt_string}",
            iov,
            f"trk_CDCToSVDSeedResultFilter{fbdt_string}",
            self.result_filter_cut)

683 basf2.conditions.prepend_testing_payloads("localdb/database.txt")
684 result_filter_parameters = {"DBPayloadName": f"trk_CDCToSVDSeedResultFilterParameter{fbdt_string}"}
685
686 path.add_module(
687 "CDCToSVDSeedCKF",
688 inputRecoTrackStoreArrayName=cdc_reco_tracks,
689 fromRelationStoreArrayName=cdc_reco_tracks,
690 toRelationStoreArrayName=svd_reco_tracks,
691 relatedRecoTrackStoreArrayName=svd_reco_tracks,
692 cdcTracksStoreArrayName=cdc_reco_tracks,
693 vxdTracksStoreArrayName=svd_reco_tracks,
694 relationCheckForDirection=direction,
695 reverseSeed=False,
696 firstHighFilterParameters={
697 "direction": direction},
698 advanceHighFilterParameters={
699 "direction": direction},
700 writeOutDirection=direction,
701 endEarly=False,
702 filter='mva_with_relations',
703 filterParameters=result_filter_parameters
704 )
705
706 path.add_module('RelatedTracksCombiner',
707 VXDRecoTracksStoreArrayName=svd_reco_tracks,
708 CDCRecoTracksStoreArrayName=cdc_reco_tracks,
709 recoTracksStoreArrayName=reco_tracks)
710
711 path.add_module('TrackFinderMCTruthRecoTracks',
712 RecoTracksStoreArrayName=mc_reco_tracks,
713 WhichParticles=[],
714 UsePXDHits=True,
715 UseSVDHits=True,
716 UseCDCHits=True)
717
718 path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=True, UseCDCHits=True,
719 mcRecoTracksStoreArrayName=mc_reco_tracks,
720 prRecoTracksStoreArrayName=reco_tracks)
721
        path.add_module(
            CombinedTrackingValidationModule(
                output_file_name=self.get_output_file_name(
                    f"cdc_svd_merger_ckf_validation{fbdt_string}{self.result_filter_cut}.root"),
                reco_tracks_name=reco_tracks,
                mc_reco_tracks_name=mc_reco_tracks,
                name="",
                contact="",
                expert_level=200))
731
732 return path
733
734 def create_path(self):
735 """
736 Create basf2 path to process with event generation and simulation.
737 """
739
740 def remove_output(self):
741 """
742 Default function from base b2luigi.Task class.
743 """
744 self._remove_output()
745
746
747class SummaryTask(b2luigi.Task):
748 """
    Task that collects and summarizes the main figures of merit from all the
    (validation and optimisation) child tasks.
751 """
752
753 n_events_training = b2luigi.get_setting(
754
755 "n_events_training", default=1000
756
757 )
758
759 n_events_testing = b2luigi.get_setting(
760
761 "n_events_testing", default=500
762
763 )
764
765 n_events_per_task = b2luigi.get_setting(
766
767 "n_events_per_task", default=100
768
769 )
770
771 num_processes = b2luigi.get_setting(
772
773 "basf2_processes_per_worker", default=0
774
775 )
776
777
778 bkgfiles_by_exp = b2luigi.get_setting("bkgfiles_by_exp")
779
780 bkgfiles_by_exp = {int(key): val for (key, val) in bkgfiles_by_exp.items()}
781
782
783 batch_system = 'local'
784
785 output_file_name = 'summary.json'
786
787 def output(self):
788 """
789 Output method.
790 """
791 yield self.add_to_output(self.output_file_name)
792
793 def requires(self):
794 """
        Generate list of tasks that need to be done for luigi to finish running
        this steering file.
797 """
798
799 fast_bdt_options = [
800 [50, 8, 3, 0.1],
801 [100, 8, 3, 0.1],
802 [200, 8, 3, 0.1],
803 ]
        cut_values = [(i + 1) * 0.2 for i in range(4)]
807
808 experiment_numbers = b2luigi.get_setting("experiment_numbers")
809
810 # iterate over all possible combinations of parameters from the above defined parameter lists
811 for experiment_number, fast_bdt_option, cut_value in itertools.product(
812 experiment_numbers, fast_bdt_options, cut_values
813 ):
            yield ValidationAndOptimisationTask(
                experiment_number=experiment_number,
                n_events_training=self.n_events_training,
                fast_bdt_option=fast_bdt_option,
                n_events_testing=self.n_events_testing,
                result_filter_cut=cut_value,
            )
821
822 def run(self):
823 """
824 Run method.
825 """
826 import ROOT # noqa
827
828 # These are the "TNtuple" names to check for
829 ntuple_names = (
830 'MCSideTrackingValidationModule_overview_figures_of_merit',
831 'PRSideTrackingValidationModule_overview_figures_of_merit',
832 'PRSideTrackingValidationModule_subdetector_figures_of_merit'
833 )
834
835 # Collect the information in a dictionary...
836 output_dict = {}
837 all_files = self.get_all_input_file_names()
838 for idx, single_file in enumerate(all_files):
839 with ROOT.TFile.Open(single_file, 'READ') as f:
840 branch_data = {}
841 for ntuple_name in ntuple_names:
842 ntuple = f.Get(ntuple_name)
843 for i in range(min(1, ntuple.GetEntries())): # Here we expect only 1 entry
844 ntuple.GetEntry(i)
845 for branch in ntuple.GetListOfBranches():
846 name = branch.GetName()
847 value = getattr(ntuple, name)
848 branch_data[name] = value
849 branch_data['file_path'] = single_file
850 output_dict[f'{idx}'] = branch_data
851
852 # ... and store the information in a JSON file
853 with open(self.get_output_file_name(self.output_file_name), 'w') as f:
854 json.dump(output_dict, f, indent=4)
855
856 def remove_output(self):
857 """
858 Default function from base b2luigi.Task class.
859 """
860 self._remove_output()
861
862
863if __name__ == "__main__":
864
865 b2luigi.set_setting("env_script", "./setup_basf2.sh")
866 b2luigi.set_setting("scratch_dir", tempfile.gettempdir())
867 workers = b2luigi.get_setting("workers", default=1)
868 b2luigi.process(SummaryTask(), workers=workers, batch=True)