Belle II Software  release-08-01-10
cdc_and_svd_ckf_merger_mva_training.py
1 
8 
9 """
10 cdc_and_svd_ckf_merger_mva_training
11 -----------------------------------------
12 
13 Purpose of this script
14 ~~~~~~~~~~~~~~~~~~~~~~
15 
16 This python script is used for the training and validation of the classifier of
17 the MVA-based result filter of the CDCToSVDSeedCKF, which combines tracks that
18 were found by the CDC and SVD standalone tracking algorithms.
19 
20 To avoid mistakes, b2luigi is used to create a task chain for a combined training and
21 validation of all classifiers.
22 
23 The order of the b2luigi tasks in this script is as follows (top to bottom):
24 * Two tasks to create input samples for training and testing (``GenerateSimTask`` and
25 ``SplitNMergeSimTask``). The ``SplitNMergeSimTask`` takes a number of events to be
26 generated and a number of events per task to reduce runtime. It then divides the total
27 number of events by the number of events per task and creates as many ``GenerateSimTask`` as
28 needed, each with a specific random seed, so that in the end the total number of
29 training and testing events are simulated. The individual files are then combined
30 by the ``SplitNMergeSimTask`` into one file each for training and testing.
31 * The ``ResultRecordingTask`` writes out the data used for training of the MVA.
32 * The ``CKFResultFilterTeacherTask`` trains the MVA, FastBDT per default, with a
33 given set of FastBDT options.
34 * The ``ValidationAndOptimisationTask`` uses the trained weight files and cut values
35 provided to run the tracking chain with the weight file under test, and also
36 runs the tracking validation.
37 * Finally, the ``MainTask`` is the "brain" of the script. It invokes the
38 ``ValidationAndOptimisationTask`` with the different combinations of FastBDT options
39 and cut values on the MVA classifier output.
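
The splitting described for ``SplitNMergeSimTask`` can be sketched in plain Python.
This is only an illustration of the arithmetic: the helper name ``split_events`` is
hypothetical and not part of this script, and the numbers are example values.

```python
def split_events(n_events, n_events_per_task, random_seed):
    # Return one (seed, n_events) pair per simulation chunk.
    quotient, remainder = divmod(n_events, n_events_per_task)
    chunks = [
        (random_seed + '_' + str(i).zfill(3), n_events_per_task)
        for i in range(quotient)
    ]
    # A final, smaller chunk covers the events that do not fill a whole task.
    if remainder > 0:
        chunks.append((random_seed + '_' + str(quotient).zfill(3), remainder))
    return chunks


chunks = split_events(250, 100, 'training')
for seed, n_chunk in chunks:
    print(seed, n_chunk)
```

Each chunk gets its own seed suffix, so the merged sample is built from
independent simulation jobs.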
40 
41 Due to the dependencies, the calls of the tasks are reversed. The ``MainTask``
42 calls the ``ValidationAndOptimisationTask`` with different FastBDT options and cut
43 values, and the ``ValidationAndOptimisationTask`` itself calls the required teacher,
44 training, and simulation tasks.
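
As a rough sketch of the parameter scan, the combinations requested by the
``MainTask`` can be enumerated with ``itertools.product``. The FastBDT options and
cut values mirror the lists defined further down in this file; the single
experiment number ``0`` is an assumption made only for this example.

```python
import itertools

experiment_numbers = [0]  # assumption: normally read from settings.json
fast_bdt_options = [
    [50, 8, 3, 0.1],
    [100, 8, 3, 0.1],
    [200, 8, 3, 0.1],
]
cut_values = [(i + 1) * 0.2 for i in range(4)]

# One validation task is requested per combination.
grid = list(itertools.product(experiment_numbers, fast_bdt_options, cut_values))
print(len(grid))
```

Because the cut values are computed in floating point, some of them (for example
``3 * 0.2``) are not exactly the decimal value one might expect. Since the cut value
appears in the validation output file name, rounding it, e.g. with
``round(value, 1)``, is one option if tidy file names matter.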
45 
46 b2luigi: Understanding the steering file
47 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
48 
49 All trainings and validations are done in the correct order in this steering
50 file. For the purpose of creating a dependency graph, the `b2luigi
51 <https://b2luigi.readthedocs.io>`_ python package is used, which extends the
52 `luigi <https://luigi.readthedocs.io>`_ package developed by Spotify.
53 
54 Each task that has to be done is represented by a special class, which defines
55 parameters, output files and which other tasks with which
56 parameters it depends on. For example, a teacher task, which runs
57 ``basf2_mva_teacher.py`` to train the classifier, depends on a data collection
58 task which runs a reconstruction and writes out track-wise variables into a root
59 file for training. An evaluation/validation task for testing the classifier
60 requires both the teacher task, as it needs the weightfile to be present, and
61 also a data collection task, because it needs a dataset for testing the classifier.
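
The idea of resolving such a dependency graph can be illustrated with a toy model.
This is not how b2luigi is implemented: ``ToyTask`` and ``run_order`` are
hypothetical names used only to show that requirements are processed before the
task that needs them.

```python
class ToyTask:
    # Hypothetical stand-in for a task: a name plus the tasks it requires.
    def __init__(self, name, requires=()):
        self.name = name
        self.requires = list(requires)


def run_order(task, done=None):
    # Depth-first walk: requirements first, then the task itself, each only once.
    if done is None:
        done = []
    for dependency in task.requires:
        run_order(dependency, done)
    if task.name not in done:
        done.append(task.name)
    return done


simulation_train = ToyTask('simulation (training)')
simulation_test = ToyTask('simulation (testing)')
recording = ToyTask('data collection', [simulation_train])
teacher = ToyTask('teacher', [recording])
validation = ToyTask('validation', [teacher, simulation_test])

order = run_order(validation)
print(order)
```

Requesting only the validation task is enough in this toy model; everything it
depends on is scheduled first.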
62 
63 The final task that defines which tasks need to be done for the steering file to
64 finish is the ``MainTask``. When you only want to run parts of the
65 training/validation pipeline, you can comment out requirements in the ``MainTask``
66 or replace them by lower-level tasks during debugging.
67 
68 Requirements
69 ~~~~~~~~~~~~
70 
71 This steering file relies on b2luigi_ for task scheduling. It can be installed
72 via pip::
73 
74  python3 -m pip install [--user] b2luigi
75 
76 Use the ``--user`` option if you do not have the rights to install python packages into
77 your externals (e.g. because you are using cvmfs) and install them in
78 ``$HOME/.local`` instead.
79 
80 Configuration
81 ~~~~~~~~~~~~~
82 
83 Instead of command line arguments, the b2luigi script is configured via a
84 ``settings.json`` file. Open it in your favorite text editor and modify it to
85 fit your requirements.
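
A minimal sketch of such a settings file, written here as a Python dictionary that
is serialised to JSON. The keys are the ones this script reads via
``b2luigi.get_setting`` plus the ``result_path`` mentioned below; all values,
including the background file path, are placeholders and assumptions.

```python
import json

settings = {
    'result_path': 'results',  # placeholder output directory
    'n_events_training': 1000,
    'n_events_testing': 500,
    'n_events_per_task': 100,
    'basf2_processes_per_worker': 0,
    # Keys are strings in JSON; the script converts them to int.
    'bkgfiles_by_exp': {'0': '/path/to/background/files'},  # placeholder path
    'experiment_numbers': [0],
    'workers': 1,
}

text = json.dumps(settings, indent=2)
print(text)
```

The printed text could be saved as ``settings.json`` and then adapted.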
86 
87 Usage
88 ~~~~~
89 
90 You can test the b2luigi task definitions without running them via::
91 
92  python3 cdc_and_svd_ckf_merger_mva_training.py --dry-run
93  python3 cdc_and_svd_ckf_merger_mva_training.py --show-output
94 
95 This will show the outputs and show potential errors in the definitions of the
96 luigi task dependencies. To run the steering file in normal (local) mode,
97 run::
98 
99  python3 cdc_and_svd_ckf_merger_mva_training.py
100 
101 One can use the interactive luigi web interface via the central scheduler,
102 which visualizes the task graph while it is running. For this, the scheduler
103 daemon ``luigid`` has to run in the background. It is located in
104 ``~/.local/bin/luigid`` if b2luigi was installed with ``--user``. For
105 example, run::
106 
107  luigid --port 8886
108 
109 Then, execute your steering file (e.g. in another terminal) with::
110 
111  python3 cdc_and_svd_ckf_merger_mva_training.py --scheduler-port 8886
112 
113 To view the web interface, open your web browser and enter into the URL bar::
114 
115  localhost:8886
116 
117 If you don't run the steering file on the same machine on which you run your web
118 browser, you have two options:
119 
120  1. Run both the steering file and ``luigid`` remotely and use
121  ssh-port-forwarding to your local host. To do so, run on your local
122  machine::
123 
124  ssh -N -f -L 8886:localhost:8886 <remote_user>@<remote_host>
125 
126  2. Run the ``luigid`` scheduler locally and use the ``--scheduler-host <your
127  local host>`` argument when calling the steering file
128 
129 Accessing the results / output files
130 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
131 
132 All output files are stored in a directory structure in the ``result_path``. The
133 directory tree encodes the used b2luigi parameters. This ensures reproducibility
134 and makes parameter searches easy. Sometimes, it is hard to find the relevant
135 output files. You can view the whole directory structure by running ``tree
136 <result_path>``. Use the unix ``find`` command to find the files that interest
137 you, e.g.::
138 
139  find <result_path> -name "*.root" # find all ROOT files
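
The same search can also be done from Python. The following is a small sketch
using ``pathlib``; the helper name ``find_root_files`` and the directory and file
names in the demonstration are made up for this example.

```python
import tempfile
from pathlib import Path


def find_root_files(result_path):
    # Recursively collect all files ending in .root below result_path.
    return sorted(Path(result_path).rglob('*.root'))


# Demonstration on a temporary directory standing in for the result path.
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / 'experiment_number=0'
    nested.mkdir()
    (nested / 'filter_records.root').touch()
    (nested / 'notes.txt').touch()
    found = [path.name for path in find_root_files(tmp)]

print(found)
```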
140 """
141 
142 import itertools
143 import subprocess
144 
145 import basf2
146 # from tracking import add_track_finding
147 from tracking.path_utils import add_hit_preparation_modules, add_cdc_track_finding, add_svd_standalone_tracking
148 from tracking.harvesting_validation.combined_module import CombinedTrackingValidationModule
149 import background
150 import simulation
151 
152 from ckf_training import my_basf2_mva_teacher, create_fbdt_option_string
153 
154 # wrap python modules that are used here but not in the externals into a try except block
155 install_helpstring_formatter = ("\nCould not find {module} python module. Try installing it via\n"
156  " python3 -m pip install [--user] {module}\n")
157 try:
158  import b2luigi
159  from b2luigi.core.utils import create_output_dirs
160  from b2luigi.basf2_helper import Basf2PathTask, Basf2Task
161 except ModuleNotFoundError:
162  print(install_helpstring_formatter.format(module="b2luigi"))
163  raise
164 
165 
166 class GenerateSimTask(Basf2PathTask):
167  """
168  Generate simulated Monte Carlo with background overlay.
169 
170  Make sure to use different ``random_seed`` parameters for the training data
171  for the classifier trainings and for the test data for the respective
172  evaluation/validation tasks.
173  """
174 
175 
176  experiment_number = b2luigi.IntParameter()
177 
178  n_events = b2luigi.IntParameter()
179 
181  random_seed = b2luigi.Parameter()
182 
183  bkgfiles_dir = b2luigi.Parameter(
184 
185  hashed=True
186 
187  )
188 
189  queue = 'l'
190 
191 
192  def output_file_name(self, n_events=None, random_seed=None):
193  """
194  Create output file name depending on number of events and production
195  mode that is specified in the random_seed string.
196 
197  :param n_events: Number of events to simulate.
198  :param random_seed: Random seed to use for the simulation to create independent samples.
199  """
200  if n_events is None:
201  n_events = self.n_events
202  if random_seed is None:
203  random_seed = self.random_seed
204  return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"
205 
206  def output(self):
207  """
208  Generate list of output files that the task should produce.
209  The task is considered finished if and only if the outputs all exist.
210  """
211  yield self.add_to_output(self.output_file_name())
212 
213  def create_path(self):
214  """
215  Create basf2 path to process with event generation and simulation.
216  """
217  basf2.set_random_seed(self.random_seed)
218  path = basf2.create_path()
219  path.add_module(
220  "EventInfoSetter", evtNumList=[self.n_events], runList=[0], expList=[self.experiment_number]
221  )
222  path.add_module("EvtGenInput")
223  bkg_files = ""
224  if self.experiment_number == 0:
225  bkg_files = background.get_background_files()
226  else:
227  bkg_files = background.get_background_files(self.bkgfiles_dir)
228 
229  simulation.add_simulation(path, bkgfiles=bkg_files, bkgOverlay=True, usePXDDataReduction=False)
230 
231  path.add_module(
232  "RootOutput",
233  outputFileName=self.get_output_file_name(self.output_file_name()),
234  )
235  return path
236 
237 
238 # I don't use the default MergeTask or similar because they only work if every input file is called the same.
239 # Additionally, I want to add more features like deleting the original input to save storage space.
240 class SplitNMergeSimTask(Basf2Task):
241  """
242  Generate simulated Monte Carlo with background overlay.
243 
244  Make sure to use different ``random_seed`` parameters for the training data
245  for the classifier trainings and for the test data for the respective
246  evaluation/validation tasks.
247  """
248 
249 
250  experiment_number = b2luigi.IntParameter()
251 
252  n_events = b2luigi.IntParameter()
253 
255  random_seed = b2luigi.Parameter()
256 
257  bkgfiles_dir = b2luigi.Parameter(
258 
259  hashed=True
260 
261  )
262 
263  queue = 'sx'
264 
265 
266  def output_file_name(self, n_events=None, random_seed=None):
267  """
268  Create output file name depending on number of events and production
269  mode that is specified in the random_seed string.
270 
271  :param n_events: Number of events to simulate.
272  :param random_seed: Random seed to use for the simulation to create independent samples.
273  """
274  if n_events is None:
275  n_events = self.n_events
276  if random_seed is None:
277  random_seed = self.random_seed
278  return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"
279 
280  def output(self):
281  """
282  Generate list of output files that the task should produce.
283  The task is considered finished if and only if the outputs all exist.
284  """
285  yield self.add_to_output(self.output_file_name())
286 
287  def requires(self):
288  """
289  This task requires several GenerateSimTask to be finished so that the required number of events is created.
290  """
291  n_events_per_task = MainTask.n_events_per_task
292  quotient, remainder = divmod(self.n_events, n_events_per_task)
293  for i in range(quotient):
294  yield GenerateSimTask(
295  bkgfiles_dir=self.bkgfiles_dir,
296  num_processes=MainTask.num_processes,
297  random_seed=self.random_seed + '_' + str(i).zfill(3),
298  n_events=n_events_per_task,
299  experiment_number=self.experiment_number,
300  )
301  if remainder > 0:
302  yield GenerateSimTask(
303  bkgfiles_dir=self.bkgfiles_dir,
304  num_processes=MainTask.num_processes,
305  random_seed=self.random_seed + '_' + str(quotient).zfill(3),
306  n_events=remainder,
307  experiment_number=self.experiment_number,
308  )
309 
310  @b2luigi.on_temporary_files
311  def process(self):
312  """
313  When all GenerateSimTasks finished, merge the output.
314  """
315  create_output_dirs(self)
316 
317  file_list = [item for sublist in self.get_input_file_names().values() for item in sublist]
318  print("Merge the following files:")
319  print(file_list)
320  cmd = ["b2file-merge", "-f"]
321  args = cmd + [self.get_output_file_name(self.output_file_name())] + file_list
322  subprocess.check_call(args)
323  print("Finished merging. Now remove the input files to save space.")
324  cmd2 = ["rm", "-f"]
325  for tempfile in file_list:
326  args = cmd2 + [tempfile]
327  subprocess.check_call(args)
328 
329 
330 class ResultRecordingTask(Basf2PathTask):
331  """
332  Task to record data for the final result filter. This only requires found and MC-matched SVD and CDC tracks that need to be
333  merged, all state filters are set to "all"
334  """
335 
336 
337  experiment_number = b2luigi.IntParameter()
338 
339  n_events_training = b2luigi.IntParameter()
340 
342  random_seed = b2luigi.Parameter()
343 
344 
345  result_filter_records_name = b2luigi.Parameter()
346 
347  def output(self):
348  """
349  Generate list of output files that the task should produce.
350  The task is considered finished if and only if the outputs all exist.
351  """
352  yield self.add_to_output(self.result_filter_records_name)
353 
354  def requires(self):
355  """
356  This task requires that the training SplitNMergeSimTask is finished.
357  """
358  yield SplitNMergeSimTask(
359  bkgfiles_dir=MainTask.bkgfiles_by_exp[self.experiment_number],
360  random_seed=self.random_seed,
361  n_events=self.n_events_training,
362  experiment_number=self.experiment_number,
363  )
364 
365  def create_result_recording_path(self, result_filter_records_name):
366  """
367  Create a path for the recording of the result filter. This file is then used to train the result filter.
368 
369  :param result_filter_records_name: Name of the recording file.
370  """
371 
372  path = basf2.create_path()
373 
374  # get all the file names from the list of input files that are meant for training
375  file_list = [fname for sublist in self.get_input_file_names().values()
376  for fname in sublist if "generated_mc_N" in fname and "training" in fname and fname.endswith(".root")]
377  path.add_module("RootInput", inputFileNames=file_list)
378 
379  path.add_module("Gearbox")
380  path.add_module("Geometry")
381  path.add_module("SetupGenfitExtrapolation")
382 
383  add_hit_preparation_modules(path, components=["SVD"])
384 
385  # MCTrackFinding
386  mc_reco_tracks = "MCRecoTracks"
387  path.add_module('TrackFinderMCTruthRecoTracks',
388  RecoTracksStoreArrayName=mc_reco_tracks)
389 
390  # CDC track finding and MC matching
391  cdc_reco_tracks = "CDCRecoTracks"
392  add_cdc_track_finding(path, output_reco_tracks=cdc_reco_tracks)
393  path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=False, UseCDCHits=True,
394  mcRecoTracksStoreArrayName=mc_reco_tracks,
395  prRecoTracksStoreArrayName=cdc_reco_tracks)
396 
397  path.add_module("DAFRecoFitter", recoTracksStoreArrayName=cdc_reco_tracks)
398 
399  # SVD track finding and MC matching
400  svd_reco_tracks = "SVDRecoTracks"
401  add_svd_standalone_tracking(path, reco_tracks=svd_reco_tracks)
402  path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=True, UseCDCHits=False,
403  mcRecoTracksStoreArrayName=mc_reco_tracks,
404  prRecoTracksStoreArrayName=svd_reco_tracks)
405 
406  direction = "backward"
407  path.add_module("CDCToSVDSeedCKF",
408  inputRecoTrackStoreArrayName=cdc_reco_tracks,
409 
410  fromRelationStoreArrayName=cdc_reco_tracks,
411  toRelationStoreArrayName=svd_reco_tracks,
412 
413  relatedRecoTrackStoreArrayName=svd_reco_tracks,
414  cdcTracksStoreArrayName=cdc_reco_tracks,
415  vxdTracksStoreArrayName=svd_reco_tracks,
416 
417  relationCheckForDirection=direction,
418  reverseSeed=False,
419  firstHighFilterParameters={"direction": direction},
420  advanceHighFilterParameters={"direction": direction},
421 
422  writeOutDirection=direction,
423  endEarly=False,
424 
425  filter="recording_with_relations",
426  filterParameters={"rootFileName": result_filter_records_name})
427 
428  return path
429 
430  def create_path(self):
431  """
432  Create basf2 path to record the training data for the result filter.
433  """
434  return self.create_result_recording_path(
435  result_filter_records_name=self.get_output_file_name(self.result_filter_records_name),
436  )
437 
438 
439 class CKFResultFilterTeacherTask(Basf2Task):
440  """
441  A teacher task runs the basf2 mva teacher on the training data provided by a
442  data collection task.
443 
444  Since teacher tasks are needed for all quality estimators covered by this
445  steering file and the only thing that changes is the required data
446  collection task and some training parameters, I decided to use inheritance
447  and have the basic functionality in this base class/interface and have the
448  specific teacher tasks inherit from it.
449  """
450 
451  experiment_number = b2luigi.IntParameter()
452 
453  n_events_training = b2luigi.IntParameter()
454 
456  random_seed = b2luigi.Parameter()
457 
458  result_filter_records_name = b2luigi.Parameter()
459 
460  training_target = b2luigi.Parameter(
461 
462  default="truth"
463 
464  )
465 
467  exclude_variables = b2luigi.ListParameter(
468 
469  hashed=True, default=[]
470 
471  )
472 
473  fast_bdt_option = b2luigi.ListParameter(
474 
475  hashed=True, default=[200, 8, 3, 0.1]
476 
477  )
478 
479  def get_weightfile_xml_identifier(self, fast_bdt_option=None):
480  """
481  Name of the xml weightfile that is created by the teacher task.
482  It is subsequently used as a local weightfile in the following validation tasks.
483 
484  :param fast_bdt_option: FastBDT option that is used to train this MVA
485  """
486  if fast_bdt_option is None:
487  fast_bdt_option = self.fast_bdt_option
488  fast_bdt_string = create_fbdt_option_string(fast_bdt_option)
489  weightfile_name = "trk_CDCToSVDSeedResultFilter" + fast_bdt_string
490  return weightfile_name + ".xml"
491 
492  def requires(self):
493  """
494  Generate list of luigi Tasks that this Task depends on.
495  """
496  yield ResultRecordingTask(
497  experiment_number=self.experiment_number,
498  n_events_training=self.n_events_training,
499  result_filter_records_name=self.result_filter_records_name,
500  random_seed=self.random_seed
501  )
502 
503  def output(self):
504  """
505  Generate list of output files that the task should produce.
506  The task is considered finished if and only if the outputs all exist.
507  """
508  yield self.add_to_output(self.get_weightfile_xml_identifier())
509 
510  def process(self):
511  """
512  Use basf2_mva teacher to create MVA weightfile from collected training
513  data variables.
514 
515  This is the main process that is dispatched by the ``run`` method that
516  is inherited from ``Basf2Task``.
517  """
518  records_files = self.get_input_file_names(self.result_filter_records_name)
519 
520  my_basf2_mva_teacher(
521  records_files=records_files,
522  tree_name="records",
523  weightfile_identifier=self.get_output_file_name(self.get_weightfile_xml_identifier()),
524  target_variable=self.training_target,
525  exclude_variables=self.exclude_variables,
526  fast_bdt_option=self.fast_bdt_option,
527  )
528 
529 
530 class ValidationAndOptimisationTask(Basf2PathTask):
531  """
532  Validate the performance of the trained filters by trying various combinations of FastBDT options, as well as cut values for
533  the states, the number of best candidates kept after each filter, and similar for the result filter.
534  """
535 
536  experiment_number = b2luigi.IntParameter()
537 
538  n_events_training = b2luigi.IntParameter()
539 
540  fast_bdt_option = b2luigi.ListParameter(
541  # ## \cond
542  hashed=True, default=[200, 8, 3, 0.1]
543  # ## \endcond
544  )
545 
546  n_events_testing = b2luigi.IntParameter()
547 
548  result_filter_cut = b2luigi.FloatParameter()
549 
550  def output(self):
551  """
552  Generate list of output files that the task should produce.
553  The task is considered finished if and only if the outputs all exist.
554  """
555  fbdt_string = create_fbdt_option_string(self.fast_bdt_option)
556  yield self.add_to_output(
557  f"cdc_svd_merger_ckf_validation{fbdt_string}_{self.result_filter_cut}.root")
558 
559  def requires(self):
560  """
561  This task requires trained result filters, and that an independent data set for validation was created using the
562  ``SplitNMergeSimTask`` with the random seed optimisation.
563  """
564  yield CKFResultFilterTeacherTask(
565  result_filter_records_name="filter_records.root",
566  experiment_number=self.experiment_number,
567  n_events_training=self.n_events_training,
568  fast_bdt_option=self.fast_bdt_option,
569  random_seed='training'
570  )
571  yield SplitNMergeSimTask(
572  bkgfiles_dir=MainTask.bkgfiles_by_exp[self.experiment_number],
573  experiment_number=self.experiment_number,
574  n_events=self.n_events_testing,
575  random_seed="optimisation",
576  )
577 
578  def create_optimisation_and_validation_path(self):
579  """
580  Create a path to validate the trained filters.
581  """
582  path = basf2.create_path()
583 
584  # get all the file names from the list of input files that are meant for optimisation / validation
585  file_list = [fname for sublist in self.get_input_file_names().values()
586  for fname in sublist if "generated_mc_N" in fname and "optimisation" in fname and fname.endswith(".root")]
587  path.add_module("RootInput", inputFileNames=file_list)
588 
589  path.add_module("Gearbox")
590  path.add_module("Geometry")
591  path.add_module("SetupGenfitExtrapolation")
592 
593  add_hit_preparation_modules(path, components=["SVD"])
594 
595  cdc_reco_tracks = "CDCRecoTracks"
596  svd_reco_tracks = "SVDRecoTracks"
597  reco_tracks = "RecoTracks"
598  mc_reco_tracks = "MCRecoTracks"
599 
600  # CDC track finding and MC matching
601  add_cdc_track_finding(path, output_reco_tracks=cdc_reco_tracks)
602 
603  path.add_module("DAFRecoFitter", recoTracksStoreArrayName=cdc_reco_tracks)
604 
605  # SVD track finding and MC matching
606  add_svd_standalone_tracking(path, reco_tracks=svd_reco_tracks)
607 
608  direction = "backward"
609  fbdt_string = create_fbdt_option_string(self.fast_bdt_option)
610  path.add_module(
611  "CDCToSVDSeedCKF",
612  inputRecoTrackStoreArrayName=cdc_reco_tracks,
613  fromRelationStoreArrayName=cdc_reco_tracks,
614  toRelationStoreArrayName=svd_reco_tracks,
615  relatedRecoTrackStoreArrayName=svd_reco_tracks,
616  cdcTracksStoreArrayName=cdc_reco_tracks,
617  vxdTracksStoreArrayName=svd_reco_tracks,
618  relationCheckForDirection=direction,
619  reverseSeed=False,
620  firstHighFilterParameters={
621  "direction": direction},
622  advanceHighFilterParameters={
623  "direction": direction},
624  writeOutDirection=direction,
625  endEarly=False,
626  filter='mva_with_relations',
627  filterParameters={
628  "identifier": self.get_input_file_names(f"trk_CDCToSVDSeedResultFilter{fbdt_string}.xml")[0],
629  "cut": self.result_filter_cut})
630 
631  path.add_module('RelatedTracksCombiner',
632  VXDRecoTracksStoreArrayName=svd_reco_tracks,
633  CDCRecoTracksStoreArrayName=cdc_reco_tracks,
634  recoTracksStoreArrayName=reco_tracks)
635 
636  path.add_module('TrackFinderMCTruthRecoTracks',
637  RecoTracksStoreArrayName=mc_reco_tracks,
638  WhichParticles=[],
639  UsePXDHits=True,
640  UseSVDHits=True,
641  UseCDCHits=True)
642 
643  path.add_module("MCRecoTracksMatcher", UsePXDHits=False, UseSVDHits=True, UseCDCHits=True,
644  mcRecoTracksStoreArrayName=mc_reco_tracks,
645  prRecoTracksStoreArrayName=reco_tracks)
646 
647  path.add_module(
648  CombinedTrackingValidationModule(
649  output_file_name=self.get_output_file_name(
650  f"cdc_svd_merger_ckf_validation{fbdt_string}_{self.result_filter_cut}.root"),
651  reco_tracks_name=reco_tracks,
652  mc_reco_tracks_name=mc_reco_tracks,
653  name="",
654  contact="",
655  expert_level=200))
656 
657  return path
658 
659  def create_path(self):
660  """
661  Create basf2 path to run the tracking chain with the trained result filter and validate it.
662  """
663  return self.create_optimisation_and_validation_path()
664 
665 
666 class MainTask(b2luigi.WrapperTask):
667  """
668  Wrapper task that needs to finish for b2luigi to finish running this steering file.
669 
670  It is done if the outputs of all required subtasks exist. It is thus at the
671  top of the luigi task graph. Edit the ``requires`` method to steer which
672  tasks and with which parameters you want to run.
673  """
674 
675  n_events_training = b2luigi.get_setting(
676 
677  "n_events_training", default=1000
678 
679  )
680 
681  n_events_testing = b2luigi.get_setting(
682 
683  "n_events_testing", default=500
684 
685  )
686 
687  n_events_per_task = b2luigi.get_setting(
688 
689  "n_events_per_task", default=100
690 
691  )
692 
693  num_processes = b2luigi.get_setting(
694 
695  "basf2_processes_per_worker", default=0
696 
697  )
698 
699 
700  bkgfiles_by_exp = b2luigi.get_setting("bkgfiles_by_exp")
701 
702  bkgfiles_by_exp = {int(key): val for (key, val) in bkgfiles_by_exp.items()}
703 
704  def requires(self):
705  """
706  Generate list of tasks that needs to be done for luigi to finish running
707  this steering file.
708  """
709 
710  fast_bdt_options = [
711  [50, 8, 3, 0.1],
712  [100, 8, 3, 0.1],
713  [200, 8, 3, 0.1],
714  ]
715  cut_values = []
716  for i in range(4):
717  cut_values.append((i+1) * 0.2)
718 
719  experiment_numbers = b2luigi.get_setting("experiment_numbers")
720 
721  # iterate over all possible combinations of parameters from the above defined parameter lists
722  for experiment_number, fast_bdt_option, cut_value in itertools.product(
723  experiment_numbers, fast_bdt_options, cut_values
724  ):
725  yield ValidationAndOptimisationTask(
726  experiment_number=experiment_number,
727  n_events_training=self.n_events_training,
728  fast_bdt_option=fast_bdt_option,
729  n_events_testing=self.n_events_testing,
730  result_filter_cut=cut_value,
731  )
732 
733 
734 if __name__ == "__main__":
735  b2luigi.set_setting("env_script", "./setup_basf2.sh")
736  b2luigi.set_setting("batch_system", "htcondor")
737  workers = b2luigi.get_setting("workers", default=1)
738  b2luigi.process(MainTask(), workers=workers, batch=True)