Belle II Software  release-05-01-25
combined_quality_estimator_teacher.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""
combined_module_quality_estimator_teacher
-----------------------------------------

Information on the MVA Track Quality Indicator / Estimator can be found
on `Confluence
<https://confluence.desy.de/display/BI/MVA+Track+Quality+Indicator>`_.

Purpose of this script
~~~~~~~~~~~~~~~~~~~~~~

This Python script is used for the combined training and validation of three
classifiers: the final MVA track quality estimator and the two quality
estimators for the intermediate standalone track finders that it depends on.

 - Final MVA track quality estimator:
   The final quality estimator for fully merged and fitted tracks (RecoTracks).
   Its classifier uses features from the track fit, the merger, the hit pattern,
   etc., but it also uses the outputs of the intermediate quality estimators for
   the VXD and the CDC track finding as inputs. It provides the final quality
   indicator (QI) that is exported to the track objects.

 - VXDTF2 track quality estimator:
   MVA quality estimator for the VXD standalone track finding.

 - CDC track quality estimator:
   MVA quality estimator for the CDC standalone track finding.

Each classifier requires a different data set for its training, and each needs
to be validated on a separate testing data set. Further, the final quality
estimator can only be trained when the trained weights for the intermediate
quality estimators are available. If the final estimator is to be trained without
one or both of the intermediate estimators, the corresponding requirements have to
be commented out in the ``__init__.py`` file of tracking.
For all estimators, a list of variables to be ignored is specified in the MasterTask.
The current choice is mainly based on poor data/MC agreement in these quantities or
on outdated implementations. It was decided to leave them in the hardcoded "ugly" way
in here to remind future generations that they exist in principle and that they
should and could be added to the estimator once their modelling improves or an
alternative implementation is programmed.
To avoid mistakes, b2luigi is used to create a task chain for the combined training
and validation of all classifiers.
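
The variable selection described above can be sketched in a few lines of plain
Python. This is a hypothetical helper, not part of this script or of basf2: it
mirrors the filtering done in ``my_basf2_mva_teacher`` further down (drop every
name containing "truth", the target and an explicit exclusion list such as the
one in the MasterTask, then treat a ``weight`` or ``__weight__`` column as the
event weight). The leaf names in the demo are made up.

```python
from typing import Iterable, List, Optional, Tuple


def select_training_variables(
    feature_names: Iterable[str],
    target_variable: str = "truth",
    exclude_variables: Optional[Iterable[str]] = None,
) -> Tuple[List[str], str]:
    # Hypothetical helper: returns (training variables, weight variable),
    # where the weight variable is "" if the tree has no weight column.
    excluded = set(exclude_variables or [])
    variables = [
        name
        for name in feature_names
        if "truth" not in name and name != target_variable and name not in excluded
    ]
    # A weight column is used as the event weight, not as a training feature.
    weight_variable = ""
    for candidate in ("weight", "__weight__"):
        if candidate in variables:
            variables.remove(candidate)
            weight_variable = candidate
            break
    return variables, weight_variable


# Hypothetical leaf names of a records tree
leaf_names = ["truth", "truth_matched", "n_hits", "chi2", "weight", "badly_modelled_var"]
selected, weight = select_training_variables(leaf_names, exclude_variables=["badly_modelled_var"])
print(selected, weight)  # ['n_hits', 'chi2'] weight
```

Keeping the exclusion list explicit, rather than deleting the variables from the
collection step, is what allows them to be re-enabled later simply by removing
them from the list.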

b2luigi: Understanding the steering file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All trainings and validations are done in the correct order in this steering
file. For the purpose of creating a dependency graph, the `b2luigi
<https://b2luigi.readthedocs.io>`_ python package is used, which extends the
`luigi <https://luigi.readthedocs.io>`_ package developed by Spotify.

Each task that has to be done is represented by a special class, which defines
parameters, output files and which other tasks with which parameters it depends
on. For example, a teacher task, which runs ``basf2_mva_teacher.py`` to train the
classifier, depends on a data collection task which runs a reconstruction and
writes out track-wise variables into a ROOT file for training. An
evaluation/validation task for testing the classifier requires both the teacher
task, as it needs the weightfile to be present, and a data collection task,
because it needs a dataset for testing the classifier.
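
The dependency resolution can be illustrated without b2luigi. The following is a
minimal, hypothetical sketch in plain Python (``SketchTask`` and ``schedule`` are
made-up names, not the b2luigi API): each task declares the file it produces and
the tasks it requires, and a depth-first traversal orders the requirements before
the tasks that need them. Tasks whose output already exists are skipped, similar
to how luigi treats a task as complete once all its outputs exist.

```python
from typing import List, Sequence, Set


class SketchTask:
    # Hypothetical stand-in for a b2luigi task: a name, one output file and the
    # tasks it requires. This is not the b2luigi API.
    def __init__(self, name: str, output: str, requires: Sequence["SketchTask"] = ()):
        self.name = name
        self.output = output
        self.requires = list(requires)


def schedule(task: SketchTask, existing_outputs: Set[str]) -> List[str]:
    # Depth-first traversal: every requirement is scheduled before the task that
    # needs it. A task whose output already exists counts as complete, so neither
    # it nor its requirements are scheduled again.
    order: List[str] = []
    visited: Set[str] = set()

    def visit(t: SketchTask) -> None:
        if t.name in visited:
            return
        visited.add(t.name)
        if t.output in existing_outputs:
            return
        for requirement in t.requires:
            visit(requirement)
        order.append(t.name)

    visit(task)
    return order


training_data = SketchTask("TrainingDataCollection", "train_records.root")
test_data = SketchTask("TestDataCollection", "test_records.root")
teacher = SketchTask("Teacher", "weightfile.xml", requires=[training_data])
validation = SketchTask("Validation", "validation.pdf", requires=[teacher, test_data])

print(schedule(validation, existing_outputs=set()))
# ['TrainingDataCollection', 'Teacher', 'TestDataCollection', 'Validation']
print(schedule(validation, existing_outputs={"weightfile.xml"}))
# ['TestDataCollection', 'Validation']
```

The second call shows why outputs matter: once the weightfile exists, the
teacher and its training data collection are not rerun.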

The final task that defines which tasks need to be done for the steering file to
finish is the ``MasterTask``. When you only want to run parts of the
training/validation pipeline, you can comment out requirements in the
``MasterTask`` or replace them with lower-level tasks during debugging.

Requirements
~~~~~~~~~~~~

This steering file relies on b2luigi_ for task scheduling and `uncertain_panda
<https://github.com/nils-braun/uncertain_panda>`_ for uncertainty calculations.
uncertain_panda is not in the externals, and b2luigi is not included in the
externals up to v01-07-01. Both can be installed via pip::

    python3 -m pip install [--user] b2luigi uncertain_panda

Use the ``--user`` option if you do not have the rights to install Python packages
into your externals (e.g. because you are using cvmfs) and install them in
``$HOME/.local`` instead.

Configuration
~~~~~~~~~~~~~

Instead of command line arguments, the b2luigi script is configured via a
``settings.json`` file. Open it in your favorite text editor and modify it to
fit your requirements.
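
A minimal sketch of such a configuration file and of a lookup for it, using only
the standard library. The helper names are hypothetical, and only the
``result_path`` key (the output directory mentioned in the section on output
files) is shown, with a placeholder value; the other keys this steering file
reads are not listed here. The upward search through parent directories is an
assumption about how b2luigi finds the file; check the b2luigi documentation for
the exact behaviour.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Optional


def find_settings_file(start: Path, name: str = "settings.json") -> Optional[Path]:
    # Look for the settings file in `start` and then in each parent directory.
    start = start.resolve()
    for directory in [start, *start.parents]:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def read_setting(key: str, start: Path, default: Any = None) -> Any:
    # Return the value of `key` from the nearest settings.json, or `default`.
    settings_file = find_settings_file(start)
    if settings_file is None:
        return default
    return json.loads(settings_file.read_text()).get(key, default)


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    # Placeholder settings: only the output location is set here.
    (project / "settings.json").write_text(json.dumps({"result_path": "results"}, indent=4))
    subdir = project / "subdir"
    subdir.mkdir()
    found_result_path = read_setting("result_path", subdir)
    missing_key = read_setting("no_such_key", subdir, default="fallback")

print(found_result_path)  # results
```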

Usage
~~~~~

You can test the b2luigi task graph without running it via::

    python3 combined_quality_estimator_teacher.py --dry-run
    python3 combined_quality_estimator_teacher.py --show-output

This will show the outputs and potential errors in the definitions of the
luigi task dependencies. To run the steering file in normal (local) mode,
run::

    python3 combined_quality_estimator_teacher.py

I usually use the interactive luigi web interface via the central scheduler,
which visualizes the task graph while it is running. For this, the scheduler
daemon ``luigid`` has to run in the background. It is located in
``~/.local/bin/luigid`` if b2luigi has been installed with ``--user``. For
example, run::

    luigid --port 8886

Then, execute your steering file (e.g. in another terminal) with::

    python3 combined_quality_estimator_teacher.py --scheduler-port 8886

To view the web interface, open your web browser and enter into the URL bar::

    localhost:8886

If you don't run the steering file on the same machine on which you run your web
browser, you have two options:

 1. Run both the steering file and ``luigid`` remotely and use
    ssh port forwarding to your local host. For this, run on your local
    machine::

        ssh -N -f -L 8886:localhost:8886 <remote_user>@<remote_host>

 2. Run the ``luigid`` scheduler locally and use the ``--scheduler-host <your
    local host>`` argument when calling the steering file.

Accessing the results / output files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All output files are stored in a directory structure in the ``result_path``. The
directory tree encodes the used b2luigi parameters. This ensures reproducibility
and makes parameter searches easy. Sometimes, it is hard to find the relevant
output files. You can view the whole directory structure by running ``tree
<result_path>``. Use the Unix ``find`` command to find the files that interest
you, e.g.::

    find <result_path> -name "*.pdf"  # find all validation plot files
    find <result_path> -name "*.root"  # find all ROOT files
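
The same search can be done from Python, for example to collect the validation
plots of a parameter scan. This is a hedged sketch: it assumes that b2luigi
writes each parameter as a ``<name>=<value>`` directory level below
``result_path``; the helper name ``find_outputs`` and the directory layout used in
the demo are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def find_outputs(result_path: Path, pattern: str) -> List[Tuple[Dict[str, str], Path]]:
    # Recursively collect files matching `pattern` (like `find -name`) and read
    # every "name=value" directory level between result_path and the file as a
    # parameter.
    found = []
    for path in sorted(result_path.rglob(pattern)):
        directories = path.relative_to(result_path).parts[:-1]
        params = dict(part.split("=", 1) for part in directories if "=" in part)
        found.append((params, path))
    return found


with tempfile.TemporaryDirectory() as tmp:
    result_path = Path(tmp)
    # Hypothetical directory layout with two parameter levels
    task_dir = result_path / "n_events=1000" / "random_seed=BBBAR_test"
    task_dir.mkdir(parents=True)
    (task_dir / "validation.pdf").touch()
    (task_dir / "records.root").touch()
    pdf_outputs = [(params, path.name) for params, path in find_outputs(result_path, "*.pdf")]

print(pdf_outputs)
# [({'n_events': '1000', 'random_seed': 'BBBAR_test'}, 'validation.pdf')]
```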
"""

import itertools
import os
from pathlib import Path
import shutil
import subprocess
import textwrap
from datetime import datetime
from typing import Iterable

import matplotlib.pyplot as plt
import numpy as np
import root_pandas
from matplotlib.backends.backend_pdf import PdfPages

import basf2
import basf2_mva
from packaging import version
import background
import simulation
import tracking
import tracking.root_utils as root_utils
from tracking.harvesting_validation.combined_module import CombinedTrackingValidationModule

# Wrap Python modules that are used here but are not in the externals into a try/except block
install_helpstring_formatter = ("\nCould not find {module} python module. Try installing it via\n"
                                "  python3 -m pip install [--user] {module}\n")
try:
    import b2luigi
    from b2luigi.core.utils import get_serialized_parameters, get_log_file_dir, create_output_dirs
    from b2luigi.basf2_helper import Basf2PathTask, Basf2Task, HaddTask
    from b2luigi.core.task import Task, ExternalTask
    from b2luigi.basf2_helper.utils import get_basf2_git_hash
except ModuleNotFoundError:
    print(install_helpstring_formatter.format(module="b2luigi"))
    raise
try:
    from uncertain_panda import pandas as upd
except ModuleNotFoundError:
    print(install_helpstring_formatter.format(module="uncertain_panda"))
    raise

# b2luigi versions 0.3.2 and older rely on $BELLE2_RELEASE being "head",
# which is not the case in the new externals. A fix has been merged into b2luigi
# via https://github.com/nils-braun/b2luigi/pull/17 and thus should be available
# in future releases.
if (
    version.parse(b2luigi.__version__) <= version.parse("0.3.2") and
    get_basf2_git_hash() is None and
    os.getenv("BELLE2_LOCAL_DIR") is not None
):
    print(f"b2luigi could not obtain the git hash because of a bug not yet fixed in version {b2luigi.__version__}\n"
          "Please install the latest version of b2luigi from github via\n\n"
          "  python3 -m pip install --upgrade [--user] git+https://github.com/nils-braun/b2luigi.git\n")
    raise ImportError

# Utility functions


def create_fbdt_option_string(fast_bdt_option):
    """
    Encode the FastBDT options ``[nTrees, nCuts, nLevels, shrinkage]`` into a
    string, e.g. ``[200, 8, 3, 0.1]`` gives ``"_nTrees200_nCuts8_nLevels3_shrin10"``.
    """
    return "_nTrees" + str(fast_bdt_option[0]) + "_nCuts" + str(fast_bdt_option[1]) + "_nLevels" + \
        str(fast_bdt_option[2]) + "_shrin" + str(int(round(100*fast_bdt_option[3], 0)))


def my_basf2_mva_teacher(
    records_files,
    tree_name,
    weightfile_identifier,
    target_variable="truth",
    exclude_variables=None,
    fast_bdt_option=[200, 8, 3, 0.1]  # nTrees, nCuts, nLevels, shrinkage
):
    """
    My custom wrapper for basf2 mva teacher. Adapted from code in ``trackfindingcdc_teacher``.

    :param records_files: List of files with collected ("recorded") variables to use as training data for the MVA.
    :param tree_name: Name of the TTree in the ROOT file from the ``data_collection_task``
        that contains the training data for the MVA teacher.
    :param weightfile_identifier: Name of the weightfile that is created.
        Should either end in ".xml" for local weightfiles or in ".root", when
        the weightfile needs later to be uploaded as a payload to the conditions
        database.
    :param target_variable: Feature/variable to use as truth label in the quality estimator MVA classifier.
    :param exclude_variables: List of collected variables to not use in the training of the QE MVA classifier,
        in addition to variables containing the "truth" substring, which are excluded by default.
    :param fast_bdt_option: FastBDT options [nTrees, nCuts, nLevels, shrinkage], default: [200, 8, 3, 0.1]
    """
    if exclude_variables is None:
        exclude_variables = []

    weightfile_extension = Path(weightfile_identifier).suffix
    if weightfile_extension not in {".xml", ".root"}:
        raise ValueError(f"Weightfile Identifier should end in .xml or .root, but ends in {weightfile_extension}")

    # extract names of all variables from one record file
    with root_utils.root_open(records_files[0]) as records_tfile:
        input_tree = records_tfile.Get(tree_name)
        feature_names = [leaf.GetName() for leaf in input_tree.GetListOfLeaves()]

    # get list of variables to use for training without MC truth
    truth_free_variable_names = [
        name
        for name in feature_names
        if (
            ("truth" not in name) and
            (name != target_variable) and
            (name not in exclude_variables)
        )
    ]
    if "weight" in truth_free_variable_names:
        truth_free_variable_names.remove("weight")
        weight_variable = "weight"
    elif "__weight__" in truth_free_variable_names:
        truth_free_variable_names.remove("__weight__")
        weight_variable = "__weight__"
    else:
        weight_variable = ""

    # Set options for MVA training
    general_options = basf2_mva.GeneralOptions()
    general_options.m_datafiles = basf2_mva.vector(*records_files)
    general_options.m_treename = tree_name
    general_options.m_weight_variable = weight_variable
    general_options.m_identifier = weightfile_identifier
    general_options.m_variables = basf2_mva.vector(*truth_free_variable_names)
    general_options.m_target_variable = target_variable
    fastbdt_options = basf2_mva.FastBDTOptions()

    fastbdt_options.m_nTrees = fast_bdt_option[0]
    fastbdt_options.m_nCuts = fast_bdt_option[1]
    fastbdt_options.m_nLevels = fast_bdt_option[2]
    fastbdt_options.m_shrinkage = fast_bdt_option[3]
    # Train a MVA method and store the weightfile locally under the name given by ``weightfile_identifier``.
    basf2_mva.teacher(general_options, fastbdt_options)


def _my_uncertain_mean(series: upd.Series):
    """
    Temporary workaround for a bug in ``uncertain_panda`` where a ``ValueError`` is
    thrown for ``Series.unc.mean`` if the series is empty. Can be replaced by
    ``.unc.mean`` when the issue is fixed.
    https://github.com/nils-braun/uncertain_panda/issues/2
    """
    try:
        return series.unc.mean()
    except ValueError:
        if series.empty:
            return np.nan
        else:
            raise


def get_uncertain_means_for_qi_cuts(df: upd.DataFrame, column: str, qi_cuts: Iterable[float]):
    """
    Return a pandas series with the mean of the dataframe column and its
    uncertainty for each quality indicator cut.

    :param df: Pandas dataframe with at least ``quality_indicator``
        and another numeric ``column``.
    :param column: Column of which we want to aggregate the means
        and uncertainties for different QI cuts
    :param qi_cuts: Iterable of quality indicator minimal thresholds.
    :returns: Series of means and uncertainties with ``qi_cuts`` as index
    """

    uncertain_means = (_my_uncertain_mean(df.query(f"quality_indicator > {qi_cut}")[column])
                       for qi_cut in qi_cuts)
    uncertain_means_series = upd.Series(data=uncertain_means, index=qi_cuts)
    return uncertain_means_series


def plot_with_errobands(uncertain_series,
                        error_band_alpha=0.3,
                        plot_kwargs={},
                        fill_between_kwargs={},
                        ax=None):
    """
    Plot an uncertain series with error bands for y-errors
    """
    if ax is None:
        ax = plt.gca()
    uncertain_series = uncertain_series.dropna()
    ax.plot(uncertain_series.index.values, uncertain_series.nominal_value, **plot_kwargs)
    ax.fill_between(x=uncertain_series.index,
                    y1=uncertain_series.nominal_value - uncertain_series.std_dev,
                    y2=uncertain_series.nominal_value + uncertain_series.std_dev,
                    alpha=error_band_alpha,
                    **fill_between_kwargs)


def format_dictionary(adict, width=80, bullet="•"):
    """
    Helper function to format dictionary to string as a wrapped key-value bullet
    list. Useful to print metadata from dictionaries.

    :param adict: Dictionary to format
    :param width: Characters after which to wrap a key-value line
    :param bullet: Character to begin a key-value line with, e.g. ``-`` for a
        yaml-like string
    """
    # It might be possible to replace this function with yaml.dump, but the current
    # version in the externals does not yet allow disabling the sorting of the
    # dictionary, and I am also not sure whether its output can be wrapped.
    return "\n".join(textwrap.fill(f"{bullet} {key}: {value}", width=width)
                     for (key, value) in adict.items())

# Begin definitions of b2luigi task classes


class GenerateSimTask(Basf2PathTask):
    """
    Generate simulated Monte Carlo with background overlay.

    Make sure to use different ``random_seed`` parameters for the training data
    for the classifier trainings and for the test data for the respective
    evaluation/validation tasks.
    """

    #: Number of events to generate.
    n_events = b2luigi.IntParameter()
    #: Experiment number of the simulated events.
    experiment_number = b2luigi.IntParameter()
    #: Random basf2 seed. Its content also selects the production process (e.g. ``BBBAR``, ``BHABHA``).
    random_seed = b2luigi.Parameter()
    #: Directory with the background overlay ROOT files.
    bkgfiles_dir = b2luigi.Parameter(hashed=True)
    #: Batch queue to submit the task to.
    queue = 'l'

    def output_file_name(self, n_events=None, random_seed=None):
        if n_events is None:
            n_events = self.n_events
        if random_seed is None:
            random_seed = self.random_seed
        return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"

    def output(self):
        """
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        """
        yield self.add_to_output(self.output_file_name())

    def create_path(self):
        """
        Create basf2 path to process with event generation and simulation.
        """
        basf2.set_random_seed(self.random_seed)
        path = basf2.create_path()
        if self.experiment_number in [1002, 1003]:
            runNo = 0
        elif self.experiment_number == 12:
            runNo = 5736
        else:
            # without this branch, runNo would be undefined below
            raise ValueError(f"No run number defined for experiment number {self.experiment_number}")
        path.add_module(
            "EventInfoSetter", evtNumList=[self.n_events], runList=[runNo], expList=[self.experiment_number]
        )
        if "BBBAR" in self.random_seed:
            path.add_module("EvtGenInput")
        else:
            import generators as ge
            # WARNING: There are a few differences in the production of MC13a and b like the following lines
            # as well as ActivatePXD.. and the beamparams for bhabha... I use these from MC13b, not a... :/
            import beamparameters as bp
            beamparameters = bp.add_beamparameters(path, "Y4S")
            beamparameters.param("covVertex", [(14.8e-4)**2, (1.5e-4)**2, (360e-4)**2])
            if "BHABHA" in self.random_seed:
                ge.add_babayaganlo_generator(path=path, finalstate='ee', minenergy=0.15, minangle=10.0)
            elif "MUMU" in self.random_seed:
                ge.add_kkmc_generator(path=path, finalstate='mu+mu-')
            elif "YY" in self.random_seed:
                babayaganlo = basf2.register_module('BabayagaNLOInput')
                babayaganlo.param('FinalState', 'gg')
                babayaganlo.param('MaxAcollinearity', 180.0)
                babayaganlo.param('ScatteringAngleRange', [0., 180.])
                babayaganlo.param('FMax', 75000)
                babayaganlo.param('MinEnergy', 0.01)
                babayaganlo.param('Order', 'exp')
                babayaganlo.param('DebugEnergySpread', 0.01)
                babayaganlo.param('Epsilon', 0.00005)
                path.add_module(babayaganlo)
                generatorpreselection = basf2.register_module('GeneratorPreselection')
                generatorpreselection.param('nChargedMin', 0)
                generatorpreselection.param('nChargedMax', 999)
                generatorpreselection.param('MinChargedPt', 0.15)
                generatorpreselection.param('MinChargedTheta', 17.)
                generatorpreselection.param('MaxChargedTheta', 150.)
                generatorpreselection.param('nPhotonMin', 1)
                generatorpreselection.param('MinPhotonEnergy', 1.5)
                generatorpreselection.param('MinPhotonTheta', 15.0)
                generatorpreselection.param('MaxPhotonTheta', 165.0)
                generatorpreselection.param('applyInCMS', True)
                path.add_module(generatorpreselection)
                # generatorpreselection.if_value('!=11', empty)
            elif "TAUPAIR" in self.random_seed:
                ge.add_kkmc_generator(path, finalstate='tau+tau-')
            elif "DDBAR" in self.random_seed:
                ge.add_continuum_generator(path, finalstate='ddbar')
            elif "UUBAR" in self.random_seed:
                ge.add_continuum_generator(path, finalstate='uubar')
            elif "SSBAR" in self.random_seed:
                ge.add_continuum_generator(path, finalstate='ssbar')
            elif "CCBAR" in self.random_seed:
                ge.add_continuum_generator(path, finalstate='ccbar')
        # activate simulation of dead/masked pixel and reproduce detector gain, which will be
        # applied at reconstruction level when the data GT is present in the DB chain
        path.add_module("ActivatePXDPixelMasker")
        path.add_module("ActivatePXDGainCalibrator")
        # collect the background overlay files from the given directory
        bkg_files = background.get_background_files(self.bkgfiles_dir)
        if self.experiment_number == 1002:
            # remove KLM because of bug in background files with release 4
            components = ['PXD', 'SVD', 'CDC', 'ECL', 'TOP', 'ARICH', 'TRG']
        else:
            components = None
        simulation.add_simulation(path, bkgfiles=bkg_files, bkgOverlay=True, components=components, usePXDDataReduction=False)

        path.add_module(
            "RootOutput",
            outputFileName=self.get_output_file_name(self.output_file_name()),
        )
        return path

# I don't use the default MergeTask or similar because they only work if every input file has the same name.
# Additionally, I want to add more features like deleting the original input to save storage space.


class SplitNMergeSimTask(Basf2Task):
    """
    Generate simulated Monte Carlo with background overlay by splitting the
    production into several ``GenerateSimTask`` jobs and merging their outputs.

    Make sure to use different ``random_seed`` parameters for the training data
    for the classifier trainings and for the test data for the respective
    evaluation/validation tasks.
    """

    #: Number of events to generate.
    n_events = b2luigi.IntParameter()
    #: Experiment number of the simulated events.
    experiment_number = b2luigi.IntParameter()
    #: Random basf2 seed. Its content also selects the production process (e.g. ``BBBAR``, ``BHABHA``).
    random_seed = b2luigi.Parameter()
    #: Directory with the background overlay ROOT files.
    bkgfiles_dir = b2luigi.Parameter(hashed=True)
    #: Batch queue to submit the task to.
    queue = 'sx'

    def output_file_name(self, n_events=None, random_seed=None):
        if n_events is None:
            n_events = self.n_events
        if random_seed is None:
            random_seed = self.random_seed
        return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"

    def output(self):
        """
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        """
        yield self.add_to_output(self.output_file_name())

    def requires(self):
        """
        Split the production into ``GenerateSimTask`` jobs with at most
        ``MasterTask.n_events_per_task`` events each.
        """
        n_events_per_task = MasterTask.n_events_per_task
        quotient, remainder = divmod(self.n_events, n_events_per_task)
        for i in range(quotient):
            yield GenerateSimTask(
                bkgfiles_dir=self.bkgfiles_dir,
                num_processes=MasterTask.num_processes,
                random_seed=self.random_seed + '_' + str(i).zfill(3),
                n_events=n_events_per_task,
                experiment_number=self.experiment_number,
            )
        if remainder > 0:
            yield GenerateSimTask(
                bkgfiles_dir=self.bkgfiles_dir,
                num_processes=MasterTask.num_processes,
                random_seed=self.random_seed + '_' + str(quotient).zfill(3),
                n_events=remainder,
                experiment_number=self.experiment_number,
            )

    @b2luigi.on_temporary_files
    def process(self):
        create_output_dirs(self)

        file_list = []
        for file_names in self.get_input_file_names().values():
            # each input entry is a list of file names
            file_list.extend(file_names)
        print("Merge the following files:")
        print(file_list)
        cmd = ["b2file-merge", "-f"]
        args = cmd + [self.get_output_file_name(self.output_file_name())] + file_list
        subprocess.check_call(args)
        print("Finished merging. Now remove the input files to save space.")
        cmd2 = ["rm", "-f"]
        for tempfile in file_list:
            args = cmd2 + [tempfile]
            subprocess.check_call(args)


class CheckExistingFile(ExternalTask):
    """
    Task to check if the given file really exists.
    """

    filename = b2luigi.Parameter()

    def output(self):
        from luigi import LocalTarget
        return LocalTarget(self.filename)


class VXDQEDataCollectionTask(Basf2PathTask):
    """
    Collect variables/features from VXDTF2 tracking and write them to a ROOT
    file.

    These variables are to be used as labelled training data for the MVA
    classifier which is the VXD track quality estimator
    """

    n_events = b2luigi.IntParameter()

    experiment_number = b2luigi.IntParameter()

    random_seed = b2luigi.Parameter()
    queue = 'l'

    def get_records_file_name(self, n_events=None, random_seed=None):
        if n_events is None:
            n_events = self.n_events
        if random_seed is None:
            random_seed = self.random_seed
        if 'vxd' not in random_seed:
            random_seed += '_vxd'
        if 'DATA' in random_seed:
            return 'qe_records_DATA_vxd.root'
        else:
            if 'USESIMBB' in random_seed:
                random_seed = 'BBBAR_' + random_seed.split("_", 1)[1]
            elif 'USESIMEE' in random_seed:
                random_seed = 'BHABHA_' + random_seed.split("_", 1)[1]
            return 'qe_records_N' + str(n_events) + '_' + random_seed + '.root'

    def get_input_files(self, n_events=None, random_seed=None):
        if n_events is None:
            n_events = self.n_events
        if random_seed is None:
            random_seed = self.random_seed
        if "USESIM" in random_seed:
            if 'USESIMBB' in random_seed:
                random_seed = 'BBBAR_' + random_seed.split("_", 1)[1]
            elif 'USESIMEE' in random_seed:
                random_seed = 'BHABHA_' + random_seed.split("_", 1)[1]
            return ['datafiles/' + GenerateSimTask.output_file_name(GenerateSimTask,
                                                                    n_events=n_events, random_seed=random_seed)]
        elif "DATA" in random_seed:
            return MasterTask.datafiles
        else:
            return self.get_input_file_names(GenerateSimTask.output_file_name(
                GenerateSimTask, n_events=n_events, random_seed=random_seed))

    def requires(self):
        """
        Generate list of luigi Tasks that this Task depends on.
        """
        if "USESIM" in self.random_seed or "DATA" in self.random_seed:
            for filename in self.get_input_files():
                yield CheckExistingFile(
                    filename=filename,
                )
        else:
            yield SplitNMergeSimTask(
                bkgfiles_dir=MasterTask.bkgfiles_by_exp[self.experiment_number],
                random_seed=self.random_seed,
                n_events=self.n_events,
                experiment_number=self.experiment_number,
            )

    def output(self):
        """
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        """
        yield self.add_to_output(self.get_records_file_name())

    def create_path(self):
        """
        Create basf2 path with VXDTF2 tracking and VXD QE data collection.
        """
        path = basf2.create_path()
        inputFileNames = self.get_input_files()
        path.add_module(
            "RootInput",
            inputFileNames=inputFileNames,
        )
        path.add_module("Gearbox")
        tracking.add_geometry_modules(path)
        if 'DATA' in self.random_seed:
            from rawdata import add_unpackers
            add_unpackers(path, components=['SVD', 'PXD'])
        tracking.add_hit_preparation_modules(path)
        tracking.add_vxd_track_finding_vxdtf2(
            path, components=["SVD"], add_mva_quality_indicator=False
        )
        if 'DATA' in self.random_seed:
            path.add_module(
                "VXDQETrainingDataCollector",
                TrainingDataOutputName=self.get_output_file_name(self.get_records_file_name()),
                SpacePointTrackCandsStoreArrayName="SPTrackCands",
                EstimationMethod="tripletFit",
                UseTimingInfo=False,
                ClusterInformation="Average",
                MCStrictQualityEstimator=False,
                mva_target=False,
                MCInfo=False,
            )
        else:
            path.add_module(
                "TrackFinderMCTruthRecoTracks",
                RecoTracksStoreArrayName="MCRecoTracks",
                WhichParticles=[],
                UsePXDHits=False,
                UseSVDHits=True,
                UseCDCHits=False,
            )
            path.add_module(
                "VXDQETrainingDataCollector",
                TrainingDataOutputName=self.get_output_file_name(self.get_records_file_name()),
                SpacePointTrackCandsStoreArrayName="SPTrackCands",
                EstimationMethod="tripletFit",
                UseTimingInfo=False,
                ClusterInformation="Average",
                MCStrictQualityEstimator=True,
                mva_target=False,
            )
        return path


class CDCQEDataCollectionTask(Basf2PathTask):
    """
    Collect variables/features from CDC tracking and write them to a ROOT file.

    These variables are to be used as labelled training data for the MVA
    classifier which is the CDC track quality estimator
    """

    n_events = b2luigi.IntParameter()

    experiment_number = b2luigi.IntParameter()

    random_seed = b2luigi.Parameter()
    queue = 'l'

    def get_records_file_name(self, n_events=None, random_seed=None):
        if n_events is None:
            n_events = self.n_events
        if random_seed is None:
            random_seed = self.random_seed
        if 'cdc' not in random_seed:
            random_seed += '_cdc'
        if 'DATA' in random_seed:
            return 'qe_records_DATA_cdc.root'
        else:
            if 'USESIMBB' in random_seed:
                random_seed = 'BBBAR_' + random_seed.split("_", 1)[1]
            elif 'USESIMEE' in random_seed:
                random_seed = 'BHABHA_' + random_seed.split("_", 1)[1]
            return 'qe_records_N' + str(n_events) + '_' + random_seed + '.root'

    def get_input_files(self, n_events=None, random_seed=None):
        if n_events is None:
            n_events = self.n_events
        if random_seed is None:
            random_seed = self.random_seed
        if "USESIM" in random_seed:
            if 'USESIMBB' in random_seed:
                random_seed = 'BBBAR_' + random_seed.split("_", 1)[1]
            elif 'USESIMEE' in random_seed:
                random_seed = 'BHABHA_' + random_seed.split("_", 1)[1]
            return ['datafiles/' + GenerateSimTask.output_file_name(GenerateSimTask,
                                                                    n_events=n_events, random_seed=random_seed)]
        elif "DATA" in random_seed:
            return MasterTask.datafiles
        else:
            return self.get_input_file_names(GenerateSimTask.output_file_name(
                GenerateSimTask, n_events=n_events, random_seed=random_seed))

    def requires(self):
        """
        Generate list of luigi Tasks that this Task depends on.
        """
        if "USESIM" in self.random_seed or "DATA" in self.random_seed:
            for filename in self.get_input_files():
                yield CheckExistingFile(
                    filename=filename,
                )
        else:
            yield SplitNMergeSimTask(
                bkgfiles_dir=MasterTask.bkgfiles_by_exp[self.experiment_number],
                random_seed=self.random_seed,
                n_events=self.n_events,
                experiment_number=self.experiment_number,
            )

    def output(self):
        """
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        """
        yield self.add_to_output(self.get_records_file_name())

    def create_path(self):
        """
        Create basf2 path with CDC standalone tracking and CDC QE with recording filter for MVA feature collection.
        """
        path = basf2.create_path()
        inputFileNames = self.get_input_files()
        path.add_module(
            "RootInput",
            inputFileNames=inputFileNames,
        )
        path.add_module("Gearbox")
        tracking.add_geometry_modules(path)
        if 'DATA' in self.random_seed:
            filter_choice = "recording_data"
            from rawdata import add_unpackers
            add_unpackers(path, components=['CDC'])
        else:
            filter_choice = "recording"
        # tracking.add_hit_preparation_modules(path)  # only needed for SVD and
        # PXD hit preparation. Does not change the CDC output.
        tracking.add_cdc_track_finding(path, with_ca=False, add_mva_quality_indicator=True)

        basf2.set_module_parameters(
            path,
            name="TFCDC_TrackQualityEstimator",
            filter=filter_choice,
            filterParameters={
                "rootFileName": self.get_output_file_name(self.get_records_file_name())
            },
        )
        return path


794 class RecoTrackQEDataCollectionTask(Basf2PathTask):
795  """
796  Collect variables/features from the reco track reconstruction including the
797  fit and write them to a ROOT file.
798 
799  These variables are to be used as labelled training data for the MVA
800  classifier which is the MVA track quality estimator. The collected
801  variables include the classifier outputs from the VXD and CDC quality
802  estimators, namely the CDC and VXD quality indicators, combined with fit,
803  merger, timing, energy loss information etc. This task requires the
804  subdetector quality estimators to be trained.
805  """
806 
807 
808  n_events = b2luigi.IntParameter()
809 
810  experiment_number = b2luigi.IntParameter()
811 
813  random_seed = b2luigi.Parameter()
814 
815  cdc_training_target = b2luigi.Parameter()
816 
819  recotrack_option = b2luigi.Parameter(default='deleteCDCQI080')
820 
821  fast_bdt_option = b2luigi.ListParameter(hashed=True, default=[200, 8, 3, 0.1])
822  queue = 'l'
823 
824 
825  def get_records_file_name(self, n_events=None, random_seed=None, recotrack_option=None):
826  if n_events is None:
827  n_events = self.n_events
828  if random_seed is None:
829  random_seed = self.random_seed
830  if recotrack_option is None:
831  recotrack_option = self.recotrack_option
832  if 'rec' not in random_seed:
833  random_seed += '_rec'
834  if 'DATA' in random_seed:
835  return 'qe_records_DATA_rec.root'
836  else:
837  if 'USESIMBB' in random_seed:
838  random_seed = 'BBBAR_' + random_seed.split("_", 1)[1]
839  elif 'USESIMEE' in random_seed:
840  random_seed = 'BHABHA_' + random_seed.split("_", 1)[1]
841  return 'qe_records_N' + str(n_events) + '_' + random_seed + '_' + recotrack_option + '.root'
842 
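The naming rules in `get_records_file_name` can be checked without b2luigi by mirroring them in a plain function. This is a minimal sketch that assumes the same seed conventions as above: a `DATA` tag, and `USESIMBB`/`USESIMEE` prefixes. `records_file_name` is a hypothetical helper and is not part of this script.

```python
def records_file_name(n_events: int, random_seed: str, recotrack_option: str) -> str:
    """Mirror the naming rules of RecoTrackQEDataCollectionTask.get_records_file_name."""
    # Records from the reco track collection always carry "rec" in the seed.
    if "rec" not in random_seed:
        random_seed += "_rec"
    # Recorded data gets a fixed name, independent of event count and options.
    if "DATA" in random_seed:
        return "qe_records_DATA_rec.root"
    # Pre-simulated samples are renamed after the physics process they contain.
    if "USESIMBB" in random_seed:
        random_seed = "BBBAR_" + random_seed.split("_", 1)[1]
    elif "USESIMEE" in random_seed:
        random_seed = "BHABHA_" + random_seed.split("_", 1)[1]
    return f"qe_records_N{n_events}_{random_seed}_{recotrack_option}.root"


print(records_file_name(500, "USESIMBB_test", "deleteCDCQI080"))
# prints qe_records_N500_BBBAR_test_rec_deleteCDCQI080.root
```

Keeping the rule in one pure function like this would also let `get_input_files` share the `USESIMBB`/`USESIMEE` renaming instead of repeating it.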
843  def get_input_files(self, n_events=None, random_seed=None):
844  if n_events is None:
845  n_events = self.n_events
846  if random_seed is None:
847  random_seed = self.random_seed
848  if "USESIM" in random_seed:
849  if 'USESIMBB' in random_seed:
850  random_seed = 'BBBAR_' + random_seed.split("_", 1)[1]
851  elif 'USESIMEE' in random_seed:
852  random_seed = 'BHABHA_' + random_seed.split("_", 1)[1]
853  return ['datafiles/' + GenerateSimTask.output_file_name(GenerateSimTask,
854  n_events=n_events, random_seed=random_seed)]
855  elif "DATA" in random_seed:
856  return MasterTask.datafiles
857  else:
858  return self.get_input_file_names(GenerateSimTask.output_file_name(
859  GenerateSimTask, n_events=n_events, random_seed=random_seed))
860 
861  def requires(self):
862  """
863  Generate list of luigi Tasks that this Task depends on.
864  """
865  if "USESIM" in self.random_seed or "DATA" in self.random_seed:
866  for filename in self.get_input_files():
867  yield CheckExistingFile(
868  filename=filename,
869  )
870  else:
871  yield SplitNMergeSimTask(
872  bkgfiles_dir=MasterTask.bkgfiles_by_exp[self.experiment_number],
873  random_seed=self.random_seed,
874  n_events=self.n_events,
875  experiment_number=self.experiment_number,
876  )
877  if "DATA" not in self.random_seed:
878  if 'useCDC' not in self.recotrack_option and 'noCDC' not in self.recotrack_option:
879  yield CDCQETeacherTask(
880  n_events_training=MasterTask.n_events_training,
881  experiment_number=self.experiment_number,
882  training_target=self.cdc_training_target,
883  process_type=self.random_seed.split("_", 1)[0],
884  exclude_variables=MasterTask.exclude_variables_cdc,
885  fast_bdt_option=self.fast_bdt_option,
886  )
887  if 'useVXD' not in self.recotrack_option and 'noVXD' not in self.recotrack_option:
888  yield VXDQETeacherTask(
889  n_events_training=MasterTask.n_events_training,
890  experiment_number=self.experiment_number,
891  process_type=self.random_seed.split("_", 1)[0],
892  exclude_variables=MasterTask.exclude_variables_vxd,
893  fast_bdt_option=self.fast_bdt_option,
894  )
895 
896  def output(self):
897  """
898  Generate list of output files that the task should produce.
899  The task is considered finished if and only if the outputs all exist.
900  """
901  yield self.add_to_output(self.get_records_file_name())
902 
903  def create_path(self):
904  """
905  Create basf2 reconstruction path that should mirror the default path
906  from ``add_tracking_reconstruction()``, but with modules for the VXD QE
907  and CDC QE application and for collection of variables for the reco
908  track quality estimator.
909  """
910  path = basf2.create_path()
911  inputFileNames = self.get_input_files()
912  path.add_module(
913  "RootInput",
914  inputFileNames=inputFileNames,
915  )
916  path.add_module("Gearbox")
917 
918  # First add tracking reconstruction with default quality estimation modules
919  mvaCDC = True
920  mvaVXD = True
921  if 'noCDC' in self.recotrack_option:
922  mvaCDC = False
923  if 'noVXD' in self.recotrack_option:
924  mvaVXD = False
925  if 'DATA' in self.random_seed:
926  from rawdata import add_unpackers
927  add_unpackers(path)
928  tracking.add_tracking_reconstruction(path, add_cdcTrack_QI=mvaCDC, add_vxdTrack_QI=mvaVXD, add_recoTrack_QI=True)
929 
930  # If data is processed, check whether newly trained MVA weightfiles are available; otherwise use the defaults (CDB payloads).
931  # If useCDC/useVXD is specified, use the identifier in datafiles/. Otherwise, replace the default weightfile identifiers
932  # (CDB payloads) with the new weightfiles created by this b2luigi script.
933  if ('DATA' in self.random_seed or 'useCDC' in self.recotrack_option) and 'noCDC' not in self.recotrack_option:
934  cdc_identifier = 'datafiles/' + \
935  CDCQETeacherTask.get_weightfile_xml_identifier(CDCQETeacherTask, fast_bdt_option=self.fast_bdt_option)
936  if os.path.exists(cdc_identifier):
937  replace_cdc_qi = True
938  elif 'useCDC' in self.recotrack_option:
939  raise ValueError(f"CDC QI Identifier not found: {cdc_identifier}")
940  else:
941  replace_cdc_qi = False
942  elif 'noCDC' in self.recotrack_option:
943  replace_cdc_qi = False
944  else:
945  cdc_identifier = self.get_input_file_names(
946  CDCQETeacherTask.get_weightfile_xml_identifier(
947  CDCQETeacherTask, fast_bdt_option=self.fast_bdt_option))[0]
948  replace_cdc_qi = True
949  if ('DATA' in self.random_seed or 'useVXD' in self.recotrack_option) and 'noVXD' not in self.recotrack_option:
950  vxd_identifier = 'datafiles/' + \
951  VXDQETeacherTask.get_weightfile_xml_identifier(VXDQETeacherTask, fast_bdt_option=self.fast_bdt_option)
952  if os.path.exists(vxd_identifier):
953  replace_vxd_qi = True
954  elif 'useVXD' in self.recotrack_option:
955  raise ValueError(f"VXD QI Identifier not found: {vxd_identifier}")
956  else:
957  replace_vxd_qi = False
958  elif 'noVXD' in self.recotrack_option:
959  replace_vxd_qi = False
960  else:
961  vxd_identifier = self.get_input_file_names(
962  VXDQETeacherTask.get_weightfile_xml_identifier(
963  VXDQETeacherTask, fast_bdt_option=self.fast_bdt_option))[0]
964  replace_vxd_qi = True
965 
966  cdc_qe_mva_filter_parameters = None
967  # If tracks below a certain CDC QI value are to be deleted online, the cut needs to be specified in the filter parameters.
968  # This is also possible with the default (CDB) payloads.
969  if 'deleteCDCQI' in self.recotrack_option:
970  cut_index = self.recotrack_option.find('deleteCDCQI') + len('deleteCDCQI')
971  cut = int(self.recotrack_option[cut_index:cut_index+3])/100.
972  if replace_cdc_qi:
973  cdc_qe_mva_filter_parameters = {
974  "identifier": cdc_identifier, "cut": cut}
975  else:
976  cdc_qe_mva_filter_parameters = {
977  "cut": cut}
978  elif replace_cdc_qi:
979  cdc_qe_mva_filter_parameters = {
980  "identifier": cdc_identifier}
981  if cdc_qe_mva_filter_parameters is not None:
982  basf2.set_module_parameters(
983  path,
984  name="TFCDC_TrackQualityEstimator",
985  filterParameters=cdc_qe_mva_filter_parameters,
986  )
987  if replace_vxd_qi:
988  basf2.set_module_parameters(
989  path,
990  name="VXDQualityEstimatorMVA",
991  WeightFileIdentifier=vxd_identifier)
992 
993  # Replace final quality estimator module by training data collector module
994  track_qe_module_name = "TrackQualityEstimatorMVA"
995  module_found = False
996  new_path = basf2.create_path()
997  for module in path.modules():
998  if module.name() != track_qe_module_name:
999  new_path.add_module(module)
1000  else:
1001  new_path.add_module(
1002  "TrackQETrainingDataCollector",
1003  TrainingDataOutputName=self.get_output_file_name(self.get_records_file_name()),
1004  collectEventFeatures=True,
1005  SVDPlusCDCStandaloneRecoTracksStoreArrayName="SVDPlusCDCStandaloneRecoTracks",
1006  )
1007  module_found = True
1008  if not module_found:
1009  raise KeyError(f"No module {track_qe_module_name} found in path")
1010  path = new_path
1011  return path
1012 
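The `deleteCDCQI` option handled in `create_path` encodes the CDC QI cut as the three digits after the keyword, read as a percentage. For example, `deleteCDCQI080` yields a cut of 0.8. The sketch below shows that parsing and how the `TFCDC_TrackQualityEstimator` filter parameters are assembled. `cdc_filter_parameters` is a hypothetical helper.

```python
from typing import Optional


def cdc_filter_parameters(recotrack_option: str, replace_cdc_qi: bool,
                          cdc_identifier: Optional[str] = None) -> Optional[dict]:
    """Assemble filterParameters for TFCDC_TrackQualityEstimator as create_path does."""
    keyword = "deleteCDCQI"
    params = {}
    if keyword in recotrack_option:
        # The three characters after the keyword are the cut in percent, e.g. "080" -> 0.8.
        cut_index = recotrack_option.find(keyword) + len(keyword)
        params["cut"] = int(recotrack_option[cut_index:cut_index + 3]) / 100.0
    if replace_cdc_qi:
        # Point the module at the newly trained weightfile instead of the CDB payload.
        params["identifier"] = cdc_identifier
    # None means: leave the module parameters untouched.
    return params or None


print(cdc_filter_parameters("deleteCDCQI080", True, "cdc_weights.xml"))
```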
1013 
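The final step of `create_path` copies every module into a new path and swaps `TrackQualityEstimatorMVA` for `TrackQETrainingDataCollector`. It raises a `KeyError` if the module is missing. The sketch below applies the same pattern to plain Python objects so it runs without basf2. `Module` and `replace_module` are hypothetical stand-ins, not basf2 API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Module:
    """Hypothetical stand-in for a basf2 module: a name plus its parameters."""
    name: str
    params: Dict[str, object] = field(default_factory=dict)


def replace_module(path: List[Module], target: str, replacement: Module) -> List[Module]:
    """Copy `path` into a new list, swapping every module named `target`."""
    new_path = []
    module_found = False
    for module in path:
        if module.name != target:
            new_path.append(module)
        else:
            new_path.append(replacement)
            module_found = True
    if not module_found:
        # Fail loudly instead of silently returning a path without the collector.
        raise KeyError(f"No module {target} found in path")
    return new_path


path = [Module("RootInput"), Module("Gearbox"), Module("TrackQualityEstimatorMVA"), Module("RootOutput")]
collector = Module("TrackQETrainingDataCollector", {"collectEventFeatures": True})
print([m.name for m in replace_module(path, "TrackQualityEstimatorMVA", collector)])
```

Building a new list, rather than editing the original while iterating over it, keeps the loop simple and leaves the input path unchanged.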
1014 class TrackQETeacherBaseTask(Basf2Task):
1015  """
1016  A teacher task runs the basf2 mva teacher on the training data provided by a
1017  data collection task.
1018 
1019  Since teacher tasks are needed for all quality estimators covered by this
1020  steering file and the only things that change are the required data
1021  collection task and some training parameters, I decided to use inheritance
1022  and have the basic functionality in this base class/interface and have the
1023  specific teacher tasks inherit from it.
1024  """
1025 
1026  n_events_training = b2luigi.IntParameter()
1027 
1028  experiment_number = b2luigi.IntParameter()
1029 
1032  process_type = b2luigi.Parameter(default="BBBAR")
1033 
1034  training_target = b2luigi.Parameter(default="truth")
1035 
1037  exclude_variables = b2luigi.ListParameter(hashed=True, default=[])
1038 
1039  fast_bdt_option = b2luigi.ListParameter(hashed=True, default=[200, 8, 3, 0.1])
1040 
1041  @property
1042  def weightfile_identifier_basename(self):
1043  """
1044  Property defining the basename for the .xml and .root weightfiles that are created.
1045  Has to be implemented by the inheriting teacher task class.
1046  """
1047  raise NotImplementedError(
1048  "Teacher Task must define a static weightfile_identifier"
1049  )
1050 
1051  def get_weightfile_xml_identifier(self, fast_bdt_option=None, recotrack_option=None):
1052  """
1053  Name of the xml weightfile that is created by the teacher task.
1054  It is subsequently used as a local weightfile in the following validation tasks.
1055  """
1056  if fast_bdt_option is None:
1057  fast_bdt_option = self.fast_bdt_option
1058  if recotrack_option is None:
1059  # an explicitly passed option takes precedence; otherwise fall back to the task parameter, if the task has one
1060  recotrack_option = getattr(self, 'recotrack_option', '')
1062  weightfile_details = create_fbdt_option_string(fast_bdt_option)
1063  weightfile_name = self.weightfile_identifier_basename + weightfile_details
1064  if recotrack_option != '':
1065  weightfile_name = weightfile_name + '_' + recotrack_option
1066  return weightfile_name + ".weights.xml"
1067 
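`get_weightfile_xml_identifier` appends the reco track option to the weightfile name only when one is set. The sketch below shows the intended lookup order: the explicit argument first, then the task's own `recotrack_option`, then no suffix. `weightfile_xml_name` is hypothetical. `option_string` stands in for the output of `create_fbdt_option_string`, whose format is not shown in this section.

```python
from typing import Optional


def weightfile_xml_name(basename: str, option_string: str,
                        recotrack_option: Optional[str] = None,
                        task_recotrack_option: Optional[str] = None) -> str:
    """Compose '<basename><options>[_<recotrack_option>].weights.xml'."""
    # An explicit argument wins; otherwise fall back to the task's own option, if any.
    if recotrack_option is None:
        recotrack_option = task_recotrack_option or ""
    name = basename + option_string
    if recotrack_option:
        name += "_" + recotrack_option
    return name + ".weights.xml"


# "_OPTS" is a placeholder, not the real create_fbdt_option_string output.
print(weightfile_xml_name("recotrack_mva_qe", "_OPTS", task_recotrack_option="deleteCDCQI080"))
```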
1068  @property
1069  def tree_name(self):
1070  """
1071  Property defining the name of the tree in the ROOT file from the
1072  ``data_collection_task`` that contains the recorded training data. Must be
1073  implemented by the inheriting specific teacher task class.
1074  """
1075  raise NotImplementedError("Teacher Task must define a static tree_name")
1076 
1077  @property
1078  def random_seed(self):
1079  """
1080  Property defining random seed to be used by the ``GenerateSimTask``.
1081  Should differ from the random seed in the test data samples. Must be
1082  implemented by the inheriting specific teacher task class.
1083  """
1084  raise NotImplementedError("Teacher Task must define a static random seed")
1085 
1086  @property
1087  def data_collection_task(self) -> Basf2PathTask:
1088  """
1089  Property defining the specific ``DataCollectionTask`` to require. Must be
1090  implemented by the inheriting specific teacher task class.
1091  """
1092  raise NotImplementedError(
1093  "Teacher Task must define a data collection task to require "
1094  )
1095 
1096  def requires(self):
1097  """
1098  Generate list of luigi Tasks that this Task depends on.
1099  """
1100  if 'USEREC' in self.process_type:
1101  if 'USERECBB' in self.process_type:
1102  process = 'BBBAR'
1103  elif 'USERECEE' in self.process_type:
1104  process = 'BHABHA'
1105  yield CheckExistingFile(
1106  filename='datafiles/qe_records_N' + str(self.n_events_training) + '_' + process + '_' + self.random_seed + '.root',
1107  )
1108  else:
1109  yield self.data_collection_task(
1110  num_processes=MasterTask.num_processes,
1111  n_events=self.n_events_training,
1112  experiment_number=self.experiment_number,
1113  random_seed=self.process_type + '_' + self.random_seed,
1114  )
1115 
1116  def output(self):
1117  """
1118  Generate list of output files that the task should produce.
1119  The task is considered finished if and only if the outputs all exist.
1120  """
1121  yield self.add_to_output(self.get_weightfile_xml_identifier())
1122 
1123  def process(self):
1124  """
1125  Use basf2_mva teacher to create MVA weightfile from collected training
1126  data variables.
1127 
1128  This is the main process that is dispatched by the ``run`` method that
1129  is inherited from ``Basf2Task``.
1130  """
1131  if 'USEREC' in self.process_type:
1132  if 'USERECBB' in self.process_type:
1133  process = 'BBBAR'
1134  elif 'USERECEE' in self.process_type:
1135  process = 'BHABHA'
1136  records_files = ['datafiles/qe_records_N' + str(self.n_events_training) +
1137  '_' + process + '_' + self.random_seed + '.root']
1138  else:
1139  if hasattr(self, 'recotrack_option'):
1140  records_files = self.get_input_file_names(
1141  self.data_collection_task.get_records_file_name(
1142  self.data_collection_task,
1143  n_events=self.n_events_training,
1144  random_seed=self.process_type + '_' + self.random_seed,
1145  recotrack_option=self.recotrack_option))
1146  else:
1147  records_files = self.get_input_file_names(
1148  self.data_collection_task.get_records_file_name(
1149  self.data_collection_task,
1150  n_events=self.n_events_training,
1151  random_seed=self.process_type + '_' + self.random_seed))
1152 
1153  my_basf2_mva_teacher(
1154  records_files=records_files,
1155  tree_name=self.tree_name,
1156  weightfile_identifier=self.get_output_file_name(self.get_weightfile_xml_identifier()),
1157  target_variable=self.training_target,
1158  exclude_variables=self.exclude_variables,
1159  fast_bdt_option=self.fast_bdt_option,
1160  )
1161 
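Several `requires` and `process` methods map a `USERECBB`/`USERECEE` process type to `BBBAR`/`BHABHA` using an `if`/`elif` chain. For any other value, this chain leaves `process` unbound, and the problem only surfaces as an `UnboundLocalError` when the file name is built. The sketch below performs the same mapping with an explicit error. Both helpers are hypothetical.

```python
def process_from_type(process_type: str, prefix: str = "USEREC") -> str:
    """Map 'USERECBB' to 'BBBAR' and 'USERECEE' to 'BHABHA'."""
    if prefix + "BB" in process_type:
        return "BBBAR"
    if prefix + "EE" in process_type:
        return "BHABHA"
    # The if/elif chains in the listing leave `process` unbound at this point.
    raise ValueError(f"Cannot infer the physics process from {process_type!r}")


def recorded_training_file(n_events: int, process_type: str, random_seed: str) -> str:
    """Path of pre-recorded training records, built as in the teacher tasks' requires()."""
    process = process_from_type(process_type)
    return "datafiles/qe_records_N" + str(n_events) + "_" + process + "_" + random_seed + ".root"


print(recorded_training_file(1000, "USERECBB", "train_rec"))
# prints datafiles/qe_records_N1000_BBBAR_train_rec.root
```

Passing `prefix=""` reproduces the looser `'BB' in process_type` checks used by the harvesting tasks.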
1162
1163  class VXDQETeacherTask(TrackQETeacherBaseTask):
1164  """
1165  Task to run basf2 mva teacher on collected data for VXDTF2 track quality estimator
1166  """
1167 
1168  weightfile_identifier_basename = "vxdtf2_mva_qe"
1169 
1171  tree_name = "tree"
1172 
1173  random_seed = "train_vxd"
1174 
1176  data_collection_task = VXDQEDataCollectionTask
1177 
1178
1179  class CDCQETeacherTask(TrackQETeacherBaseTask):
1180  """
1181  Task to run basf2 mva teacher on collected data for CDC track quality estimator
1182  """
1183 
1184  weightfile_identifier_basename = "cdc_mva_qe"
1185 
1187  tree_name = "records"
1188 
1189  random_seed = "train_cdc"
1190 
1192  data_collection_task = CDCQEDataCollectionTask
1193 
1194
1195  class RecoTrackQETeacherTask(TrackQETeacherBaseTask):
1196  """
1197  Task to run basf2 mva teacher on collected data for the final, combined
1198  track quality estimator
1199  """
1200 
1203  recotrack_option = b2luigi.Parameter(default='deleteCDCQI080')
1204 
1205 
1206  weightfile_identifier_basename = "recotrack_mva_qe"
1207 
1209  tree_name = "tree"
1210 
1211  random_seed = "train_rec"
1212 
1214  data_collection_task = RecoTrackQEDataCollectionTask
1215 
1216  cdc_training_target = b2luigi.Parameter()
1217 
1218  def requires(self):
1219  """
1220  Generate list of luigi Tasks that this Task depends on.
1221  """
1222  if 'USEREC' in self.process_type:
1223  if 'USERECBB' in self.process_type:
1224  process = 'BBBAR'
1225  elif 'USERECEE' in self.process_type:
1226  process = 'BHABHA'
1227  yield CheckExistingFile(
1228  filename='datafiles/qe_records_N' + str(self.n_events_training) + '_' + process + '_' + self.random_seed + '.root',
1229  )
1230  else:
1231  yield self.data_collection_task(
1232  cdc_training_target=self.cdc_training_target,
1233  num_processes=MasterTask.num_processes,
1234  n_events=self.n_events_training,
1235  experiment_number=self.experiment_number,
1236  random_seed=self.process_type + '_' + self.random_seed,
1237  recotrack_option=self.recotrack_option,
1238  fast_bdt_option=self.fast_bdt_option,
1239  )
1240 
1241 
1242 class HarvestingValidationBaseTask(Basf2PathTask):
1243  """
1244  Run track reconstruction with MVA quality estimator and write out
1245  (="harvest") a root file with variables useful for the validation.
1246  """
1247 
1248 
1249  n_events_testing = b2luigi.IntParameter()
1250 
1251  n_events_training = b2luigi.IntParameter()
1252 
1253  experiment_number = b2luigi.IntParameter()
1254 
1257  process_type = b2luigi.Parameter(default="BBBAR")
1258 
1260  exclude_variables = b2luigi.ListParameter(hashed=True)
1261 
1262  fast_bdt_option = b2luigi.ListParameter(hashed=True, default=[200, 8, 3, 0.1])
1263 
1264  validation_output_file_name = "harvesting_validation.root"
1265 
1266  reco_output_file_name = "reconstruction.root"
1267 
1268  components = None
1269 
1270  @property
1271  def teacher_task(self) -> TrackQETeacherBaseTask:
1272  """
1273  Teacher task to require to provide a quality estimator weightfile for ``add_tracking_with_quality_estimation``
1274  """
1275  raise NotImplementedError()
1276 
1277  def add_tracking_with_quality_estimation(self, path: basf2.Path) -> None:
1278  """
1279  Add modules for track reconstruction to basf2 path that are to be
1280  validated. Besides track finding it should include MC matching, fitted
1281  track creation and a quality estimator module.
1282  """
1283  raise NotImplementedError()
1284 
1285  def requires(self):
1286  """
1287  Generate list of luigi Tasks that this Task depends on.
1288  """
1289  yield self.teacher_task(
1290  n_events_training=self.n_events_training,
1291  experiment_number=self.experiment_number,
1292  process_type=self.process_type,
1293  exclude_variables=self.exclude_variables,
1294  fast_bdt_option=self.fast_bdt_option,
1295  )
1296  if 'USE' in self.process_type: # USESIM and USEREC
1297  if 'BB' in self.process_type:
1298  process = 'BBBAR'
1299  elif 'EE' in self.process_type:
1300  process = 'BHABHA'
1301  yield CheckExistingFile(
1302  filename='datafiles/generated_mc_N' + str(self.n_events_testing) + '_' + process + '_test.root'
1303  )
1304  else:
1305  yield SplitNMergeSimTask(
1306  bkgfiles_dir=MasterTask.bkgfiles_by_exp[self.experiment_number],
1307  random_seed=self.process_type + '_test',
1308  n_events=self.n_events_testing,
1309  experiment_number=self.experiment_number,
1310  )
1311 
1312  def output(self):
1313  """
1314  Generate list of output files that the task should produce.
1315  The task is considered finished if and only if the outputs all exist.
1316  """
1317  yield self.add_to_output(self.validation_output_file_name)
1318  yield self.add_to_output(self.reco_output_file_name)
1319 
1320  def create_path(self):
1321  """
1322  Create a basf2 path that uses ``add_tracking_with_quality_estimation()``
1323  and adds the ``CombinedTrackingValidationModule`` to write out variables
1324  for validation.
1325  """
1326  # prepare track finding
1327  path = basf2.create_path()
1328  if 'USE' in self.process_type:
1329  if 'BB' in self.process_type:
1330  process = 'BBBAR'
1331  elif 'EE' in self.process_type:
1332  process = 'BHABHA'
1333  inputFileNames = ['datafiles/generated_mc_N' + str(self.n_events_testing) + '_' + process + '_test.root']
1334  else:
1335  inputFileNames = self.get_input_file_names(GenerateSimTask.output_file_name(
1336  GenerateSimTask, n_events=self.n_events_testing, random_seed=self.process_type + '_test'))
1337  path.add_module(
1338  "RootInput",
1339  inputFileNames=inputFileNames,
1340  )
1341  path.add_module("Gearbox")
1342  tracking.add_geometry_modules(path)
1343  tracking.add_hit_preparation_modules(path) # only needed for simulated hits
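1344  # add track finding module that needs to be validated
1345  self.add_tracking_with_quality_estimation(path=path)
1346  # add modules for validation
1347  path.add_module(
1348  CombinedTrackingValidationModule(
1349  name=None,
1350  contact=None,
1351  expert_level=200,
1352  output_file_name=self.get_output_file_name(
1353  self.validation_output_file_name
1354  ),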
1355  )
1356  )
1357  path.add_module(
1358  "RootOutput",
1359  outputFileName=self.get_output_file_name(self.reco_output_file_name),
1360  )
1361  return path
1362 
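The teacher and harvesting tasks follow a template-method pattern. The base class implements the workflow in terms of properties that raise `NotImplementedError`, and each subclass overrides them with plain class attributes. The sketch below is a stripped-down version without b2luigi and uses hypothetical class names.

```python
class TeacherBase:
    @property
    def weightfile_identifier_basename(self) -> str:
        raise NotImplementedError("Teacher Task must define a static weightfile_identifier")

    @property
    def tree_name(self) -> str:
        raise NotImplementedError("Teacher Task must define a static tree_name")

    def describe(self) -> str:
        # Shared logic relies only on the abstract properties.
        return f"train {self.weightfile_identifier_basename} from tree '{self.tree_name}'"

    def xml_name(self) -> str:
        return self.weightfile_identifier_basename + ".weights.xml"


class CDCTeacher(TeacherBase):
    # Plain class attributes are found first in the MRO and shadow the base-class properties.
    weightfile_identifier_basename = "cdc_mva_qe"
    tree_name = "records"


print(CDCTeacher().describe())
print(CDCTeacher.xml_name(CDCTeacher))
```

Because the subclass values are plain class attributes, methods can also be called with the class itself as `self`. Calls such as `CDCQETeacherTask.get_weightfile_xml_identifier(CDCQETeacherTask, ...)` rely on this, and `CDCTeacher.xml_name(CDCTeacher)` works the same way in the sketch.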
1363 
1365  """
1366  Run VXDTF2 track reconstruction and write out (="harvest") a root file with
1367  variables useful for validation of the VXD Quality Estimator.
1368  """
1369 
1370 
1371  validation_output_file_name = "vxd_qe_harvesting_validation.root"
1372 
1373  reco_output_file_name = "vxd_qe_reconstruction.root"
1374 
1375  teacher_task = VXDQETeacherTask
1376
1377  def add_tracking_with_quality_estimation(self, path: basf2.Path) -> None:
1378  """
1379  Add modules for VXDTF2 tracking with VXD quality estimator to basf2 path.
1380  """
1381  tracking.add_vxd_track_finding_vxdtf2(
1382  path,
1383  components=["SVD"],
1384  reco_tracks="RecoTracks",
1385  add_mva_quality_indicator=True,
1386  )
1387  # Replace the weightfile of the VXD quality estimator module by the one
1388  # produced in this training by b2luigi
1389  basf2.set_module_parameters(
1390  path,
1391  name="VXDQualityEstimatorMVA",
1392  WeightFileIdentifier=self.get_input_file_names(
1393  self.teacher_task.get_weightfile_xml_identifier(self.teacher_task, fast_bdt_option=self.fast_bdt_option)
1394  )[0],
1395  )
1396  tracking.add_mc_matcher(path, components=["SVD"])
1397  tracking.add_track_fit_and_track_creator(path, components=["SVD"])
1398 
1399 
1401  """
1402  Run CDC reconstruction and write out (="harvest") a root file with variables
1403  useful for validation of the CDC Quality Estimator.
1404  """
1405 
1406  training_target = b2luigi.Parameter()
1407 
1408  validation_output_file_name = "cdc_qe_harvesting_validation.root"
1409 
1410  reco_output_file_name = "cdc_qe_reconstruction.root"
1411 
1412  teacher_task = CDCQETeacherTask
1413 
1414  # override needed due to the specific training target
1415  def requires(self):
1416  """
1417  Generate list of luigi Tasks that this Task depends on.
1418  """
1419  yield self.teacher_task(
1420  n_events_training=self.n_events_training,
1421  experiment_number=self.experiment_number,
1422  process_type=self.process_type,
1423  training_target=self.training_target,
1424  exclude_variables=self.exclude_variables,
1425  fast_bdt_option=self.fast_bdt_option,
1426  )
1427  if 'USE' in self.process_type: # USESIM and USEREC
1428  if 'BB' in self.process_type:
1429  process = 'BBBAR'
1430  elif 'EE' in self.process_type:
1431  process = 'BHABHA'
1432  yield CheckExistingFile(
1433  filename='datafiles/generated_mc_N' + str(self.n_events_testing) + '_' + process + '_test.root'
1434  )
1435  else:
1436  yield SplitNMergeSimTask(
1437  bkgfiles_dir=MasterTask.bkgfiles_by_exp[self.experiment_number],
1438  random_seed=self.process_type + '_test',
1439  n_events=self.n_events_testing,
1440  experiment_number=self.experiment_number,
1441  )
1442
1443  def add_tracking_with_quality_estimation(self, path: basf2.Path) -> None:
1444  """
1445  Add modules for CDC standalone tracking with CDC quality estimator to basf2 path.
1446  """
1447  tracking.add_cdc_track_finding(
1448  path,
1449  output_reco_tracks="RecoTracks",
1450  add_mva_quality_indicator=True,
1451  )
1452  # change weightfile of quality estimator to the one produced by this training script
1453  cdc_qe_mva_filter_parameters = {
1454  "identifier": self.get_input_file_names(
1455  CDCQETeacherTask.get_weightfile_xml_identifier(
1456  CDCQETeacherTask,
1457  fast_bdt_option=self.fast_bdt_option))[0]}
1458  basf2.set_module_parameters(
1459  path,
1460  name="TFCDC_TrackQualityEstimator",
1461  filterParameters=cdc_qe_mva_filter_parameters,
1462  )
1463  tracking.add_mc_matcher(path, components=["CDC"])
1464  tracking.add_track_fit_and_track_creator(path, components=["CDC"])
1465 
1466 
1468  """
1469  Run track reconstruction and write out (="harvest") a root file with variables
1470  useful for validation of the MVA track Quality Estimator.
1471  """
1472 
1473  cdc_training_target = b2luigi.Parameter()
1474 
1475  validation_output_file_name = "reco_qe_harvesting_validation.root"
1476 
1477  reco_output_file_name = "reco_qe_reconstruction.root"
1478 
1479  teacher_task = RecoTrackQETeacherTask
1480 
1481  def requires(self):
1482  """
1483  Generate list of luigi Tasks that this Task depends on.
1484  """
1485  yield CDCQETeacherTask(
1486  n_events_training=self.n_events_training,
1487  experiment_number=self.experiment_number,
1488  process_type=self.process_type,
1489  training_target=self.cdc_training_target,
1490  exclude_variables=MasterTask.exclude_variables_cdc,
1491  fast_bdt_option=self.fast_bdt_option,
1492  )
1493  yield VXDQETeacherTask(
1494  n_events_training=self.n_events_training,
1495  experiment_number=self.experiment_number,
1496  process_type=self.process_type,
1497  exclude_variables=MasterTask.exclude_variables_vxd,
1498  fast_bdt_option=self.fast_bdt_option,
1499  )
1500 
1501  yield self.teacher_task(
1502  n_events_training=self.n_events_training,
1503  experiment_number=self.experiment_number,
1504  process_type=self.process_type,
1505  exclude_variables=self.exclude_variables,
1506  cdc_training_target=self.cdc_training_target,
1507  fast_bdt_option=self.fast_bdt_option,
1508  )
1509  if 'USE' in self.process_type: # USESIM and USEREC
1510  if 'BB' in self.process_type:
1511  process = 'BBBAR'
1512  elif 'EE' in self.process_type:
1513  process = 'BHABHA'
1514  yield CheckExistingFile(
1515  filename='datafiles/generated_mc_N' + str(self.n_events_testing) + '_' + process + '_test.root'
1516  )
1517  else:
1518  yield SplitNMergeSimTask(
1519  bkgfiles_dir=MasterTask.bkgfiles_by_exp[self.experiment_number],
1520  random_seed=self.process_type + '_test',
1521  n_events=self.n_events_testing,
1522  experiment_number=self.experiment_number,
1523  )
1524
1525  def add_tracking_with_quality_estimation(self, path: basf2.Path) -> None:
1526  """
1527  Add modules for reco tracking with all track quality estimators to basf2 path.
1528  """
1529 
1530  # add tracking reconstruction with quality estimator modules added
1531  tracking.add_tracking_reconstruction(
1532  path,
1533  add_cdcTrack_QI=True,
1534  add_vxdTrack_QI=True,
1535  add_recoTrack_QI=True,
1536  skipGeometryAdding=True,
1537  skipHitPreparerAdding=False,
1538  )
1539 
1540  # Replace the weightfiles of all quality estimator modules by those
1541  # produced in the training by b2luigi
1542  cdc_qe_mva_filter_parameters = {
1543  "identifier": self.get_input_file_names(
1544  CDCQETeacherTask.get_weightfile_xml_identifier(
1545  CDCQETeacherTask,
1546  fast_bdt_option=self.fast_bdt_option))[0]}
1547  basf2.set_module_parameters(
1548  path,
1549  name="TFCDC_TrackQualityEstimator",
1550  filterParameters=cdc_qe_mva_filter_parameters,
1551  )
1552  basf2.set_module_parameters(
1553  path,
1554  name="VXDQualityEstimatorMVA",
1555  WeightFileIdentifier=self.get_input_file_names(
1556  VXDQETeacherTask.get_weightfile_xml_identifier(VXDQETeacherTask, fast_bdt_option=self.fast_bdt_option)
1557  )[0],
1558  )
1559  basf2.set_module_parameters(
1560  path,
1561  name="TrackQualityEstimatorMVA",
1562  WeightFileIdentifier=self.get_input_file_names(
1563  RecoTrackQETeacherTask.get_weightfile_xml_identifier(RecoTrackQETeacherTask, fast_bdt_option=self.fast_bdt_option)
1564  )[0],
1565  )
1566 
1567 
1569  """
1570  Base class for evaluating a quality estimator with ``basf2_mva_evaluate.py`` on a
1571  separate test data set.
1572 
1573  Evaluation tasks for VXD, CDC and combined QE can inherit from it.
1574  """
1575 
1576 
1581  git_hash = b2luigi.Parameter(default=get_basf2_git_hash())
1582 
1583  n_events_testing = b2luigi.IntParameter()
1584 
1585  n_events_training = b2luigi.IntParameter()
1586 
1587  experiment_number = b2luigi.IntParameter()
1588 
1591  process_type = b2luigi.Parameter(default="BBBAR")
1592 
1593  training_target = b2luigi.Parameter(default="truth")
1594 
1596  exclude_variables = b2luigi.ListParameter(hashed=True)
1597 
1598  fast_bdt_option = b2luigi.ListParameter(hashed=True, default=[200, 8, 3, 0.1])
1599 
1600  @property
1601  def teacher_task(self) -> TrackQETeacherBaseTask:
1602  """
1603  Property defining specific teacher task to require.
1604  """
1605  raise NotImplementedError(
1606  "Evaluation Tasks must define a teacher task to require "
1607  )
1608 
1609  @property
1610  def data_collection_task(self) -> Basf2PathTask:
1611  """
1612  Property defining the specific ``DataCollectionTask`` to require. Must be
1613  implemented by the inheriting specific evaluation task class.
1614  """
1615  raise NotImplementedError(
1616  "Evaluation Tasks must define a data collection task to require "
1617  )
1618 
1619  @property
1620  def task_acronym(self):
1621  """
1622  Acronym to distinguish between cdc, vxd and rec(o) MVA
1623  """
1624  raise NotImplementedError(
1625  "Evaluation Tasks must define a task acronym."
1626  )
1627 
1628  def requires(self):
1629  """
1630  Generate list of luigi Tasks that this Task depends on.
1631  """
1632  yield self.teacher_task(
1633  n_events_training=self.n_events_training,
1634  experiment_number=self.experiment_number,
1635  process_type=self.process_type,
1636  training_target=self.training_target,
1637  exclude_variables=self.exclude_variables,
1638  fast_bdt_option=self.fast_bdt_option,
1639  )
1640  if 'USEREC' in self.process_type:
1641  if 'USERECBB' in self.process_type:
1642  process = 'BBBAR'
1643  elif 'USERECEE' in self.process_type:
1644  process = 'BHABHA'
1645  yield CheckExistingFile(
1646  filename='datafiles/qe_records_N' + str(self.n_events_testing) + '_' + process + '_test_' +
1647  self.task_acronym + '.root'
1648  )
1649  else:
1650  yield self.data_collection_task(
1651  num_processes=MasterTask.num_processes,
1652  n_events=self.n_events_testing,
1653  experiment_number=self.experiment_number,
1654  random_seed=self.process_type + '_test',
1655  )
1656 
1657  def output(self):
1658  """
1659  Generate list of output files that the task should produce.
1660  The task is considered finished if and only if the outputs all exist.
1661  """
1662  weightfile_details = create_fbdt_option_string(self.fast_bdt_option)
1663  evaluation_pdf_output = self.teacher_task.weightfile_identifier_basename + weightfile_details + ".pdf"
1664  yield self.add_to_output(evaluation_pdf_output)
1665 
1666  @b2luigi.on_temporary_files
1667  def run(self):
1668  """
1669  Run ``basf2_mva_evaluate.py`` subprocess to evaluate QE MVA.
1670 
1671  The MVA weight file created from training on the training data set is
1672  evaluated on separate test data.
1673  """
1674  weightfile_details = create_fbdt_option_string(self.fast_bdt_option)
1675  evaluation_pdf_output_basename = self.teacher_task.weightfile_identifier_basename + weightfile_details + ".pdf"
1676 
1677  evaluation_pdf_output_path = self.get_output_file_name(evaluation_pdf_output_basename)
1678 
1679  if 'USEREC' in self.process_type:
1680  if 'USERECBB' in self.process_type:
1681  process = 'BBBAR'
1682  elif 'USERECEE' in self.process_type:
1683  process = 'BHABHA'
1684  datafiles = 'datafiles/qe_records_N' + str(self.n_events_testing) + '_' + \
1685  process + '_test_' + self.task_acronym + '.root'
1686  else:
1687  datafiles = self.get_input_file_names(
1688  self.data_collection_task.get_records_file_name(
1689  self.data_collection_task,
1690  n_events=self.n_events_testing,
1691  random_seed=self.process_type + '_test_' +
1692  self.task_acronym))[0]
1693  cmd = [
1694  "basf2_mva_evaluate.py",
1695  "--identifiers",
1696  self.get_input_file_names(
1697  self.teacher_task.get_weightfile_xml_identifier(
1698  self.teacher_task,
1699  fast_bdt_option=self.fast_bdt_option))[0],
1700  "--datafiles",
1701  datafiles,
1702  "--treename",
1703  self.teacher_task.tree_name,
1704  "--outputfile",
1705  evaluation_pdf_output_path,
1706  ]
1707 
1708  # Prepare log files
1709  log_file_dir = get_log_file_dir(self)
1710  # Create the log directory if it does not exist yet. This is presumably needed because this task does not
1711  # inherit from a b2luigi task class that creates it automatically.
1712  try:
1713  os.makedirs(log_file_dir, exist_ok=True)
1714  # with exist_ok=True, FileExistsError is only raised if the path exists but is not a directory
1715  # (e.g. a regular file with the same name)
1716  except FileExistsError:
1717  print('Log path ' + log_file_dir + ' already exists but is not a directory.')
1718  stderr_log_file_path = os.path.join(log_file_dir, "stderr")
1719  stdout_log_file_path = os.path.join(log_file_dir, "stdout")
1720  with open(stdout_log_file_path, "w") as stdout_file:
1721  stdout_file.write("stdout output of the command:\n{}\n\n".format(" ".join(cmd)))
1722  if os.path.exists(stderr_log_file_path):
1723  # remove stderr file if it already exists b/c in the following it will be opened in appending mode
1724  os.remove(stderr_log_file_path)
1725 
1726  # Run evaluation via subprocess and write output into logfiles
1727  with open(stdout_log_file_path, "a") as stdout_file:
1728  with open(stderr_log_file_path, "a") as stderr_file:
1729  try:
1730  subprocess.run(cmd, check=True, stdout=stdout_file, stderr=stderr_file)
1731  except subprocess.CalledProcessError as err:
1732  stderr_file.write(f"Evaluation failed with error:\n{err}")
1733  raise err
1734 
1735 
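The evaluation task above runs `basf2_mva_evaluate.py` in a subprocess and collects its output in `stdout` and `stderr` log files. A minimal standalone sketch of that pattern, using only the standard library, is shown below; the helper name `run_with_logs` and its `log_dir` argument are hypothetical and not part of the steering file. The child's output must be routed with `stdout=`; passing a log file as `stdin=` would leave the output on the terminal.

```python
import subprocess
from pathlib import Path


def run_with_logs(cmd, log_dir):
    """Run ``cmd`` and write its stdout and stderr to log files in ``log_dir``.

    Raises subprocess.CalledProcessError on a non-zero exit status, after
    appending a short note to the stderr log.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    stdout_path = log_dir / "stdout"
    stderr_path = log_dir / "stderr"
    # Start fresh logs; the stdout log begins with the command line
    stdout_path.write_text("stdout output of the command:\n{}\n\n".format(" ".join(cmd)))
    stderr_path.write_text("")
    # Append mode, so the child's output follows the header written above
    with open(stdout_path, "a") as stdout_file, open(stderr_path, "a") as stderr_file:
        try:
            subprocess.run(cmd, check=True, stdout=stdout_file, stderr=stderr_file)
        except subprocess.CalledProcessError as err:
            stderr_file.write(f"Command failed with error:\n{err}\n")
            raise
    return stdout_path, stderr_path
```

Re-raising with a bare `raise` keeps the original traceback, so a b2luigi task wrapping this call would still be marked as failed.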
1737  """
1738  Run ``basf2_mva_evaluate.py`` for the VXD quality estimator on separate test data
1739  """
1740 
1742  teacher_task = VXDQETeacherTask
1743 
1745  data_collection_task = VXDQEDataCollectionTask
1746 
1748  task_acronym = 'vxd'
1749 
1750 
1752  """
1753  Run ``basf2_mva_evaluate.py`` for the CDC quality estimator on separate test data
1754  """
1755 
1757  teacher_task = CDCQETeacherTask
1758 
1760  data_collection_task = CDCQEDataCollectionTask
1761 
1763  task_acronym = 'cdc'
1764 
1765 
1767  """
1768  Run ``basf2_mva_evaluate.py`` for the final, combined quality estimator on
1769  separate test data
1770  """
1771 
1773  teacher_task = RecoTrackQETeacherTask
1774 
1776  data_collection_task = RecoTrackQEDataCollectionTask
1777 
1779  task_acronym = 'rec'
1780 
1781  cdc_training_target = b2luigi.Parameter()
1782 
1783  def requires(self):
1784  """
1785  Generate list of luigi Tasks that this Task depends on.
1786  """
1787  yield self.teacher_task(
1788  n_events_training=self.n_events_training,
1789  experiment_number=self.experiment_number,
1790  process_type=self.process_type,
1791  training_target=self.training_target,
1792  exclude_variables=self.exclude_variables,
1793  cdc_training_target=self.cdc_training_target,
1794  fast_bdt_option=self.fast_bdt_option,
1795  )
1796  if 'USEREC' in self.process_type:
1797  if 'USERECBB' in self.process_type:
1798  process = 'BBBAR'
1799  elif 'USERECEE' in self.process_type:
1800  process = 'BHABHA'
1801  yield CheckExistingFile(
1802  filename='datafiles/qe_records_N' + str(self.n_events_testing) + '_' + process + '_test_' +
1803  self.task_acronym + '.root'
1804  )
1805  else:
1806  yield self.data_collection_task(
1807  num_processes=MasterTask.num_processes,
1808  n_events=self.n_events_testing,
1809  experiment_number=self.experiment_number,
1810  random_seed=self.process_type + "_test",
1811  cdc_training_target=self.cdc_training_target,
1812  )
1813 
1814 
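In `requires` of the combined evaluation task, a `process_type` containing `USEREC` selects pre-existing test records via `CheckExistingFile` instead of a data-collection task. If the type contains `USEREC` but neither `USERECBB` nor `USERECEE`, the local variable `process` is never assigned and an `UnboundLocalError` is raised. A sketch of the same lookup with an explicit error follows; the helper name `userec_records_file` is hypothetical, while the file-name pattern is copied from the `CheckExistingFile` call.

```python
def userec_records_file(process_type, n_events, task_acronym):
    """Hypothetical helper: path of pre-existing test records for a USEREC* process type."""
    # Checked in the same order as the if/elif chain in requires()
    process_by_flag = {"USERECBB": "BBBAR", "USERECEE": "BHABHA"}
    for flag, process in process_by_flag.items():
        if flag in process_type:
            return f"datafiles/qe_records_N{n_events}_{process}_test_{task_acronym}.root"
    raise ValueError(f"Unsupported USEREC process type: {process_type!r}")
```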
1816  """
1817  Create a PDF file with validation plots for a quality estimator produced
1818  from the ROOT ntuples produced by a harvesting validation task
1819  """
1820 
1821  n_events_testing = b2luigi.IntParameter()
1822 
1823  n_events_training = b2luigi.IntParameter()
1824 
1825  experiment_number = b2luigi.IntParameter()
1826 
1829  process_type = b2luigi.Parameter(default="BBBAR")
1830 
1832  exclude_variables = b2luigi.ListParameter(hashed=True)
1833 
1834  fast_bdt_option = b2luigi.ListParameter(hashed=True, default=[200, 8, 3, 0.1])
1835 
1836  primaries_only = b2luigi.BoolParameter(default=True) # normalize finding efficiencies to primary MC-tracks
1837 
1838  @property
1839  def harvesting_validation_task_instance(self) -> HarvestingValidationBaseTask:
1840  """
1841  Specifies related harvesting validation task which produces the ROOT
1842  files with the data that is plotted by this task.
1843  """
1844  raise NotImplementedError("Must define a QI harvesting validation task for which to do the plots")
1845 
1846  @property
1847  def output_pdf_file_basename(self):
1848  """
1849  Name of the output PDF file containing the validation plots
1850  """
1851  validation_harvest_basename = self.harvesting_validation_task_instance.validation_output_file_name
1852  return validation_harvest_basename.replace(".root", "_plots.pdf")
1853 
1854  def requires(self):
1855  """
1856  Generate list of luigi Tasks that this Task depends on.
1857  """
1858  yield self.harvesting_validation_task_instance
1859 
1860  def output(self):
1861  """
1862  Generate list of output files that the task should produce.
1863  The task is considered finished if and only if the outputs all exist.
1864  """
1865  yield self.add_to_output(self.output_pdf_file_basename)
1866 
1867  @b2luigi.on_temporary_files
1868  def process(self):
1869  """
1870  Create the validation plots (fake, clone and finding rates, ROC curves and kinematic
1871  distributions) from the harvested validation ntuples and save them into a single PDF file.
1872 
1873  Main process that is dispatched by the ``run`` method that is inherited
1874  from ``Basf2Task``.
1875  """
1876  # get the validation "harvest", which is the ROOT file with ntuples for validation
1877  validation_harvest_basename = self.harvesting_validation_task_instance.validation_output_file_name
1878  validation_harvest_path = self.get_input_file_names(validation_harvest_basename)[0]
1879 
1880  # Load "harvested" validation data from root files into dataframes (requires enough memory to hold data)
1881  pr_columns = [ # Restrict memory usage by only reading in columns that are used in the steering file
1882  'is_fake', 'is_clone', 'is_matched', 'quality_indicator',
1883  'experiment_number', 'run_number', 'event_number', 'pr_store_array_number',
1884  'pt_estimate', 'z0_estimate', 'd0_estimate', 'tan_lambda_estimate',
1885  'phi0_estimate', 'pt_truth', 'z0_truth', 'd0_truth', 'tan_lambda_truth',
1886  'phi0_truth',
1887  ]
1888  # In ``pr_df`` each row corresponds to a track from Pattern Recognition
1889  pr_df = root_pandas.read_root(validation_harvest_path, key='pr_tree/pr_tree', columns=pr_columns)
1890  mc_columns = [ # restrict mc_df to these columns
1891  'experiment_number',
1892  'run_number',
1893  'event_number',
1894  'pr_store_array_number',
1895  'is_missing',
1896  'is_primary',
1897  ]
1898  # In ``mc_df`` each row corresponds to an MC track
1899  mc_df = root_pandas.read_root(validation_harvest_path, key='mc_tree/mc_tree', columns=mc_columns)
1900  if self.primaries_only:
1901  mc_df = mc_df[mc_df.is_primary.eq(True)]
1902 
1903  # Define QI thresholds for the FOM plots and the ROC curves
1904  qi_cuts = np.linspace(0., 1, 20, endpoint=False)
1905  # # Add more points at the very end between the previous maximum and 1
1906  # qi_cuts = np.append(qi_cuts, np.linspace(np.max(qi_cuts), 1, 20, endpoint=False))
1907 
1908  # Create plots and append them to single output pdf
1909 
1910  output_pdf_file_path = self.get_output_file_name(self.output_pdf_file_basename)
1911  with PdfPages(output_pdf_file_path, keep_empty=False) as pdf:
1912 
1913  # Add a title page to validation plot PDF with some metadata
1914  # Remember that most metadata is in the xml file of the weightfile
1915  # and in the b2luigi directory structure
1916  titlepage_fig, titlepage_ax = plt.subplots()
1917  titlepage_ax.axis("off")
1918  title = f"Quality Estimator validation plots from {self.__class__.__name__}"
1919  titlepage_ax.set_title(title)
1920  teacher_task = self.harvesting_validation_task_instance.teacher_task
1921  weightfile_identifier = teacher_task.get_weightfile_xml_identifier(teacher_task, fast_bdt_option=self.fast_bdt_option)
1922  meta_data = {
1923  "Date": datetime.today().strftime("%Y-%m-%d %H:%M"),
1924  "Created by steering file": os.path.realpath(__file__),
1925  "Created from data in": validation_harvest_path,
1926  "Background directory": MasterTask.bkgfiles_by_exp[self.experiment_number],
1927  "weight file": weightfile_identifier,
1928  }
1929  if hasattr(self, 'exclude_variables'):
1930  meta_data["Excluded variables"] = ", ".join(self.exclude_variables)
1931  meta_data_string = (format_dictionary(meta_data) +
1932  "\n\n(For all MVA training parameters look into the produced weight file)")
1933  luigi_params = get_serialized_parameters(self)
1934  luigi_param_string = (f"\n\nb2luigi parameters for {self.__class__.__name__}\n" +
1935  format_dictionary(luigi_params))
1936  title_page_text = meta_data_string + luigi_param_string
1937  titlepage_ax.text(0, 1, title_page_text, ha="left", va="top", wrap=True, fontsize=8)
1938  pdf.savefig(titlepage_fig)
1939  plt.close(titlepage_fig)
1940 
1941  fake_rates = get_uncertain_means_for_qi_cuts(pr_df, "is_fake", qi_cuts)
1942  fake_fig, fake_ax = plt.subplots()
1943  fake_ax.set_title("Fake rate")
1944  plot_with_errobands(fake_rates, ax=fake_ax)
1945  fake_ax.set_ylabel("fake rate")
1946  fake_ax.set_xlabel("quality indicator requirement")
1947  pdf.savefig(fake_fig, bbox_inches="tight")
1948  plt.close(fake_fig)
1949 
1950  # Plot clone rates
1951  clone_rates = get_uncertain_means_for_qi_cuts(pr_df, "is_clone", qi_cuts)
1952  clone_fig, clone_ax = plt.subplots()
1953  clone_ax.set_title("Clone rate")
1954  plot_with_errobands(clone_rates, ax=clone_ax)
1955  clone_ax.set_ylabel("clone rate")
1956  clone_ax.set_xlabel("quality indicator requirement")
1957  pdf.savefig(clone_fig, bbox_inches="tight")
1958  plt.close(clone_fig)
1959 
1960  # Plot finding efficiency
1961 
1962  # The Quality Indicator is only available in pr_tree and thus the
1963  # PR-track dataframe. To get the QI of the related PR track for an MC
1964  # track, merge the PR dataframe into the MC dataframe
1965  pr_track_identifiers = ['experiment_number', 'run_number', 'event_number', 'pr_store_array_number']
1966  mc_df = upd.merge(
1967  left=mc_df, right=pr_df[pr_track_identifiers + ['quality_indicator']],
1968  how='left',
1969  on=pr_track_identifiers
1970  )
1971 
1972  missing_fractions = (
1973  _my_uncertain_mean(mc_df[
1974  mc_df.quality_indicator.isnull() | (mc_df.quality_indicator > qi_cut)]['is_missing'])
1975  for qi_cut in qi_cuts
1976  )
1977 
1978  findeff_fig, findeff_ax = plt.subplots()
1979  findeff_ax.set_title("Finding efficiency")
1980  finding_efficiencies = 1.0 - upd.Series(data=missing_fractions, index=qi_cuts)
1981  plot_with_errobands(finding_efficiencies, ax=findeff_ax)
1982  findeff_ax.set_ylabel("finding efficiency")
1983  findeff_ax.set_xlabel("quality indicator requirement")
1984  pdf.savefig(findeff_fig, bbox_inches="tight")
1985  plt.close(findeff_fig)
1986 
1987  # Plot ROC curves
1988 
1989  # Fake rate vs. finding efficiency ROC curve
1990  fake_roc_fig, fake_roc_ax = plt.subplots()
1991  fake_roc_ax.set_title("Fake rate vs. finding efficiency ROC curve")
1992  fake_roc_ax.errorbar(x=finding_efficiencies.nominal_value, y=fake_rates.nominal_value,
1993  xerr=finding_efficiencies.std_dev, yerr=fake_rates.std_dev, elinewidth=0.8)
1994  fake_roc_ax.set_xlabel('finding efficiency')
1995  fake_roc_ax.set_ylabel('fake rate')
1996  pdf.savefig(fake_roc_fig, bbox_inches="tight")
1997  plt.close(fake_roc_fig)
1998 
1999  # Clone rate vs. finding efficiency ROC curve
2000  clone_roc_fig, clone_roc_ax = plt.subplots()
2001  clone_roc_ax.set_title("Clone rate vs. finding efficiency ROC curve")
2002  clone_roc_ax.errorbar(x=finding_efficiencies.nominal_value, y=clone_rates.nominal_value,
2003  xerr=finding_efficiencies.std_dev, yerr=clone_rates.std_dev, elinewidth=0.8)
2004  clone_roc_ax.set_xlabel('finding efficiency')
2005  clone_roc_ax.set_ylabel('clone rate')
2006  pdf.savefig(clone_roc_fig, bbox_inches="tight")
2007  plt.close(clone_roc_fig)
2008 
2009  # Plot kinematic distributions
2010 
2011  # use fewer QI cuts, as each cut will be its own subplot now and not a point
2012  kinematic_qi_cuts = [0, 0.5, 0.8]
2013 
2014  # Define kinematic parameters which we want to histogram and define
2015  # dictionaries relating them to latex labels, units and binnings
2016  params = ['d0', 'z0', 'pt', 'tan_lambda', 'phi0']
2017  label_by_param = {
2018  "pt": "$p_T$",
2019  "z0": "$z_0$",
2020  "d0": "$d_0$",
2021  "tan_lambda": r"$\tan{\lambda}$",
2022  "phi0": r"$\phi_0$"
2023  }
2024  unit_by_param = {
2025  "pt": "GeV",
2026  "z0": "cm",
2027  "d0": "cm",
2028  "tan_lambda": "rad",
2029  "phi0": "rad"
2030  }
2031  n_kinematic_bins = 75 # number of bins per kinematic variable
2032  bins_by_param = {
2033  "pt": np.linspace(0, np.percentile(pr_df['pt_truth'].dropna(), 95), n_kinematic_bins),
2034  "z0": np.linspace(-0.1, 0.1, n_kinematic_bins),
2035  "d0": np.linspace(0, 0.01, n_kinematic_bins),
2036  "tan_lambda": np.linspace(-2, 3, n_kinematic_bins),
2037  "phi0": np.linspace(0, 2 * np.pi, n_kinematic_bins)
2038  }
2039 
2040  # Iterate over each parameter and for each make stacked histograms for different QI cuts
2041  kinematic_qi_cuts = [0, 0.5, 0.8]
2042  blue, yellow, green = plt.get_cmap("tab10").colors[0:3]
2043  for param in params:
2044  fig, axarr = plt.subplots(ncols=len(kinematic_qi_cuts), sharey=True, sharex=True, figsize=(14, 6))
2045  fig.suptitle(f"{label_by_param[param]} distributions")
2046  for i, qi in enumerate(kinematic_qi_cuts):
2047  ax = axarr[i]
2048  ax.set_title(f"QI > {qi}")
2049  incut = pr_df[(pr_df['quality_indicator'] > qi)]
2050  incut_matched = incut[incut.is_matched.eq(True)]
2051  incut_clones = incut[incut.is_clone.eq(True)]
2052  incut_fake = incut[incut.is_fake.eq(True)]
2053 
2054  # if any series is empty, skip this subplot and don't try to draw a stacked histogram
2055  if any(series.empty for series in (incut, incut_matched, incut_clones, incut_fake)):
2056  ax.text(0.5, 0.5, "Not enough data in bin", ha="center", va="center", transform=ax.transAxes)
2057  continue
2058 
2059  bins = bins_by_param[param]
2060  stacked_histogram_series_tuple = (
2061  incut_matched[f'{param}_estimate'],
2062  incut_clones[f'{param}_estimate'],
2063  incut_fake[f'{param}_estimate'],
2064  )
2065  histvals, _, _ = ax.hist(stacked_histogram_series_tuple,
2066  stacked=True,
2067  bins=bins, range=(bins.min(), bins.max()),
2068  color=(blue, green, yellow),
2069  label=("matched", "clones", "fakes"))
2070  ax.set_xlabel(f'{label_by_param[param]} estimate / ({unit_by_param[param]})')
2071  ax.set_ylabel('# tracks')
2072  axarr[0].legend(loc="upper center", bbox_to_anchor=(0, -0.15))
2073  pdf.savefig(fig, bbox_inches="tight")
2074  plt.close(fig)
2075 
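The fake- and clone-rate plots show, for each threshold in `qi_cuts`, the mean of a boolean flag over all tracks whose quality indicator exceeds that threshold. The steering file delegates this to `get_uncertain_means_for_qi_cuts`, which is defined elsewhere in the file and not shown here. The sketch below reproduces the idea with plain pandas; the binomial standard error sqrt(p(1 - p)/n) is an assumption about how the uncertainty could be estimated, not necessarily what that helper computes, and the function name `rate_vs_qi_cut` is hypothetical.

```python
import numpy as np
import pandas as pd


def rate_vs_qi_cut(df, flag_column, qi_cuts, qi_column="quality_indicator"):
    """Fraction of tracks with QI > cut that have ``flag_column`` set, for each cut.

    Returns a DataFrame indexed by cut with the mean and its binomial standard
    error. Cuts that select no tracks yield NaN.
    """
    rows = []
    for cut in qi_cuts:
        selected = df.loc[df[qi_column] > cut, flag_column].astype(float)
        n = len(selected)
        if n == 0:
            rows.append((np.nan, np.nan))
            continue
        p = selected.mean()
        rows.append((p, np.sqrt(p * (1.0 - p) / n)))  # binomial standard error
    return pd.DataFrame(rows, index=pd.Index(qi_cuts, name="qi_cut"),
                        columns=["nominal_value", "std_dev"])
```

With the thresholds used above, a call would look like `rate_vs_qi_cut(pr_df, "is_fake", np.linspace(0., 1, 20, endpoint=False))`.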
2076 
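For the finding efficiency, the quality indicator is only stored per PR track, so it is attached to the MC tracks with a left merge on the four track identifiers. The following pandas sketch mirrors that step for a single cut; the function name `finding_efficiency` is hypothetical, and the test data assumes that an MC track without a related PR track carries a `pr_store_array_number` that matches no PR track.

```python
import pandas as pd

# Columns that identify a pattern-recognition track, as in the plotting task above
PR_TRACK_IDENTIFIERS = ["experiment_number", "run_number", "event_number", "pr_store_array_number"]


def finding_efficiency(mc_df, pr_df, qi_cut):
    """Finding efficiency for a QI requirement, following the steering file's selection.

    MC tracks without a related PR track get a NaN quality indicator from the
    left merge and are kept (they are the missing ones); MC tracks whose PR
    track fails the cut are dropped from the sample.
    """
    merged = pd.merge(
        left=mc_df,
        right=pr_df[PR_TRACK_IDENTIFIERS + ["quality_indicator"]],
        how="left",
        on=PR_TRACK_IDENTIFIERS,
    )
    selected = merged[merged.quality_indicator.isnull() | (merged.quality_indicator > qi_cut)]
    return 1.0 - selected["is_missing"].astype(float).mean()
```

As in the steering file, tracks that fail the cut are removed from the denominator rather than counted as missing, so the result is the fraction of found tracks among MC tracks that are either missing or pass the cut.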
2078  """
2079  Create a PDF file with validation plots for the VXDTF2 track quality
2080  estimator produced from the ROOT ntuples produced by a VXDTF2 track QE
2081  harvesting validation task
2082  """
2083 
2084  @property
2086  """
2087  Harvesting validation task to require, which produces the ROOT files
2088  with variables to produce the VXD QE validation plots.
2089  """
2091  n_events_testing=self.n_events_testing,
2092  n_events_training=self.n_events_training,
2093  process_type=self.process_type,
2094  experiment_number=self.experiment_number,
2095  exclude_variables=self.exclude_variables,
2096  num_processes=MasterTask.num_processes,
2097  fast_bdt_option=self.fast_bdt_option,
2098  )
2099 
2100 
2102  """
2103  Create a PDF file with validation plots for the CDC track quality estimator
2104  produced from the ROOT ntuples produced by a CDC track QE harvesting
2105  validation task
2106  """
2107 
2108  training_target = b2luigi.Parameter()
2109 
2110  @property
2112  """
2113  Harvesting validation task to require, which produces the ROOT files
2114  with variables to produce the CDC QE validation plots.
2115  """
2117  n_events_testing=self.n_events_testing,
2118  n_events_training=self.n_events_training,
2119  process_type=self.process_type,
2120  experiment_number=self.experiment_number,
2121  training_target=self.training_target,
2122  exclude_variables=self.exclude_variables,
2123  num_processes=MasterTask.num_processes,
2124  fast_bdt_option=self.fast_bdt_option,
2125  )
2126 
2127 
2129  """
2130  Create a PDF file with validation plots for the reco MVA track quality
2131  estimator produced from the ROOT ntuples produced by a reco track QE
2132  harvesting validation task
2133  """
2134 
2135  cdc_training_target = b2luigi.Parameter()
2136 
2137  @property
2139  """
2140  Harvesting validation task to require, which produces the ROOT files
2141  with variables to produce the final MVA track QE validation plots.
2142  """
2144  n_events_testing=self.n_events_testing,
2145  n_events_training=self.n_events_training,
2146  process_type=self.process_type,
2147  experiment_number=self.experiment_number,
2148  cdc_training_target=self.cdc_training_target,
2149  exclude_variables=self.exclude_variables,
2150  num_processes=MasterTask.num_processes,
2151  fast_bdt_option=self.fast_bdt_option,
2152  )
2153 
2154 
2156  """
2157  Collect weightfile identifiers from different teacher tasks and merge them
2158  into a local database for testing.
2159  """
2160 
2161  n_events_training = b2luigi.IntParameter()
2162 
2163  experiment_number = b2luigi.IntParameter()
2164 
2167  process_type = b2luigi.Parameter(default="BBBAR")
2168 
2169  cdc_training_target = b2luigi.Parameter()
2170 
2171  fast_bdt_option = b2luigi.ListParameter(hashed=True, default=[200, 8, 3, 0.1])
2172 
2173  def requires(self):
2174  """
2175  Required teacher tasks
2176  """
2177  yield VXDQETeacherTask(
2178  n_events_training=self.n_events_training,
2179  process_type=self.process_type,
2180  experiment_number=self.experiment_number,
2181  exclude_variables=MasterTask.exclude_variables_vxd,
2182  fast_bdt_option=self.fast_bdt_option,
2183  )
2184  yield CDCQETeacherTask(
2185  n_events_training=self.n_events_training,
2186  process_type=self.process_type,
2187  experiment_number=self.experiment_number,
2188  training_target=self.cdc_training_target,
2189  exclude_variables=MasterTask.exclude_variables_cdc,
2190  fast_bdt_option=self.fast_bdt_option,
2191  )
2192  yield RecoTrackQETeacherTask(
2193  n_events_training=self.n_events_training,
2194  process_type=self.process_type,
2195  experiment_number=self.experiment_number,
2196  cdc_training_target=self.cdc_training_target,
2197  exclude_variables=MasterTask.exclude_variables_rec,
2198  fast_bdt_option=self.fast_bdt_option,
2199  )
2200 
2201  def output(self):
2202  """
2203  Local database
2204  """
2205  yield self.add_to_output("localdb.tar")
2206 
2207  def process(self):
2208  """
2209  Create local database
2210  """
2211  current_path = Path.cwd()
2212  localdb_archive_path = Path(self.get_output_file_name("localdb.tar")).absolute()
2213  output_dir = localdb_archive_path.parent
2214 
2215  # remove existing local databases in output directories
2216  self._clean()
2217  # "Upload" the weightfiles of all 3 teacher tasks into the same localdb
2218  for task in (VXDQETeacherTask, CDCQETeacherTask, RecoTrackQETeacherTask):
2219  # Extract xml identifier input file name before switching working directories, as it returns relative paths
2220  weightfile_xml_identifier_path = os.path.abspath(self.get_input_file_names(
2221  task.get_weightfile_xml_identifier(task, fast_bdt_option=self.fast_bdt_option))[0])
2222  # As localdb is created in working directory, chdir into desired output path
2223  try:
2224  os.chdir(output_dir)
2225  # Same as basf2_mva_upload on the command line, creates localdb directory in current working dir
2226  basf2_mva.upload(
2227  weightfile_xml_identifier_path,
2228  task.weightfile_identifier_basename,
2229  self.experiment_number, 0,
2230  self.experiment_number, -1,
2231  )
2232  finally: # Switch back to working directory of b2luigi, even if upload failed
2233  os.chdir(current_path)
2234 
2235  # Pack localdb into a tar archive, so that the task has one single output file instead of a directory
2236  shutil.make_archive(
2237  base_name=localdb_archive_path.with_suffix('').as_posix(), # strip only the '.tar' suffix
2238  format="tar",
2239  root_dir=output_dir,
2240  base_dir="localdb",
2241  verbose=True,
2242  )
2243 
2244  def _clean(self):
2245  """
2246  Remove local database and tar archives in output directory
2247  """
2248  localdb_archive_path = Path(self.get_output_file_name("localdb.tar"))
2249  localdb_path = localdb_archive_path.parent / "localdb"
2250 
2251  if localdb_path.exists():
2252  print(f"Deleting localdb\n{localdb_path}\nwith contents\n ",
2253  "\n ".join(f.name for f in localdb_path.iterdir()))
2254  shutil.rmtree(localdb_path, ignore_errors=False) # recursively delete localdb
2255 
2256  if localdb_archive_path.is_file():
2257  print(f"Deleting {localdb_archive_path}")
2258  os.remove(localdb_archive_path)
2259 
2260  def on_failure(self, exception):
2261  """
2262  Cleanup: Remove local database to prevent existing outputs when task did not finish successfully
2263  """
2264  self._clean()
2265  # Run existing on_failure from parent class
2266  super().on_failure(exception)
2267 
2268 
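The local-database task changes into the output directory, because `basf2_mva.upload` creates `localdb` in the current working directory, and restores the previous directory in a `finally` block. The same pattern can be packaged as a context manager (Python 3.11 and later also ship `contextlib.chdir` for this). The sketch below uses hypothetical helper names and additionally archives a directory with `shutil.make_archive`.

```python
import contextlib
import os
import shutil
from pathlib import Path


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the current working directory, restoring it even on errors."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


def archive_directory(archive_path, directory_name):
    """Pack ``<archive_path.parent>/<directory_name>`` into the tar file ``archive_path``.

    shutil.make_archive appends the ".tar" extension itself, so only that suffix
    is stripped with with_suffix(""); splitting the whole path on "." would
    break on parent directories whose names contain dots.
    """
    archive_path = Path(archive_path).absolute()
    created = shutil.make_archive(
        base_name=archive_path.with_suffix("").as_posix(),
        format="tar",
        root_dir=archive_path.parent,
        base_dir=directory_name,
    )
    return Path(created)
```

Keeping `root_dir` and `base_dir` separate makes the archive members start with `localdb/` instead of an absolute path.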
2269 class MasterTask(b2luigi.WrapperTask):
2270  """
2271  Wrapper task that needs to finish for b2luigi to finish running this steering file.
2272 
2273  It is done if the outputs of all required subtasks exist. It is thus at the
2274  top of the luigi task graph. Edit the ``requires`` method to steer which
2275  tasks and with which parameters you want to run.
2276  """
2277 
2280  process_type = b2luigi.get_setting("process_type", default='BBBAR')
2281 
2282  n_events_training = b2luigi.get_setting("n_events_training", default=20000)
2283 
2284  n_events_testing = b2luigi.get_setting("n_events_testing", default=5000)
2285 
2286  n_events_per_task = b2luigi.get_setting("n_events_per_task", default=100)
2287 
2288  num_processes = b2luigi.get_setting("basf2_processes_per_worker", default=0)
2289  datafiles = b2luigi.get_setting("datafiles")
2290 
2291  bkgfiles_by_exp = b2luigi.get_setting("bkgfiles_by_exp")
2292 
2293  bkgfiles_by_exp = {int(key): val for (key, val) in bkgfiles_by_exp.items()}
2294 
2295  exclude_variables_cdc = [
2296  "has_matching_segment",
2297  "n_cdc_hits",
2298  "avg_hit_dist",
2299  "drift_length_sum",
2300  "drift_length_mean",
2301  "drift_length_variance",
2302  "drift_length_max",
2303  "drift_length_min",
2304  "norm_drift_length_sum",
2305  "norm_drift_length_max",
2306  "norm_drift_length_min",
2307  "norm_drift_length_mean",
2308  "norm_drift_length_variance",
2309  "tot_sum",
2310  "tot_mean",
2311  "tot_variance",
2312  "tot_max",
2313  "tot_min",
2314  "adc_variance",
2315  "adc_max",
2316  "adc_mean",
2317  "adc_min",
2318  "adc_sum",
2319  "empty_s_mean",
2320  "empty_s_variance",
2321  "empty_s_max",
2322  "cont_layer_first_vs_min",
2323  "cont_layer_max_vs_last",
2324  "cont_layer_occupancy",
2325  "cont_layer_mean",
2326  "cont_layer_min",
2327  "cont_layer_max",
2328  "cont_layer_first",
2329  "cont_layer_last",
2330  "cont_layer_variance",
2331  "cont_layer_count",
2332  "super_layer_first_vs_min",
2333  "super_layer_max_vs_last",
2334  "super_layer_occupancy",
2335  "super_layer_mean",
2336  "super_layer_variance"]
2337 
2338  exclude_variables_vxd = [
2339  'energyLoss_max', 'energyLoss_min', 'energyLoss_mean', 'energyLoss_std', 'energyLoss_sum',
2340  'size_max', 'size_min', 'size_mean', 'size_std', 'size_sum',
2341  'seedCharge_max', 'seedCharge_min', 'seedCharge_mean', 'seedCharge_std', 'seedCharge_sum',
2342  'tripletFit_P_Mag', 'tripletFit_P_Eta', 'tripletFit_P_Phi', 'tripletFit_P_X', 'tripletFit_P_Y', 'tripletFit_P_Z']
2343 
2344  exclude_variables_rec = [
2345  'N_RecoTracks',
2346  'N_PXDRecoTracks',
2347  'N_SVDRecoTracks',
2348  'N_CDCRecoTracks',
2349  'N_diff_PXD_SVD_RecoTracks',
2350  'N_diff_SVD_CDC_RecoTracks',
2351  'Fit_Successful',
2352  'Fit_NFailedPoints',
2353  'Fit_Chi2',
2354  'N_TrackPoints_without_KalmanFitterInfo',
2355  'N_Hits_without_TrackPoint',
2356  'SVD_CDC_CDCwall_Chi2',
2357  'SVD_CDC_CDCwall_Pos_diff_Z',
2358  'SVD_CDC_CDCwall_Pos_diff_Pt',
2359  'SVD_CDC_CDCwall_Pos_diff_Theta',
2360  'SVD_CDC_CDCwall_Pos_diff_Phi',
2361  'SVD_CDC_CDCwall_Pos_diff_Mag',
2362  'SVD_CDC_CDCwall_Pos_diff_Eta',
2363  'SVD_CDC_CDCwall_Mom_diff_Z',
2364  'SVD_CDC_CDCwall_Mom_diff_Pt',
2365  'SVD_CDC_CDCwall_Mom_diff_Theta',
2366  'SVD_CDC_CDCwall_Mom_diff_Phi',
2367  'SVD_CDC_CDCwall_Mom_diff_Mag',
2368  'SVD_CDC_CDCwall_Mom_diff_Eta',
2369  'SVD_CDC_POCA_Pos_diff_Z',
2370  'SVD_CDC_POCA_Pos_diff_Pt',
2371  'SVD_CDC_POCA_Pos_diff_Theta',
2372  'SVD_CDC_POCA_Pos_diff_Phi',
2373  'SVD_CDC_POCA_Pos_diff_Mag',
2374  'SVD_CDC_POCA_Pos_diff_Eta',
2375  'SVD_CDC_POCA_Mom_diff_Z',
2376  'SVD_CDC_POCA_Mom_diff_Pt',
2377  'SVD_CDC_POCA_Mom_diff_Theta',
2378  'SVD_CDC_POCA_Mom_diff_Phi',
2379  'SVD_CDC_POCA_Mom_diff_Mag',
2380  'SVD_CDC_POCA_Mom_diff_Eta',
2381  'POCA_Pos_Pt',
2382  'POCA_Pos_Mag',
2383  'POCA_Pos_Phi',
2384  'POCA_Pos_Z',
2385  'POCA_Pos_Theta',
2386  'PXD_QI',
2387  'SVD_FitSuccessful',
2388  'CDC_FitSuccessful',
2389  'seed_Charge',
2390  'Fit_Charge',
2391  'weight_max',
2392  'weight_min',
2393  'weight_mean',
2394  'weight_std',
2395  'weight_median',
2396  'weight_n_zeros',
2397  'weight_firstCDCHit',
2398  'weight_lastSVDHit',
2399  'smoothedChi2_max',
2400  'smoothedChi2_min',
2401  'smoothedChi2_mean',
2402  'smoothedChi2_std',
2403  'smoothedChi2_median',
2404  'smoothedChi2_n_zeros',
2405  'smoothedChi2_firstCDCHit',
2406  'smoothedChi2_lastSVDHit']
2407 
2408  def requires(self):
2409  """
2410  Generate list of tasks that needs to be done for luigi to finish running
2411  this steering file.
2412  """
2413  cdc_training_targets = [
2414  "truth", # treats clones as background, only best matched CDC tracks are true
2415  # "truth_track_is_matched" # treats clones as signal
2416  ]
2417 
2418  fast_bdt_options = []
2419  # possible to run over a chosen hyperparameter space if wanted
2420  # for i in range(250, 400, 50):
2421  # for j in range(6, 10, 2):
2422  # for k in range(2, 6):
2423  # for l in range(0, 5):
2424  # fast_bdt_options.append([100 + i, j, 3+k, 0.025+l*0.025])
2425  # fast_bdt_options.append([200, 8, 3, 0.1]) # default FastBDT option
2426  fast_bdt_options.append([350, 6, 5, 0.1])
2427 
2428  experiment_numbers = b2luigi.get_setting("experiment_numbers")
2429 
2430  # iterate over all possible combinations of parameters from the above defined parameter lists
2431  for experiment_number, cdc_training_target, fast_bdt_option in itertools.product(
2432  experiment_numbers, cdc_training_targets, fast_bdt_options
2433  ):
2434  # if test_selected_task is activated, only run the following tasks:
2435  if b2luigi.get_setting("test_selected_task", default=False):
2437  num_processes=self.num_processes,
2438  n_events=self.n_events_testing,
2439  experiment_number=experiment_number,
2440  random_seed=self.process_type + '_test',
2441  )
2442  yield CDCQETeacherTask(
2443  n_events_training=self.n_events_training,
2444  process_type=self.process_type,
2445  experiment_number=experiment_number,
2446  exclude_variables=self.exclude_variables_cdc,
2447  training_target=cdc_training_target,
2448  fast_bdt_option=fast_bdt_option,
2449  )
2450  else:
2451  # if data shall be processed, it can neither be trained nor evaluated
2452  if 'DATA' in self.process_type:
2454  num_processes=self.num_processes,
2455  n_events=self.n_events_testing,
2456  experiment_number=experiment_number,
2457  random_seed=self.process_type + '_test',
2458  )
2460  num_processes=self.num_processes,
2461  n_events=self.n_events_testing,
2462  experiment_number=experiment_number,
2463  random_seed=self.process_type + '_test',
2464  )
2466  num_processes=self.num_processes,
2467  n_events=self.n_events_testing,
2468  experiment_number=experiment_number,
2469  random_seed=self.process_type + '_test',
2470  recotrack_option='deleteCDCQI080',
2471  cdc_training_target=cdc_training_target,
2472  fast_bdt_option=fast_bdt_option,
2473  )
2474  else:
2476  n_events_training=self.n_events_training,
2477  process_type=self.process_type,
2478  experiment_number=experiment_number,
2479  cdc_training_target=cdc_training_target,
2480  fast_bdt_option=fast_bdt_option,
2481  )
2482 
2483  if b2luigi.get_setting("run_validation_tasks", default=True):
2485  n_events_training=self.n_events_training,
2486  n_events_testing=self.n_events_testing,
2487  process_type=self.process_type,
2488  experiment_number=experiment_number,
2489  cdc_training_target=cdc_training_target,
2490  exclude_variables=self.exclude_variables_rec,
2491  fast_bdt_option=fast_bdt_option,
2492  )
2494  n_events_training=self.n_events_training,
2495  n_events_testing=self.n_events_testing,
2496  process_type=self.process_type,
2497  experiment_number=experiment_number,
2498  exclude_variables=self.exclude_variables_cdc,
2499  training_target=cdc_training_target,
2500  fast_bdt_option=fast_bdt_option,
2501  )
2503  n_events_training=self.n_events_training,
2504  n_events_testing=self.n_events_testing,
2505  process_type=self.process_type,
2506  exclude_variables=self.exclude_variables_vxd,
2507  experiment_number=experiment_number,
2508  fast_bdt_option=fast_bdt_option,
2509  )
2510 
2511  if b2luigi.get_setting("run_mva_evaluate", default=True):
2512  # Evaluate trained weightfiles via basf2_mva_evaluate.py on separate testdatasets
2513  # requires a latex installation to work
2515  n_events_training=self.n_events_training,
2516  n_events_testing=self.n_events_testing,
2517  process_type=self.process_type,
2518  experiment_number=experiment_number,
2519  cdc_training_target=cdc_training_target,
2520  exclude_variables=self.exclude_variables_rec,
2521  fast_bdt_option=fast_bdt_option,
2522  )
2524  n_events_training=self.n_events_training,
2525  n_events_testing=self.n_events_testing,
2526  process_type=self.process_type,
2527  experiment_number=experiment_number,
2528  exclude_variables=self.exclude_variables_cdc,
2529  fast_bdt_option=fast_bdt_option,
2530  training_target=cdc_training_target,
2531  )
2533  n_events_training=self.n_events_training,
2534  n_events_testing=self.n_events_testing,
2535  process_type=self.process_type,
2536  experiment_number=experiment_number,
2537  exclude_variables=self.exclude_variables_vxd,
2538  fast_bdt_option=fast_bdt_option,
2539  )
2540 
2541 
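`MasterTask.requires` iterates over the Cartesian product of experiment numbers, CDC training targets and FastBDT options with `itertools.product`, and its commented-out loops outline a grid search over the four entries of `fast_bdt_option`. A hypothetical helper that builds such a grid is sketched below; the parameter names (`n_trees`, `depths`, `n_cuts`, `shrinkages`) are assumed labels for the four list entries and are not taken from this file.

```python
import itertools


def fast_bdt_grid(n_trees, depths, n_cuts, shrinkages):
    """Hypothetical helper: every four-element fast_bdt_option combination as a list.

    The entry names are assumed labels; only the list layout follows the steering file.
    """
    return [list(option) for option in itertools.product(n_trees, depths, n_cuts, shrinkages)]


# Same grid and order as the commented-out loops in MasterTask.requires:
# [100 + i, j, 3 + k, 0.025 + l * 0.025] for i in range(250, 400, 50),
# j in range(6, 10, 2), k in range(2, 6), l in range(0, 5)
fast_bdt_options = fast_bdt_grid(
    n_trees=range(350, 500, 50),
    depths=range(6, 10, 2),
    n_cuts=range(5, 9),
    shrinkages=[0.025 + step * 0.025 for step in range(5)],
)
```

This yields 3 × 2 × 4 × 5 = 120 options; since every option spawns its own teacher, validation and evaluation tasks, the grid size multiplies the workload accordingly.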
2542 if __name__ == "__main__":
2543  # if global tags are specified in the settings, use them:
2544  globaltags = b2luigi.get_setting("globaltags", default=[])
2545  if len(globaltags) > 0:
2546  basf2.conditions.reset()
2547  for gt in globaltags:
2548  basf2.conditions.prepend_globaltag(gt)
2549  workers = b2luigi.get_setting("workers", default=1)
2550  b2luigi.process(MasterTask(), workers=workers)
combined_quality_estimator_teacher.CDCQEDataCollectionTask.experiment_number
experiment_number
Experiment number of the conditions database, e.g.
Definition: combined_quality_estimator_teacher.py:696
combined_quality_estimator_teacher.SplitNMergeSimTask.experiment_number
experiment_number
Experiment number of the conditions database, e.g.
Definition: combined_quality_estimator_teacher.py:483
combined_quality_estimator_teacher.TrackQETeacherBaseTask.training_target
training_target
Feature/variable to use as truth label in the quality estimator MVA classifier.
Definition: combined_quality_estimator_teacher.py:1034
combined_quality_estimator_teacher.RecoTrackQEDataCollectionTask.requires
def requires(self)
Definition: combined_quality_estimator_teacher.py:861
combined_quality_estimator_teacher.RecoTrackQEDataCollectionTask.get_records_file_name
def get_records_file_name(self, n_events=None, random_seed=None, recotrack_option=None)
Filename of the recorded/collected data for the final QE MVA training.
Definition: combined_quality_estimator_teacher.py:825
combined_quality_estimator_teacher.RecoTrackQEDataCollectionTask.get_input_files
def get_input_files(self, n_events=None, random_seed=None)
Definition: combined_quality_estimator_teacher.py:843
combined_quality_estimator_teacher.RecoTrackQEValidationPlotsTask.cdc_training_target
cdc_training_target
Feature/variable to use as truth label for the CDC track quality estimator.
Definition: combined_quality_estimator_teacher.py:2135
simulation.add_simulation
def add_simulation(path, components=None, bkgfiles=None, bkgOverlay=True, forceSetPXDDataReduction=False, usePXDDataReduction=True, cleanupPXDDataReduction=True, generate_2nd_cdc_hits=False, simulateT0jitter=False, usePXDGatedMode=False)
Definition: simulation.py:114
tracking.add_tracking_reconstruction
def add_tracking_reconstruction(path, components=None, pruneTracks=False, skipGeometryAdding=False, mcTrackFinding=False, trackFitHypotheses=None, reco_tracks="RecoTracks", prune_temporary_tracks=True, fit_tracks=True, use_second_cdc_hits=False, skipHitPreparerAdding=False, use_svd_to_cdc_ckf=True, use_ecl_to_cdc_ckf=False, add_cdcTrack_QI=True, add_vxdTrack_QI=False, add_recoTrack_QI=False)
Definition: __init__.py:8
combined_quality_estimator_teacher.VXDQEDataCollectionTask.requires
def requires(self)
Definition: combined_quality_estimator_teacher.py:609
combined_quality_estimator_teacher.PlotsFromHarvestingValidationBaseTask.n_events_testing
n_events_testing
Number of events to generate for the test data set.
Definition: combined_quality_estimator_teacher.py:1821
combined_quality_estimator_teacher.RecoTrackQEHarvestingValidationTask.teacher_task
teacher_task
Teacher task to require to provide a quality estimator weightfile for add_tracking_with_quality_estim...
Definition: combined_quality_estimator_teacher.py:1479
combined_quality_estimator_teacher.CDCQEHarvestingValidationTask.teacher_task
teacher_task
Teacher task to require to provide a quality estimator weightfile for add_tracking_with_quality_estim...
Definition: combined_quality_estimator_teacher.py:1412
combined_quality_estimator_teacher.CDCQEHarvestingValidationTask.requires
def requires(self)
Definition: combined_quality_estimator_teacher.py:1415
combined_quality_estimator_teacher.PlotsFromHarvestingValidationBaseTask.fast_bdt_option
fast_bdt_option
Hyperparameter option of the FastBDT algorithm.
Definition: combined_quality_estimator_teacher.py:1834
combined_quality_estimator_teacher.MasterTask.exclude_variables_cdc
list exclude_variables_cdc
list of variables to exclude for the cdc mva.
Definition: combined_quality_estimator_teacher.py:2295
combined_quality_estimator_teacher.VXDQEDataCollectionTask
Definition: combined_quality_estimator_teacher.py:557
combined_quality_estimator_teacher.TrackQEEvaluationBaseTask.training_target
training_target
Feature/variable to use as truth label in the quality estimator MVA classifier.
Definition: combined_quality_estimator_teacher.py:1593
combined_quality_estimator_teacher.QEWeightsLocalDBCreatorTask.fast_bdt_option
fast_bdt_option
Hyperparameter option of the FastBDT algorithm.
Definition: combined_quality_estimator_teacher.py:2171
combined_quality_estimator_teacher.CDCQEDataCollectionTask.n_events
n_events
Number of events to generate.
Definition: combined_quality_estimator_teacher.py:694
combined_quality_estimator_teacher.RecoTrackQEEvaluationTask.data_collection_task
data_collection_task
Task that is required by the evaluation base class to collect the test data for the evaluation.
Definition: combined_quality_estimator_teacher.py:1776
combined_quality_estimator_teacher.VXDQEDataCollectionTask.create_path
def create_path(self)
Definition: combined_quality_estimator_teacher.py:633
combined_quality_estimator_teacher.TrackQEEvaluationBaseTask.teacher_task
TrackQETeacherBaseTask teacher_task(self)
Definition: combined_quality_estimator_teacher.py:1601
combined_quality_estimator_teacher.TrackQETeacherBaseTask.process
def process(self)
Definition: combined_quality_estimator_teacher.py:1123
combined_quality_estimator_teacher.QEWeightsLocalDBCreatorTask.output
def output(self)
Definition: combined_quality_estimator_teacher.py:2201
combined_quality_estimator_teacher.MasterTask.exclude_variables_rec
list exclude_variables_rec
list of variables to exclude for the recotrack mva:
Definition: combined_quality_estimator_teacher.py:2344
combined_quality_estimator_teacher.CDCQEValidationPlotsTask.harvesting_validation_task_instance
def harvesting_validation_task_instance(self)
Definition: combined_quality_estimator_teacher.py:2111
combined_quality_estimator_teacher.HarvestingValidationBaseTask.process_type
process_type
Define which kind of process shall be used.
Definition: combined_quality_estimator_teacher.py:1257
combined_quality_estimator_teacher.MasterTask.requires
def requires(self)
Definition: combined_quality_estimator_teacher.py:2408
combined_quality_estimator_teacher.CDCQEDataCollectionTask.random_seed
random_seed
Random basf2 seed used by the GenerateSimTask.
Definition: combined_quality_estimator_teacher.py:699
tracking.harvesting_validation.combined_module
Definition: combined_module.py:1
combined_quality_estimator_teacher.CDCQEHarvestingValidationTask.training_target
training_target
Feature/variable to use as truth label in the quality estimator MVA classifier.
Definition: combined_quality_estimator_teacher.py:1406
combined_quality_estimator_teacher.PlotsFromHarvestingValidationBaseTask.harvesting_validation_task_instance
HarvestingValidationBaseTask harvesting_validation_task_instance(self)
Definition: combined_quality_estimator_teacher.py:1839
combined_quality_estimator_teacher.RecoTrackQEDataCollectionTask.experiment_number
experiment_number
Experiment number of the conditions database, e.g.
Definition: combined_quality_estimator_teacher.py:810
combined_quality_estimator_teacher.MasterTask.n_events_testing
n_events_testing
Number of events to generate for the test data set.
Definition: combined_quality_estimator_teacher.py:2284
tracking.harvesting_validation.combined_module.CombinedTrackingValidationModule
Definition: combined_module.py:9
combined_quality_estimator_teacher.VXDQEDataCollectionTask.get_input_files
def get_input_files(self, n_events=None, random_seed=None)
Definition: combined_quality_estimator_teacher.py:591
combined_quality_estimator_teacher.VXDQEValidationPlotsTask.harvesting_validation_task_instance
def harvesting_validation_task_instance(self)
Definition: combined_quality_estimator_teacher.py:2085
combined_quality_estimator_teacher.CDCQEDataCollectionTask.output
def output(self)
Definition: combined_quality_estimator_teacher.py:754
combined_quality_estimator_teacher.CDCQEDataCollectionTask.get_input_files
def get_input_files(self, n_events=None, random_seed=None)
Definition: combined_quality_estimator_teacher.py:719
combined_quality_estimator_teacher.VXDQEHarvestingValidationTask.add_tracking_with_quality_estimation
def add_tracking_with_quality_estimation(self, path)
Definition: combined_quality_estimator_teacher.py:1377
combined_quality_estimator_teacher.QEWeightsLocalDBCreatorTask.experiment_number
experiment_number
Experiment number of the conditions database, e.g.
Definition: combined_quality_estimator_teacher.py:2163
combined_quality_estimator_teacher.SplitNMergeSimTask.bkgfiles_dir
bkgfiles_dir
Directory with overlay background root files.
Definition: combined_quality_estimator_teacher.py:488
combined_quality_estimator_teacher.TrackQEEvaluationBaseTask
Definition: combined_quality_estimator_teacher.py:1568
combined_quality_estimator_teacher.RecoTrackQEHarvestingValidationTask
Definition: combined_quality_estimator_teacher.py:1467
combined_quality_estimator_teacher.RecoTrackQEDataCollectionTask.output
def output(self)
Definition: combined_quality_estimator_teacher.py:896
combined_quality_estimator_teacher.HarvestingValidationBaseTask.add_tracking_with_quality_estimation
None add_tracking_with_quality_estimation(self, basf2.Path path)
Definition: combined_quality_estimator_teacher.py:1277
combined_quality_estimator_teacher.QEWeightsLocalDBCreatorTask.process_type
process_type
Define which kind of process shall be used.
Definition: combined_quality_estimator_teacher.py:2167
combined_quality_estimator_teacher.TrackQETeacherBaseTask.weightfile_identifier_basename
def weightfile_identifier_basename(self)
Definition: combined_quality_estimator_teacher.py:1042
combined_quality_estimator_teacher.HarvestingValidationBaseTask.teacher_task
TrackQETeacherBaseTask teacher_task(self)
Definition: combined_quality_estimator_teacher.py:1271
combined_quality_estimator_teacher.TrackQETeacherBaseTask.experiment_number
experiment_number
Experiment number of the conditions database, e.g.
Definition: combined_quality_estimator_teacher.py:1028
combined_quality_estimator_teacher.TrackQEEvaluationBaseTask.output
def output(self)
Definition: combined_quality_estimator_teacher.py:1657
combined_quality_estimator_teacher.SplitNMergeSimTask
Definition: combined_quality_estimator_teacher.py:471
combined_quality_estimator_teacher.TrackQETeacherBaseTask.fast_bdt_option
fast_bdt_option
Hyperparameter option of the FastBDT algorithm.
Definition: combined_quality_estimator_teacher.py:1039
combined_quality_estimator_teacher.HarvestingValidationBaseTask.fast_bdt_option
fast_bdt_option
Hyperparameter option of the FastBDT algorithm.
Definition: combined_quality_estimator_teacher.py:1262
tracking.root_utils
Definition: root_utils.py:1
combined_quality_estimator_teacher.CDCQEHarvestingValidationTask
Definition: combined_quality_estimator_teacher.py:1400
combined_quality_estimator_teacher.RecoTrackQEHarvestingValidationTask.cdc_training_target
cdc_training_target
Feature/variable to use as truth label for the CDC track quality estimator.
Definition: combined_quality_estimator_teacher.py:1473
combined_quality_estimator_teacher.CDCQETeacherTask
Definition: combined_quality_estimator_teacher.py:1179
combined_quality_estimator_teacher.MasterTask.exclude_variables_vxd
list exclude_variables_vxd
list of variables to exclude for the vxd mva:
Definition: combined_quality_estimator_teacher.py:2338
combined_quality_estimator_teacher.TrackQEEvaluationBaseTask.run
def run(self)
Definition: combined_quality_estimator_teacher.py:1667
combined_quality_estimator_teacher.HarvestingValidationBaseTask.create_path
def create_path(self)
Definition: combined_quality_estimator_teacher.py:1320
combined_quality_estimator_teacher.RecoTrackQEEvaluationTask.teacher_task
teacher_task
Task that is required by the evaluation base class to create the MVA weightfile that needs to be eval...
Definition: combined_quality_estimator_teacher.py:1773
combined_quality_estimator_teacher.RecoTrackQEHarvestingValidationTask.requires
def requires(self)
Definition: combined_quality_estimator_teacher.py:1481
combined_quality_estimator_teacher.RecoTrackQETeacherTask.data_collection_task
data_collection_task
Defines the DataCollectionTask required by the base class to collect features for the MVA training.
Definition: combined_quality_estimator_teacher.py:1214
combined_quality_estimator_teacher.QEWeightsLocalDBCreatorTask.n_events_training
n_events_training
Number of events to generate for the training data set.
Definition: combined_quality_estimator_teacher.py:2161
combined_quality_estimator_teacher.GenerateSimTask.output
def output(self)
Definition: combined_quality_estimator_teacher.py:382
combined_quality_estimator_teacher.HarvestingValidationBaseTask.requires
def requires(self)
Definition: combined_quality_estimator_teacher.py:1285
combined_quality_estimator_teacher.QEWeightsLocalDBCreatorTask.requires
def requires(self)
Definition: combined_quality_estimator_teacher.py:2173
combined_quality_estimator_teacher.RecoTrackQEEvaluationTask.cdc_training_target
cdc_training_target
Feature/variable to use as truth label for the CDC track quality estimator.
Definition: combined_quality_estimator_teacher.py:1781
combined_quality_estimator_teacher.TrackQEEvaluationBaseTask.experiment_number
experiment_number
Experiment number of the conditions database, e.g.
Definition: combined_quality_estimator_teacher.py:1587
combined_quality_estimator_teacher.GenerateSimTask
Definition: combined_quality_estimator_teacher.py:354
combined_quality_estimator_teacher.HarvestingValidationBaseTask.n_events_testing
n_events_testing
Number of events to generate for the test data set.
Definition: combined_quality_estimator_teacher.py:1249
combined_quality_estimator_teacher.RecoTrackQEEvaluationTask
Definition: combined_quality_estimator_teacher.py:1766
combined_quality_estimator_teacher.RecoTrackQEValidationPlotsTask.harvesting_validation_task_instance
def harvesting_validation_task_instance(self)
Definition: combined_quality_estimator_teacher.py:2138
combined_quality_estimator_teacher.PlotsFromHarvestingValidationBaseTask.experiment_number
experiment_number
Experiment number of the conditions database, e.g.
Definition: combined_quality_estimator_teacher.py:1825
combined_quality_estimator_teacher.PlotsFromHarvestingValidationBaseTask
Definition: combined_quality_estimator_teacher.py:1815
combined_quality_estimator_teacher.RecoTrackQEDataCollectionTask.n_events
n_events
Number of events to generate.
Definition: combined_quality_estimator_teacher.py:808
combined_quality_estimator_teacher.CDCTrackQEEvaluationTask
Definition: combined_quality_estimator_teacher.py:1751
combined_quality_estimator_teacher.RecoTrackQEDataCollectionTask
Definition: combined_quality_estimator_teacher.py:794
combined_quality_estimator_teacher.CheckExistingFile.filename
filename
filename to check
Definition: combined_quality_estimator_teacher.py:550
combined_quality_estimator_teacher.TrackQETeacherBaseTask.get_weightfile_xml_identifier
def get_weightfile_xml_identifier(self, fast_bdt_option=None, recotrack_option=None)
Definition: combined_quality_estimator_teacher.py:1051
combined_quality_estimator_teacher.TrackQEEvaluationBaseTask.n_events_training
n_events_training
Number of events to generate for the training data set.
Definition: combined_quality_estimator_teacher.py:1585
combined_quality_estimator_teacher.RecoTrackQEEvaluationTask.requires
def requires(self)
Definition: combined_quality_estimator_teacher.py:1783
combined_quality_estimator_teacher.QEWeightsLocalDBCreatorTask.cdc_training_target
cdc_training_target
Feature/variable to use as truth label for the CDC track quality estimator.
Definition: combined_quality_estimator_teacher.py:2169
combined_quality_estimator_teacher.RecoTrackQEEvaluationTask.task_acronym
string task_acronym
Acronym that is required by the evaluation base class to find the correct collection task file.
Definition: combined_quality_estimator_teacher.py:1779
combined_quality_estimator_teacher.GenerateSimTask.random_seed
random_seed
Random basf2 seed.
Definition: combined_quality_estimator_teacher.py:369
combined_quality_estimator_teacher.HarvestingValidationBaseTask.exclude_variables
exclude_variables
List of collected variables to not use in the training of the QE MVA classifier.
Definition: combined_quality_estimator_teacher.py:1260
combined_quality_estimator_teacher.CDCQEDataCollectionTask
Definition: combined_quality_estimator_teacher.py:686
combined_quality_estimator_teacher.TrackQETeacherBaseTask.process_type
process_type
Define which kind of process shall be used.
Definition: combined_quality_estimator_teacher.py:1032
combined_quality_estimator_teacher.HarvestingValidationBaseTask.validation_output_file_name
string validation_output_file_name
Name of the "harvested" ROOT output file with variables that can be used for validation.
Definition: combined_quality_estimator_teacher.py:1264
combined_quality_estimator_teacher.VXDQEHarvestingValidationTask
Definition: combined_quality_estimator_teacher.py:1364
combined_quality_estimator_teacher.RecoTrackQEDataCollectionTask.create_path
def create_path(self)
Definition: combined_quality_estimator_teacher.py:903
combined_quality_estimator_teacher.VXDQEDataCollectionTask.random_seed
random_seed
Random basf2 seed used by the GenerateSimTask.
Definition: combined_quality_estimator_teacher.py:571
combined_quality_estimator_teacher.HarvestingValidationBaseTask.experiment_number
experiment_number
Experiment number of the conditions database, e.g.
Definition: combined_quality_estimator_teacher.py:1253
combined_quality_estimator_teacher.GenerateSimTask.output_file_name
def output_file_name(self, n_events=None, random_seed=None)
Name of the ROOT output file with generated and simulated events.
Definition: combined_quality_estimator_teacher.py:375
combined_quality_estimator_teacher.QEWeightsLocalDBCreatorTask
Definition: combined_quality_estimator_teacher.py:2155
combined_quality_estimator_teacher.PlotsFromHarvestingValidationBaseTask.process_type
process_type
Define which kind of process shall be used.
Definition: combined_quality_estimator_teacher.py:1829
combined_quality_estimator_teacher.QEWeightsLocalDBCreatorTask.process
def process(self)
Definition: combined_quality_estimator_teacher.py:2207
combined_quality_estimator_teacher.RecoTrackQETeacherTask.requires
def requires(self)
Definition: combined_quality_estimator_teacher.py:1218
combined_quality_estimator_teacher.VXDQEDataCollectionTask.experiment_number
experiment_number
Experiment number of the conditions database, e.g.
Definition: combined_quality_estimator_teacher.py:568
combined_quality_estimator_teacher.MasterTask.num_processes
num_processes
Number of basf2 processes to use in Basf2PathTasks.
Definition: combined_quality_estimator_teacher.py:2288
combined_quality_estimator_teacher.RecoTrackQEDataCollectionTask.random_seed
random_seed
Random basf2 seed used by the GenerateSimTask.
Definition: combined_quality_estimator_teacher.py:813
combined_quality_estimator_teacher.QEWeightsLocalDBCreatorTask.on_failure
def on_failure(self, exception)
Definition: combined_quality_estimator_teacher.py:2260
combined_quality_estimator_teacher.QEWeightsLocalDBCreatorTask._clean
def _clean(self)
Definition: combined_quality_estimator_teacher.py:2244
combined_quality_estimator_teacher.TrackQEEvaluationBaseTask.process_type
process_type
Define which kind of process shall be used.
Definition: combined_quality_estimator_teacher.py:1591
combined_quality_estimator_teacher.TrackQEEvaluationBaseTask.task_acronym
def task_acronym(self)
Definition: combined_quality_estimator_teacher.py:1620
combined_quality_estimator_teacher.VXDQEHarvestingValidationTask.teacher_task
teacher_task
Teacher task to require to provide a quality estimator weightfile for add_tracking_with_quality_estim...
Definition: combined_quality_estimator_teacher.py:1375
combined_quality_estimator_teacher.RecoTrackQETeacherTask.recotrack_option
recotrack_option
RecoTrack option, use string that is additive: deleteCDCQI0XY (= deletes CDCTracks with CDC-QI below ...
Definition: combined_quality_estimator_teacher.py:1203
combined_quality_estimator_teacher.MasterTask
Definition: combined_quality_estimator_teacher.py:2269
combined_quality_estimator_teacher.SplitNMergeSimTask.random_seed
random_seed
Random basf2 seed.
Definition: combined_quality_estimator_teacher.py:486
combined_quality_estimator_teacher.RecoTrackQETeacherTask
Definition: combined_quality_estimator_teacher.py:1195
combined_quality_estimator_teacher.CDCQEDataCollectionTask.requires
def requires(self)
Definition: combined_quality_estimator_teacher.py:737
combined_quality_estimator_teacher.PlotsFromHarvestingValidationBaseTask.exclude_variables
exclude_variables
List of collected variables to not use in the training of the QE MVA classifier.
Definition: combined_quality_estimator_teacher.py:1832
combined_quality_estimator_teacher.SplitNMergeSimTask.output
def output(self)
Definition: combined_quality_estimator_teacher.py:499
combined_quality_estimator_teacher.VXDQEDataCollectionTask.output
def output(self)
Definition: combined_quality_estimator_teacher.py:626
combined_quality_estimator_teacher.MasterTask.process_type
process_type
Define which kind of process shall be used.
Definition: combined_quality_estimator_teacher.py:2280
background.get_background_files
def get_background_files(folder=None, output_file_info=True)
Definition: background.py:10
combined_quality_estimator_teacher.CDCQEHarvestingValidationTask.add_tracking_with_quality_estimation
def add_tracking_with_quality_estimation(self, path)
Definition: combined_quality_estimator_teacher.py:1443
combined_quality_estimator_teacher.RecoTrackQEDataCollectionTask.recotrack_option
recotrack_option
RecoTrack option, use string that is additive: deleteCDCQI0XY (= deletes CDCTracks with CDC-QI below ...
Definition: combined_quality_estimator_teacher.py:819
combined_quality_estimator_teacher.TrackQEEvaluationBaseTask.exclude_variables
exclude_variables
List of collected variables to not use in the training of the QE MVA classifier.
Definition: combined_quality_estimator_teacher.py:1596
combined_quality_estimator_teacher.GenerateSimTask.bkgfiles_dir
bkgfiles_dir
Directory with overlay background root files.
Definition: combined_quality_estimator_teacher.py:371
combined_quality_estimator_teacher.GenerateSimTask.experiment_number
experiment_number
Experiment number of the conditions database, e.g.
Definition: combined_quality_estimator_teacher.py:366
combined_quality_estimator_teacher.CDCQEValidationPlotsTask.training_target
training_target
Feature/variable to use as truth label in the quality estimator MVA classifier.
Definition: combined_quality_estimator_teacher.py:2108
combined_quality_estimator_teacher.VXDQEValidationPlotsTask
Definition: combined_quality_estimator_teacher.py:2077
combined_quality_estimator_teacher.PlotsFromHarvestingValidationBaseTask.primaries_only
primaries_only
Whether to normalize the track finding efficiencies to primary particles only.
Definition: combined_quality_estimator_teacher.py:1836
combined_quality_estimator_teacher.HarvestingValidationBaseTask.reco_output_file_name
string reco_output_file_name
Name of the output of the RootOutput module with reconstructed events.
Definition: combined_quality_estimator_teacher.py:1266
combined_quality_estimator_teacher.TrackQEEvaluationBaseTask.n_events_testing
n_events_testing
Number of events to generate for the test data set.
Definition: combined_quality_estimator_teacher.py:1583
combined_quality_estimator_teacher.HarvestingValidationBaseTask.output
def output(self)
Definition: combined_quality_estimator_teacher.py:1312
combined_quality_estimator_teacher.TrackQETeacherBaseTask.data_collection_task
Basf2PathTask data_collection_task(self)
Definition: combined_quality_estimator_teacher.py:1087
combined_quality_estimator_teacher.CheckExistingFile
Definition: combined_quality_estimator_teacher.py:545
combined_quality_estimator_teacher.CDCQEDataCollectionTask.create_path
def create_path(self)
Definition: combined_quality_estimator_teacher.py:761
combined_quality_estimator_teacher.TrackQETeacherBaseTask.requires
def requires(self)
Definition: combined_quality_estimator_teacher.py:1096
combined_quality_estimator_teacher.RecoTrackQEHarvestingValidationTask.add_tracking_with_quality_estimation
def add_tracking_with_quality_estimation(self, path)
Definition: combined_quality_estimator_teacher.py:1525
combined_quality_estimator_teacher.GenerateSimTask.create_path
def create_path(self)
Definition: combined_quality_estimator_teacher.py:389
combined_quality_estimator_teacher.VXDQEDataCollectionTask.get_records_file_name
def get_records_file_name(self, n_events=None, random_seed=None)
Filename of the recorded/collected data for the final QE MVA training.
Definition: combined_quality_estimator_teacher.py:575
combined_quality_estimator_teacher.RecoTrackQEDataCollectionTask.cdc_training_target
cdc_training_target
Feature/variable to use as truth label for the CDC track quality estimator.
Definition: combined_quality_estimator_teacher.py:815
combined_quality_estimator_teacher.PlotsFromHarvestingValidationBaseTask.output_pdf_file_basename
def output_pdf_file_basename(self)
Definition: combined_quality_estimator_teacher.py:1847
combined_quality_estimator_teacher.TrackQEEvaluationBaseTask.fast_bdt_option
fast_bdt_option
Hyperparameter options for the FastBDT algorithm.
Definition: combined_quality_estimator_teacher.py:1598