combined_module_quality_estimator_teacher
-----------------------------------------

Information on the MVA Track Quality Indicator / Estimator can be found
on this `Confluence page
<https://confluence.desy.de/display/BI/MVA+Track+Quality+Indicator>`_.

Purpose of this script
~~~~~~~~~~~~~~~~~~~~~~

This Python script is used for the combined training and validation of three
classifiers: the final MVA track quality estimator and the two quality
estimators for the intermediate standalone track finders that it depends on.

- Final MVA track quality estimator:
  The final quality estimator for fully merged and fitted tracks (RecoTracks).
  Its classifier uses features from the track fitting, merger, hit pattern, ...
  It also uses the outputs of the respective intermediate quality
  estimators for the VXD and the CDC track finding as inputs. It provides
  the final quality indicator (QI) exported to the track objects.

- VXDTF2 track quality estimator:
  MVA quality estimator for the VXD standalone track finding.

- CDC track quality estimator:
  MVA quality estimator for the CDC standalone track finding.

Each classifier requires its own training data set and has to be validated on
a separate testing data set. Further, the final quality estimator can only be
trained once the trained weights of the intermediate quality estimators are
available. If the final estimator is to be trained without one or both of the
previous estimators, the corresponding requirements have to be commented out in
the ``__init__.py`` file of tracking.

For all estimators, a list of variables to be ignored is specified in the
``MasterTask``. The current choice is mainly based on poor data/MC agreement in
these quantities or on outdated implementations. They are deliberately left
hardcoded in this "ugly" way as a reminder that they exist in principle and
could be added to the estimator once their modelling improves or an
alternative implementation is available.

To avoid mistakes, b2luigi is used to create a task chain for the combined
training and validation of all classifiers.

b2luigi: Understanding the steering file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All trainings and validations are done in the correct order in this steering
file. To create the dependency graph, the `b2luigi
<https://b2luigi.readthedocs.io>`_ Python package is used, which extends the
`luigi <https://luigi.readthedocs.io>`_ package developed by Spotify.

Each task that has to be done is represented by a special class, which defines
its parameters, its output files and the other tasks (with their parameters)
that it depends on. For example, a teacher task, which runs
``basf2_mva_teacher.py`` to train the classifier, depends on a data collection
task, which runs a reconstruction and writes out track-wise variables into a
ROOT file for training. An evaluation/validation task for testing the
classifier requires both the teacher task, as it needs the weightfile to be
present, and a data collection task, because it needs a dataset for testing
the classifier.
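
The dependency structure described above can be illustrated with a minimal
sketch in plain Python. It deliberately does not use the b2luigi API: the class
``SketchTask``, the helper ``run_order`` and all task names are hypothetical and
only mimic how each task names the tasks it requires, so that a scheduler can
derive a valid execution order.

```python
class SketchTask:
    """Hypothetical stand-in for a luigi-style task: a name plus requirements."""

    def __init__(self, name, requires=()):
        self.name = name
        self.requires = tuple(requires)


def run_order(task, done=None):
    """Return task names in an order where every requirement comes first."""
    if done is None:
        done = []
    for required in task.requires:
        run_order(required, done)
    if task.name not in done:
        done.append(task.name)
    return done


# Training and testing use separate data collection tasks.
train_data = SketchTask("collect_training_data")
test_data = SketchTask("collect_testing_data")
teacher = SketchTask("teacher", requires=[train_data])
validation = SketchTask("validation", requires=[teacher, test_data])

order = run_order(validation)
print(order)
```

Requesting only the validation task is enough here: the teacher and both data
collection steps are pulled in through the requirements, which is the role the
``MasterTask`` plays for the full pipeline.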

The final task that defines which tasks need to be done for the steering file to
finish is the ``MasterTask``. If you only want to run parts of the
training/validation pipeline, you can comment out requirements in the
``MasterTask`` or replace them by lower-level tasks during debugging.

This steering file relies on b2luigi_ for task scheduling and `uncertain_panda
<https://github.com/nils-braun/uncertain_panda>`_ for uncertainty calculations.
uncertain_panda is not part of the externals, and b2luigi is not included in
them up to version v01-07-01. Both can be installed via pip::

    python3 -m pip install [--user] b2luigi uncertain_panda

Use the ``--user`` option if you do not have the rights to install Python
packages into your externals (e.g. because you are using cvmfs); the packages
are then installed in ``$HOME/.local`` instead.

Instead of command line arguments, the b2luigi script is configured via a
``settings.json`` file. Open it in your favorite text editor and modify it to
fit your requirements.
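
As a rough illustration of what such a JSON configuration looks like to Python,
the following sketch writes and reads a throw-away settings file with the
standard library. The helper ``load_settings`` is hypothetical, and using
``result_path`` as a key is an assumption based on the name used later in this
text; check the ``settings.json`` shipped with the steering file for the keys
it really expects.

```python
import json
import tempfile
from pathlib import Path


def load_settings(path):
    """Read a JSON settings file and return its content as a dictionary."""
    with open(path) as settings_file:
        return json.load(settings_file)


# Write a throw-away example file; "result_path" as a key is an assumption.
with tempfile.TemporaryDirectory() as tmp_dir:
    settings_path = Path(tmp_dir) / "settings.json"
    settings_path.write_text(json.dumps({"result_path": "results"}, indent=4))
    settings = load_settings(settings_path)

print(settings["result_path"])
```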

You can test the b2luigi task definitions without running the tasks via::

    python3 combined_quality_estimator_teacher.py --dry-run
    python3 combined_quality_estimator_teacher.py --show-output

This will show the outputs and potential errors in the definitions of the
luigi task dependencies. To run the steering file in normal (local) mode,
run::

    python3 combined_quality_estimator_teacher.py

I usually use the interactive luigi web interface via the central scheduler,
which visualizes the task graph while it is running. For this, the scheduler
daemon ``luigid`` has to run in the background. It is located in
``~/.local/bin/luigid`` if b2luigi was installed with ``--user``. Start it on
port 8886, the port used in the commands below.

Then, execute your steering file (e.g. in another terminal) with::

    python3 combined_quality_estimator_teacher.py --scheduler-port 8886

To view the web interface, open your web browser and enter into the URL bar::

If you don't run the steering file on the same machine on which you run your web
browser, you have two options:

1. Run both the steering file and ``luigid`` remotely and use
   ssh port forwarding to your local host. To do so, run on your local
   machine::

       ssh -N -f -L 8886:localhost:8886 <remote_user>@<remote_host>

2. Run the ``luigid`` scheduler locally and use the ``--scheduler-host <your
   local host>`` argument when calling the steering file.

Accessing the results / output files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All output files are stored in a directory structure in the ``result_path``. The
directory tree encodes the used b2luigi parameters. This ensures reproducibility
and makes parameter searches easy. Sometimes, it is hard to find the relevant
output files. You can view the whole directory structure by running ``tree
<result_path>``. Use the unix ``find`` command to find the files that interest
you, e.g.::

    find <result_path> -name "*.pdf" # find all validation plot files
    find <result_path> -name "*.root" # find all ROOT files
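
The same search can be done from Python, for example in a notebook that
post-processes the results. This is a minimal sketch using ``pathlib``; the
helper ``find_outputs`` and the directory and file names are hypothetical and
only serve to build a small tree to search.

```python
import tempfile
from pathlib import Path


def find_outputs(result_path, pattern):
    """Recursively list files below ``result_path`` that match ``pattern``."""
    return sorted(Path(result_path).rglob(pattern))


# Build a small throw-away directory tree to search (hypothetical file names).
with tempfile.TemporaryDirectory() as result_path:
    plot_dir = Path(result_path) / "n_events=100"
    plot_dir.mkdir()
    (plot_dir / "validation.pdf").touch()
    (plot_dir / "records.root").touch()
    pdf_names = [path.name for path in find_outputs(result_path, "*.pdf")]
    root_names = [path.name for path in find_outputs(result_path, "*.root")]

print(pdf_names, root_names)
```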
from pathlib import Path
from datetime import datetime
from typing import Iterable
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
from packaging import version
install_helpstring_formatter = (
    "\nCould not find {module} python module. Try installing it via\n"
    "  python3 -m pip install [--user] {module}\n")
try:
    import b2luigi
    from b2luigi.core.utils import get_serialized_parameters, get_log_file_dir, create_output_dirs
    from b2luigi.basf2_helper import Basf2PathTask, Basf2Task
    from b2luigi.core.task import Task, ExternalTask
    from b2luigi.basf2_helper.utils import get_basf2_git_hash
except ModuleNotFoundError:
    print(install_helpstring_formatter.format(module="b2luigi"))
try:
    from uncertain_panda import pandas as upd
except ModuleNotFoundError:
    print(install_helpstring_formatter.format(module="uncertain_panda"))
if (
    version.parse(b2luigi.__version__) <= version.parse("0.3.2") and
    get_basf2_git_hash() is None and
    os.getenv("BELLE2_LOCAL_DIR") is not None
):
    print(f"b2luigi version could not obtain git hash because of a bug not yet fixed in version {b2luigi.__version__}\n"
          "Please install the latest version of b2luigi from github via\n\n"
          "  python3 -m pip install --upgrade [--user] git+https://github.com/nils-braun/b2luigi.git\n")
def create_fbdt_option_string(fast_bdt_option):
    """
    Return a readable string created from the fast_bdt_option array.
    """
    return "_nTrees" + str(fast_bdt_option[0]) + "_nCuts" + str(fast_bdt_option[1]) + "_nLevels" + \
        str(fast_bdt_option[2]) + "_shrin" + str(int(round(100*fast_bdt_option[3], 0)))
def createV0momenta(x, mu, beta):
    """
    Copied from Bianca's K_S0 particle gun code: returns a realistic V0 momentum distribution
    when running over x. ``mu`` and ``beta`` are properties of the function that define center and tails.
    Used for the particle gun simulation code for K_S0 and Lambda_0.
    """
    return (1/beta)*np.exp(-(x - mu)/beta) * np.exp(-np.exp(-(x - mu) / beta))
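
# A minimal sketch of the shape of this distribution, assuming only the formula
# in the return statement above, which has the form of a Gumbel density. The
# scalar helper ``gumbel_density`` and the example values of ``mu`` and ``beta``
# are hypothetical and not part of the steering file; the sketch uses ``math``
# instead of ``numpy`` to stay dependency-free.

```python
import math


def gumbel_density(x, mu, beta):
    """Scalar version of the density used for the V0 momentum spectrum."""
    z = (x - mu) / beta
    return (1 / beta) * math.exp(-z) * math.exp(-math.exp(-z))


# Example values for mu and beta are assumptions chosen for illustration.
mu, beta = 0.5, 0.3
peak = gumbel_density(mu, mu, beta)

# At x = mu the density equals 1 / (beta * e), which is its maximum.
print(peak, 1 / (beta * math.e))
```

# The density peaks at ``x = mu`` and falls off more slowly towards larger
# ``x``, so ``mu`` sets the most probable value and ``beta`` the width of the tails.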
def my_basf2_mva_teacher(
    records_files,
    tree_name,
    weightfile_identifier,
    target_variable="truth",
    exclude_variables=None,
    fast_bdt_option=[200, 8, 3, 0.1]
):
    """
    My custom wrapper for basf2 mva teacher. Adapted from code in ``trackfindingcdc_teacher``.

    :param records_files: List of files with collected ("recorded") variables to use as training data for the MVA.
    :param tree_name: Name of the TTree in the ROOT file from the ``data_collection_task``
           that contains the training data for the MVA teacher.
    :param weightfile_identifier: Name of the weightfile that is created.
           Should either end in ".xml" for local weightfiles or in ".root", when
           the weightfile later needs to be uploaded as a payload to the conditions
           database.
    :param target_variable: Feature/variable to use as truth label in the quality estimator MVA classifier.
    :param exclude_variables: List of collected variables to not use in the training of the QE MVA classifier.
           This is in addition to variables containing the "truth" substring, which are excluded by default.
    :param fast_bdt_option: FastBDT options as [nTrees, nCuts, nLevels, shrinkage], default: [200, 8, 3, 0.1]
    """
    if exclude_variables is None:
        exclude_variables = []

    weightfile_extension = Path(weightfile_identifier).suffix
    if weightfile_extension not in {".xml", ".root"}:
        raise ValueError(f"Weightfile Identifier should end in .xml or .root, but ends in {weightfile_extension}")

    with root_utils.root_open(records_files[0]) as records_tfile:
        input_tree = records_tfile.Get(tree_name)
        feature_names = [leave.GetName() for leave in input_tree.GetListOfLeaves()]

    truth_free_variable_names = [
        name
        for name in feature_names
        if (
            ("truth" not in name) and
            (name != target_variable) and
            (name not in exclude_variables)
        )
    ]
    if "weight" in truth_free_variable_names:
        truth_free_variable_names.remove("weight")
        weight_variable = "weight"
    elif "__weight__" in truth_free_variable_names:
        truth_free_variable_names.remove("__weight__")
        weight_variable = "__weight__"
    general_options = basf2_mva.GeneralOptions()
    general_options.m_datafiles = basf2_mva.vector(*records_files)
    general_options.m_treename = tree_name
    general_options.m_weight_variable = weight_variable
    general_options.m_identifier = weightfile_identifier
    general_options.m_variables = basf2_mva.vector(*truth_free_variable_names)
    general_options.m_target_variable = target_variable
    fastbdt_options = basf2_mva.FastBDTOptions()

    fastbdt_options.m_nTrees = fast_bdt_option[0]
    fastbdt_options.m_nCuts = fast_bdt_option[1]
    fastbdt_options.m_nLevels = fast_bdt_option[2]
    fastbdt_options.m_shrinkage = fast_bdt_option[3]

    basf2_mva.teacher(general_options, fastbdt_options)
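
# The variable selection done in the teacher wrapper can be tried out without
# ROOT or basf2. The following is a minimal sketch of that filter only; the
# helper name ``select_training_variables`` and the feature names are
# hypothetical and chosen to exercise each of the three exclusion conditions.

```python
def select_training_variables(feature_names, target_variable="truth", exclude_variables=None):
    """Drop truth-related, target and explicitly excluded variables."""
    if exclude_variables is None:
        exclude_variables = []
    return [
        name
        for name in feature_names
        if "truth" not in name
        and name != target_variable
        and name not in exclude_variables
    ]


# Hypothetical feature names, chosen only to exercise each condition.
features = ["pt", "n_hits", "truth", "truth_matched", "is_fake", "chi2"]
selected = select_training_variables(features, target_variable="is_fake", exclude_variables=["chi2"])
print(selected)
```

# Note that every name containing the substring "truth" is removed, not only
# the target variable itself, so truth information cannot leak into the training.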
def _my_uncertain_mean(series: upd.Series):
    """
    Temporary workaround for a bug in ``uncertain_panda`` where a ``ValueError`` is
    thrown for ``Series.unc.mean`` if the series is empty. Can be replaced by
    ``.unc.mean`` when the issue is fixed.
    https://github.com/nils-braun/uncertain_panda/issues/2
    """
309 return series.unc.mean()
def get_uncertain_means_for_qi_cuts(df: upd.DataFrame, column: str, qi_cuts: Iterable[float]):
    """
    Return a pandas series with the mean of the dataframe column and its
    uncertainty for each quality indicator cut.

    :param df: Pandas dataframe with at least ``quality_indicator``
        and another numeric ``column``.
    :param column: Column of which we want to aggregate the means
        and uncertainties for different QI cuts
    :param qi_cuts: Iterable of quality indicator minimal thresholds.
    :returns: Series of means and uncertainties with ``qi_cuts`` as index
    """

    uncertain_means = (_my_uncertain_mean(df.query(f"quality_indicator > {qi_cut}")[column])
                       for qi_cut in qi_cuts)
    uncertain_means_series = upd.Series(data=uncertain_means, index=qi_cuts)
    return uncertain_means_series
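
# The idea of aggregating a column above a series of QI cuts can be sketched
# with the standard library alone. This is a stand-in under stated assumptions:
# it uses the standard error of the mean as the uncertainty and does not
# reproduce how ``uncertain_panda`` computes uncertainties. The helper
# ``means_for_qi_cuts`` and the example tracks are hypothetical.

```python
import math
import statistics


def means_for_qi_cuts(rows, column, qi_cuts):
    """Map each QI cut to (mean, standard error) of ``column`` above the cut."""
    result = {}
    for qi_cut in qi_cuts:
        values = [row[column] for row in rows if row["quality_indicator"] > qi_cut]
        if len(values) < 2:
            # Too few entries to estimate an uncertainty.
            result[qi_cut] = (float("nan"), float("nan"))
            continue
        mean = statistics.mean(values)
        std_error = statistics.stdev(values) / math.sqrt(len(values))
        result[qi_cut] = (mean, std_error)
    return result


# Hypothetical tracks: quality indicator plus a truth-matching flag.
tracks = [
    {"quality_indicator": 0.1, "is_matched": 0},
    {"quality_indicator": 0.4, "is_matched": 1},
    {"quality_indicator": 0.7, "is_matched": 1},
    {"quality_indicator": 0.9, "is_matched": 1},
]
summary = means_for_qi_cuts(tracks, "is_matched", qi_cuts=[0.0, 0.5, 0.95])
print(summary)
```

# As in the workaround above, a cut that leaves no (or too few) tracks has to
# be handled explicitly; here it yields NaN instead of raising an error.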
336 def plot_with_errobands(uncertain_series,
337 error_band_alpha=0.3,
339 fill_between_kwargs={},
342 Plot an uncertain series with error bands for y-errors
    uncertain_series = uncertain_series.dropna()
    ax.plot(uncertain_series.index.values, uncertain_series.nominal_value, **plot_kwargs)
    ax.fill_between(x=uncertain_series.index,
                    y1=uncertain_series.nominal_value - uncertain_series.std_dev,
                    y2=uncertain_series.nominal_value + uncertain_series.std_dev,
                    alpha=error_band_alpha,
                    **fill_between_kwargs)
def format_dictionary(adict, width=80, bullet="•"):
    """
    Helper function to format a dictionary as a string with a wrapped key-value
    bullet list. Useful to print metadata from dictionaries.

    :param adict: Dictionary to format
    :param width: Characters after which to wrap a key-value line
    :param bullet: Character to begin a key-value line with, e.g. ``-``
    """
    return "\n".join(textwrap.fill(f"{bullet} {key}: {value}", width=width)
                     for (key, value) in adict.items())
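
# A short usage sketch of this helper. The function body is copied from above
# so that the example is self-contained; the default bullet is changed to ``-``
# and the metadata dictionary is hypothetical.

```python
import textwrap


def format_dictionary(adict, width=80, bullet="-"):
    """Format a dictionary as a wrapped key-value bullet list."""
    return "\n".join(textwrap.fill(f"{bullet} {key}: {value}", width=width)
                     for (key, value) in adict.items())


# Hypothetical metadata; the narrow width forces the second entry to wrap.
metadata = {"n_events": 1000, "note": "trained on simulated events"}
formatted = format_dictionary(metadata, width=20)
print(formatted)
```

# Continuation lines of a wrapped entry are not indented, so a distinctive
# bullet character helps to tell where each key-value pair starts.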
    Generate simulated Monte Carlo with background overlay.

    Make sure to use different ``random_seed`` parameters for the training data
    for the classifier trainings and for the test data for the respective
    evaluation/validation tasks.
384 n_events = b2luigi.IntParameter()
386 experiment_number = b2luigi.IntParameter()
389 random_seed = b2luigi.Parameter()
391 bkgfiles_dir = b2luigi.Parameter(
402 Create output file name depending on number of events and production
403 mode that is specified in the random_seed string.
        if random_seed is None:
            random_seed = self.random_seed
        return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"
413 Generate list of output files that the task should produce.
414 The task is considered finished if and only if the outputs all exist.
420 Create basf2 path to process with event generation and simulation.
423 path = basf2.create_path()
                f"Simulating events with experiment number {self.experiment_number} is not implemented yet.")

            "EventInfoSetter", evtNumList=[self.n_events], runList=[runNo], expList=[self.experiment_number]
            path.add_module("EvtGenInput")

            path.add_module("EvtGenInput")
            path.add_module("InclusiveParticleChecker", particles=[310, 3122], includeConjugates=True)
            import generators as ge

            pdgs = [310, 3122, -3122]

            myx = [i*0.01 for i in range(321)]

                y = createV0momenta(x, mu, beta)

            polParams = myx + myy
            particlegun = basf2.register_module('ParticleGun')
            particlegun.param('pdgCodes', pdg_list)
            particlegun.param('nTracks', 8)
            particlegun.param('momentumGeneration', 'polyline')
            particlegun.param('momentumParams', polParams)
            particlegun.param('thetaGeneration', 'uniformCos')
            particlegun.param('thetaParams', [17, 150])
            particlegun.param('phiGeneration', 'uniform')
            particlegun.param('phiParams', [0, 360])
            particlegun.param('vertexGeneration', 'fixed')
            particlegun.param('xVertexParams', [0])
            particlegun.param('yVertexParams', [0])
            particlegun.param('zVertexParams', [0])
            path.add_module(particlegun)
            ge.add_babayaganlo_generator(path=path, finalstate='ee', minenergy=0.15, minangle=10.0)

            ge.add_kkmc_generator(path=path, finalstate='mu+mu-')
            babayaganlo = basf2.register_module('BabayagaNLOInput')
            babayaganlo.param('FinalState', 'gg')
            babayaganlo.param('MaxAcollinearity', 180.0)
            babayaganlo.param('ScatteringAngleRange', [0., 180.])
            babayaganlo.param('FMax', 75000)
            babayaganlo.param('MinEnergy', 0.01)
            babayaganlo.param('Order', 'exp')
            babayaganlo.param('DebugEnergySpread', 0.01)
            babayaganlo.param('Epsilon', 0.00005)
            path.add_module(babayaganlo)
            generatorpreselection = basf2.register_module('GeneratorPreselection')
            generatorpreselection.param('nChargedMin', 0)
            generatorpreselection.param('nChargedMax', 999)
            generatorpreselection.param('MinChargedPt', 0.15)
            generatorpreselection.param('MinChargedTheta', 17.)
            generatorpreselection.param('MaxChargedTheta', 150.)
            generatorpreselection.param('nPhotonMin', 1)
            generatorpreselection.param('MinPhotonEnergy', 1.5)
            generatorpreselection.param('MinPhotonTheta', 15.0)
            generatorpreselection.param('MaxPhotonTheta', 165.0)
            generatorpreselection.param('applyInCMS', True)
            path.add_module(generatorpreselection)
            empty = basf2.create_path()
            generatorpreselection.if_value('!=11', empty)
            ge.add_aafh_generator(path=path, finalstate='e+e-e+e-', preselection=False)

            ge.add_aafh_generator(path=path, finalstate='e+e-mu+mu-', preselection=False)

            ge.add_kkmc_generator(path, finalstate='tau+tau-')

            ge.add_continuum_generator(path, finalstate='ddbar')

            ge.add_continuum_generator(path, finalstate='uubar')

            ge.add_continuum_generator(path, finalstate='ssbar')

            ge.add_continuum_generator(path, finalstate='ccbar')
            components = ['PXD', 'SVD', 'CDC', 'ECL', 'TOP', 'ARICH', 'TRG']

            outputFileName=self.get_output_file_name(self.output_file_name()),
    Generate simulated Monte Carlo with background overlay.

    Make sure to use different ``random_seed`` parameters for the training data
    for the classifier trainings and for the test data for the respective
    evaluation/validation tasks.
559 n_events = b2luigi.IntParameter()
561 experiment_number = b2luigi.IntParameter()
564 random_seed = b2luigi.Parameter()
566 bkgfiles_dir = b2luigi.Parameter(
577 Create output file name depending on number of events and production
578 mode that is specified in the random_seed string.
        if random_seed is None:
            random_seed = self.random_seed
        return "generated_mc_N" + str(n_events) + "_" + random_seed + ".root"
588 Generate list of output files that the task should produce.
589 The task is considered finished if and only if the outputs all exist.
595 Generate list of luigi Tasks that this Task depends on.
        n_events_per_task = MasterTask.n_events_per_task
        quotient, remainder = divmod(self.n_events, n_events_per_task)
        for i in range(quotient):

                num_processes=MasterTask.num_processes,
                random_seed=self.random_seed + '_' + str(i).zfill(3),
                n_events=n_events_per_task,

                num_processes=MasterTask.num_processes,
                random_seed=self.random_seed + '_' + str(quotient).zfill(3),
616 @b2luigi.on_temporary_files
619 When all GenerateSimTasks finished, merge the output.
621 create_output_dirs(self)
        for _, file_name in self.get_input_file_names().items():
            file_list.append(*file_name)
        print("Merge the following files:")

        cmd = ["b2file-merge", "-f"]
        args = cmd + [self.get_output_file_name(self.output_file_name())] + file_list
        subprocess.check_call(args)
        print("Finished merging. Now remove the input files to save space.")

        for tempfile in file_list:
            args = cmd2 + [tempfile]
            subprocess.check_call(args)
640 Task to check if the given file really exists.
643 filename = b2luigi.Parameter()
647 Specify the output to be the file that was just checked.
        from luigi import LocalTarget
        return LocalTarget(self.filename)
    Collect variables/features from VXDTF2 tracking and write them to a ROOT
    file.

    These variables are to be used as labelled training data for the MVA
    classifier which is the VXD track quality estimator.
662 n_events = b2luigi.IntParameter()
664 experiment_number = b2luigi.IntParameter()
667 random_seed = b2luigi.Parameter()
674 Create output file name depending on number of events and production
675 mode that is specified in the random_seed string.
        if random_seed is None:
            random_seed = self.random_seed
        if 'vxd' not in random_seed:
            random_seed += '_vxd'
        if 'DATA' in random_seed:
            return 'qe_records_DATA_vxd.root'

        if 'USESIMBB' in random_seed:
            random_seed = 'BBBAR_' + random_seed.split("_", 1)[1]
        elif 'USESIMEE' in random_seed:
            random_seed = 'BHABHA_' + random_seed.split("_", 1)[1]
        return 'qe_records_N' + str(n_events) + '_' + random_seed + '.root'
694 Get input file names depending on the use case: If they already exist, search in
695 the corresponding folders, for data check the specified list and if they are created
696 in the same run, check for the task that produced them.
        if random_seed is None:
            random_seed = self.random_seed
        if "USESIM" in random_seed:
            if 'USESIMBB' in random_seed:
                random_seed = 'BBBAR_' + random_seed.split("_", 1)[1]
            elif 'USESIMEE' in random_seed:
                random_seed = 'BHABHA_' + random_seed.split("_", 1)[1]
            return ['datafiles/' + GenerateSimTask.output_file_name(GenerateSimTask,
                                                                    n_events=n_events, random_seed=random_seed)]
        elif "DATA" in random_seed:
            return MasterTask.datafiles

        return self.get_input_file_names(GenerateSimTask.output_file_name(
            GenerateSimTask, n_events=n_events, random_seed=random_seed))
717 Generate list of luigi Tasks that this Task depends on.
                bkgfiles_dir=MasterTask.bkgfiles_by_exp[self.experiment_number],
734 Generate list of output files that the task should produce.
735 The task is considered finished if and only if the outputs all exist.
741 Create basf2 path with VXDTF2 tracking and VXD QE data collection.
743 path = basf2.create_path()
747 inputFileNames=inputFileNames,
        path.add_module("Gearbox")
        tracking.add_geometry_modules(path)

            from rawdata import add_unpackers
            add_unpackers(path, components=['SVD', 'PXD'])
        tracking.add_hit_preparation_modules(path)
        tracking.add_vxd_track_finding_vxdtf2(
            path, components=["SVD"], add_mva_quality_indicator=False

                "VXDQETrainingDataCollector",
                SpacePointTrackCandsStoreArrayName="SPTrackCands",
                EstimationMethod="tripletFit",
                ClusterInformation="Average",
                MCStrictQualityEstimator=False,

                "TrackFinderMCTruthRecoTracks",
                RecoTracksStoreArrayName="MCRecoTracks",

                "VXDQETrainingDataCollector",
                SpacePointTrackCandsStoreArrayName="SPTrackCands",
                EstimationMethod="tripletFit",
                ClusterInformation="Average",
                MCStrictQualityEstimator=True,
794 Collect variables/features from CDC tracking and write them to a ROOT file.
796 These variables are to be used as labelled training data for the MVA
797 classifier which is the CDC track quality estimator
800 n_events = b2luigi.IntParameter()
802 experiment_number = b2luigi.IntParameter()
805 random_seed = b2luigi.Parameter()
812 Create output file name depending on number of events and production
813 mode that is specified in the random_seed string.
        if random_seed is None:
            random_seed = self.random_seed
        if 'cdc' not in random_seed:
            random_seed += '_cdc'
        if 'DATA' in random_seed:
            return 'qe_records_DATA_cdc.root'

        if 'USESIMBB' in random_seed:
            random_seed = 'BBBAR_' + random_seed.split("_", 1)[1]
        elif 'USESIMEE' in random_seed:
            random_seed = 'BHABHA_' + random_seed.split("_", 1)[1]
        return 'qe_records_N' + str(n_events) + '_' + random_seed + '.root'
832 Get input file names depending on the use case: If they already exist, search in
833 the corresponding folders, for data check the specified list and if they are created
834 in the same run, check for the task that produced them.
        if random_seed is None:
            random_seed = self.random_seed
        if "USESIM" in random_seed:
            if 'USESIMBB' in random_seed:
                random_seed = 'BBBAR_' + random_seed.split("_", 1)[1]
            elif 'USESIMEE' in random_seed:
                random_seed = 'BHABHA_' + random_seed.split("_", 1)[1]
            return ['datafiles/' + GenerateSimTask.output_file_name(GenerateSimTask,
                                                                    n_events=n_events, random_seed=random_seed)]
        elif "DATA" in random_seed:
            return MasterTask.datafiles

        return self.get_input_file_names(GenerateSimTask.output_file_name(
            GenerateSimTask, n_events=n_events, random_seed=random_seed))
855 Generate list of luigi Tasks that this Task depends on.
                bkgfiles_dir=MasterTask.bkgfiles_by_exp[self.experiment_number],
872 Generate list of output files that the task should produce.
873 The task is considered finished if and only if the outputs all exist.
879 Create basf2 path with CDC standalone tracking and CDC QE with recording filter for MVA feature collection.
881 path = basf2.create_path()
885 inputFileNames=inputFileNames,
        path.add_module("Gearbox")
        tracking.add_geometry_modules(path)

            filter_choice = "recording_data"
            from rawdata import add_unpackers
            add_unpackers(path, components=['CDC'])

            filter_choice = "recording"

        tracking.add_cdc_track_finding(path, with_ca=False, add_mva_quality_indicator=True)

        basf2.set_module_parameters(

            name="TFCDC_TrackQualityEstimator",
            filter=filter_choice,
912 Collect variables/features from the reco track reconstruction including the
913 fit and write them to a ROOT file.
915 These variables are to be used as labelled training data for the MVA
916 classifier which is the MVA track quality estimator. The collected
917 variables include the classifier outputs from the VXD and CDC quality
918 estimators, namely the CDC and VXD quality indicators, combined with fit,
919 merger, timing, energy loss information etc. This task requires the
920 subdetector quality estimators to be trained.
924 n_events = b2luigi.IntParameter()
926 experiment_number = b2luigi.IntParameter()
929 random_seed = b2luigi.Parameter()
931 cdc_training_target = b2luigi.Parameter()
    recotrack_option = b2luigi.Parameter(

        default='deleteCDCQI080'

    fast_bdt_option = b2luigi.ListParameter(

        hashed=True, default=[200, 8, 3, 0.1]
952 Create output file name depending on number of events and production
953 mode that is specified in the random_seed string.
        if random_seed is None:
            random_seed = self.random_seed
        if recotrack_option is None:
            recotrack_option = self.recotrack_option
        if 'rec' not in random_seed:
            random_seed += '_rec'
        if 'DATA' in random_seed:
            return 'qe_records_DATA_rec.root'

        if 'USESIMBB' in random_seed:
            random_seed = 'BBBAR_' + random_seed.split("_", 1)[1]
        elif 'USESIMEE' in random_seed:
            random_seed = 'BHABHA_' + random_seed.split("_", 1)[1]
        return 'qe_records_N' + str(n_events) + '_' + random_seed + '_' + recotrack_option + '.root'
974 Get input file names depending on the use case: If they already exist, search in
975 the corresponding folders, for data check the specified list and if they are created
976 in the same run, check for the task that produced them.
        if random_seed is None:
            random_seed = self.random_seed
        if "USESIM" in random_seed:
            if 'USESIMBB' in random_seed:
                random_seed = 'BBBAR_' + random_seed.split("_", 1)[1]
            elif 'USESIMEE' in random_seed:
                random_seed = 'BHABHA_' + random_seed.split("_", 1)[1]
            return ['datafiles/' + GenerateSimTask.output_file_name(GenerateSimTask,
                                                                    n_events=n_events, random_seed=random_seed)]
        elif "DATA" in random_seed:
            return MasterTask.datafiles

        return self.get_input_file_names(GenerateSimTask.output_file_name(
            GenerateSimTask, n_events=n_events, random_seed=random_seed))
997 Generate list of luigi Tasks that this Task depends on.
                bkgfiles_dir=MasterTask.bkgfiles_by_exp[self.experiment_number],

            n_events_training=MasterTask.n_events_training,
            process_type=self.random_seed.split("_", 1)[0],
            exclude_variables=MasterTask.exclude_variables_cdc,

            n_events_training=MasterTask.n_events_training,
            process_type=self.random_seed.split("_", 1)[0],
            exclude_variables=MasterTask.exclude_variables_vxd,
1032 Generate list of output files that the task should produce.
1033 The task is considered finished if and only if the outputs all exist.
1039 Create basf2 reconstruction path that should mirror the default path
1040 from ``add_tracking_reconstruction()``, but with modules for the VXD QE
1041 and CDC QE application and for collection of variables for the reco
1042 track quality estimator.
1044 path = basf2.create_path()
1048 inputFileNames=inputFileNames,
        path.add_module("Gearbox")

            from rawdata import add_unpackers

        tracking.add_tracking_reconstruction(path, add_cdcTrack_QI=mvaCDC, add_vxdTrack_QI=mvaVXD, add_recoTrack_QI=True)
            cdc_identifier = 'datafiles/' + \
                CDCQETeacherTask.get_weightfile_xml_identifier(CDCQETeacherTask, fast_bdt_option=self.fast_bdt_option)
            if os.path.exists(cdc_identifier):
                replace_cdc_qi = True

                raise ValueError(f"CDC QI Identifier not found: {cdc_identifier}")

                replace_cdc_qi = False

            replace_cdc_qi = False

            cdc_identifier = self.get_input_file_names(
                CDCQETeacherTask.get_weightfile_xml_identifier(
                    CDCQETeacherTask, fast_bdt_option=self.fast_bdt_option))[0]
            replace_cdc_qi = True

            vxd_identifier = 'datafiles/' + \
                VXDQETeacherTask.get_weightfile_xml_identifier(VXDQETeacherTask, fast_bdt_option=self.fast_bdt_option)
            if os.path.exists(vxd_identifier):
                replace_vxd_qi = True

                raise ValueError(f"VXD QI Identifier not found: {vxd_identifier}")

                replace_vxd_qi = False

            replace_vxd_qi = False

            vxd_identifier = self.get_input_file_names(
                VXDQETeacherTask.get_weightfile_xml_identifier(
                    VXDQETeacherTask, fast_bdt_option=self.fast_bdt_option))[0]
            replace_vxd_qi = True
        cdc_qe_mva_filter_parameters = None

            cut_index = self.recotrack_option.find('deleteCDCQI') + len('deleteCDCQI')
            cut = int(self.recotrack_option[cut_index:cut_index+3])/100.

                cdc_qe_mva_filter_parameters = {
                    "identifier": cdc_identifier, "cut": cut}

                cdc_qe_mva_filter_parameters = {

        elif replace_cdc_qi:
            cdc_qe_mva_filter_parameters = {
                "identifier": cdc_identifier}
        if cdc_qe_mva_filter_parameters is not None:

            basf2.set_module_parameters(

                name="TFCDC_TrackQualityEstimator",
                filterParameters=cdc_qe_mva_filter_parameters,

            basf2.set_module_parameters(

                name="VXDQualityEstimatorMVA",
                WeightFileIdentifier=vxd_identifier)
        track_qe_module_name = "TrackQualityEstimatorMVA"
        module_found = False
        new_path = basf2.create_path()
        for module in path.modules():
            if module.name() != track_qe_module_name:
                # Call name(): comparing the bound method itself to a string is never equal
                if module.name() != 'TrackCreator':
                    new_path.add_module(module)
                new_path.add_module(

                    recoTrackColName='RecoTracks',
                    trackColName='MDSTTracks')
                new_path.add_module(
                    "TrackQETrainingDataCollector",
                    TrainingDataOutputName=self.get_output_file_name(self.get_records_file_name()),
                    collectEventFeatures=True,
                    SVDPlusCDCStandaloneRecoTracksStoreArrayName="SVDPlusCDCStandaloneRecoTracks",

        if not module_found:
            raise KeyError(f"No module {track_qe_module_name} found in path")
1164 A teacher task runs the basf2 mva teacher on the training data provided by a
1165 data collection task.
1167 Since teacher tasks are needed for all quality estimators covered by this
1168 steering file and the only thing that changes is the required data
1169 collection task and some training parameters, I decided to use inheritance
1170 and have the basic functionality in this base class/interface and have the
1171 specific teacher tasks inherit from it.
1174 n_events_training = b2luigi.IntParameter()
1176 experiment_number = b2luigi.IntParameter()
1180 process_type = b2luigi.Parameter(
1186 training_target = b2luigi.Parameter(
    exclude_variables = b2luigi.ListParameter(

        hashed=True, default=[]

    fast_bdt_option = b2luigi.ListParameter(

        hashed=True, default=[200, 8, 3, 0.1]
        Property defining the basename for the .xml and .root weightfiles that are created.
        Has to be implemented by the inheriting teacher task class.
        raise NotImplementedError(
            "Teacher Task must define a static weightfile_identifier"
        Name of the xml weightfile that is created by the teacher task.
        It is subsequently used as a local weightfile in the following validation tasks.
        if fast_bdt_option is None:
        if recotrack_option is None and hasattr(self, 'recotrack_option'):
            recotrack_option = self.recotrack_option
            recotrack_option = ''
        weightfile_details = create_fbdt_option_string(fast_bdt_option)
        if recotrack_option != '':
            weightfile_name = weightfile_name + '_' + recotrack_option
        return weightfile_name + ".weights.xml"
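The weightfile name is assembled from the basename, a string encoding the FastBDT hyperparameters, and an optional reco track option. The sketch below illustrates that composition. The body of `create_fbdt_option_string` is not shown in this listing, so the encoding used here is a hypothetical one, and the step that joins basename and option string is an assumption about the elided line.

```python
def create_fbdt_option_string(fast_bdt_option):
    """Hypothetical encoding of [nTrees, nCuts, nLevels, shrinkage] as a suffix."""
    n_trees, n_cuts, n_levels, shrinkage = fast_bdt_option
    return (f"_nTrees{n_trees}_nCuts{n_cuts}_nLevels{n_levels}"
            f"_shrin{int(round(100 * shrinkage))}")


def weightfile_xml_identifier(basename, fast_bdt_option, recotrack_option=""):
    """Build '<basename><fbdt suffix>[_<recotrack_option>].weights.xml'."""
    # Assumption: basename and hyperparameter suffix are simply concatenated.
    weightfile_name = basename + create_fbdt_option_string(fast_bdt_option)
    if recotrack_option != "":
        weightfile_name = weightfile_name + "_" + recotrack_option
    return weightfile_name + ".weights.xml"


vxd_name = weightfile_xml_identifier("vxdtf2_mva_qe", [200, 8, 3, 0.1])
rec_name = weightfile_xml_identifier(
    "recotrack_mva_qe", [200, 8, 3, 0.1], recotrack_option="deleteCDCQI080")
print(vxd_name)
print(rec_name)
```

Encoding the hyperparameters in the file name lets trainings with different FastBDT settings coexist in the same output directory.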
        Property defining the name of the tree in the ROOT file from the
        ``data_collection_task`` that contains the recorded training data. Must
        be implemented by the inheriting specific teacher task class.
        raise NotImplementedError("Teacher Task must define a static tree_name")
        Property defining random seed to be used by the ``GenerateSimTask``.
        Should differ from the random seed in the test data samples. Must
        be implemented by the inheriting specific teacher task class.
        raise NotImplementedError("Teacher Task must define a static random seed")
        Property defining the specific ``DataCollectionTask`` to require. Must
        be implemented by the inheriting specific teacher task class.
        raise NotImplementedError(
            "Teacher Task must define a data collection task to require "
        Generate list of luigi Tasks that this Task depends on.
                filename='datafiles/qe_records_N' + str(self.n_events_training) + '_' + process + '_' + self.random_seed + '.root',
                num_processes=MasterTask.num_processes,
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        Use basf2_mva teacher to create MVA weightfile from collected training
        This is the main process that is dispatched by the ``run`` method that
        is inherited from ``Basf2Task``.
            records_files = ['datafiles/qe_records_N' + str(self.n_events_training) +
                             '_' + process + '_' + self.random_seed + '.root']
            if hasattr(self, 'recotrack_option'):
                records_files = self.get_input_file_names(
                        recotrack_option=self.recotrack_option))
                records_files = self.get_input_file_names(
        my_basf2_mva_teacher(
            records_files=records_files,
    Task to run basf2 mva teacher on collected data for VXDTF2 track quality estimator
    weightfile_identifier_basename = "vxdtf2_mva_qe"
    random_seed = "train_vxd"
    data_collection_task = VXDQEDataCollectionTask
    Task to run basf2 mva teacher on collected data for CDC track quality estimator
    weightfile_identifier_basename = "cdc_mva_qe"
    tree_name = "records"
    random_seed = "train_cdc"
    data_collection_task = CDCQEDataCollectionTask
    Task to run basf2 mva teacher on collected data for the final, combined
    track quality estimator
    recotrack_option = b2luigi.Parameter(
        default='deleteCDCQI080'
    weightfile_identifier_basename = "recotrack_mva_qe"
    random_seed = "train_rec"
    data_collection_task = RecoTrackQEDataCollectionTask
    cdc_training_target = b2luigi.Parameter()
        Generate list of luigi Tasks that this Task depends on.
            num_processes=MasterTask.num_processes,
            random_seed=self.process_type + '_' + self.random_seed,
            recotrack_option=self.recotrack_option,
    Run track reconstruction with MVA quality estimator and write out
    (="harvest") a root file with variables useful for the validation.
    n_events_testing = b2luigi.IntParameter()
    n_events_training = b2luigi.IntParameter()
    experiment_number = b2luigi.IntParameter()
    process_type = b2luigi.Parameter(
    exclude_variables = b2luigi.ListParameter(
    fast_bdt_option = b2luigi.ListParameter(
        hashed=True, default=[200, 8, 3, 0.1]
    validation_output_file_name = "harvesting_validation.root"
    reco_output_file_name = "reconstruction.root"
        Teacher task to require to provide a quality estimator weightfile for ``add_tracking_with_quality_estimation``
        raise NotImplementedError()
        Add modules for track reconstruction to basf2 path that are to be
        validated. Besides track finding it should include MC matching, fitted
        track creation and a quality estimator module.
        raise NotImplementedError()
        Generate list of luigi Tasks that this Task depends on.
                filename='datafiles/generated_mc_N' + str(self.n_events_testing) + '_' + process + '_test.root'
                bkgfiles_dir=MasterTask.bkgfiles_by_exp[self.experiment_number],
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        Create a basf2 path that uses ``add_tracking_with_quality_estimation()``
        and adds the ``CombinedTrackingValidationModule`` to write out variables
        path = basf2.create_path()
            inputFileNames = ['datafiles/generated_mc_N' + str(self.n_events_testing) + '_' + process + '_test.root']
            inputFileNames = self.get_input_file_names(GenerateSimTask.output_file_name(
            inputFileNames=inputFileNames,
        path.add_module("Gearbox")
        tracking.add_geometry_modules(path)
        tracking.add_hit_preparation_modules(path)
                output_file_name=self.get_output_file_name(
    Run VXDTF2 track reconstruction and write out (="harvest") a root file with
    variables useful for validation of the VXD Quality Estimator.
    validation_output_file_name = "vxd_qe_harvesting_validation.root"
    reco_output_file_name = "vxd_qe_reconstruction.root"
    teacher_task = VXDQETeacherTask
        Add modules for VXDTF2 tracking with VXD quality estimator to basf2 path.
        tracking.add_vxd_track_finding_vxdtf2(
            reco_tracks="RecoTracks",
            add_mva_quality_indicator=True,
        basf2.set_module_parameters(
            name="VXDQualityEstimatorMVA",
            WeightFileIdentifier=self.get_input_file_names(
        tracking.add_mc_matcher(path, components=["SVD"])
        tracking.add_track_fit_and_track_creator(path, components=["SVD"])
    Run CDC reconstruction and write out (="harvest") a root file with variables
    useful for validation of the CDC Quality Estimator.
    training_target = b2luigi.Parameter()
    validation_output_file_name = "cdc_qe_harvesting_validation.root"
    reco_output_file_name = "cdc_qe_reconstruction.root"
    teacher_task = CDCQETeacherTask
        Generate list of luigi Tasks that this Task depends on.
                filename='datafiles/generated_mc_N' + str(self.n_events_testing) + '_' + process + '_test.root'
                bkgfiles_dir=MasterTask.bkgfiles_by_exp[self.experiment_number],
        Add modules for CDC standalone tracking with CDC quality estimator to basf2 path.
        tracking.add_cdc_track_finding(
            output_reco_tracks="RecoTracks",
            add_mva_quality_indicator=True,
        cdc_qe_mva_filter_parameters = {
            "identifier": self.get_input_file_names(
                CDCQETeacherTask.get_weightfile_xml_identifier(
        basf2.set_module_parameters(
            name="TFCDC_TrackQualityEstimator",
            filterParameters=cdc_qe_mva_filter_parameters,
        tracking.add_mc_matcher(path, components=["CDC"])
        tracking.add_track_fit_and_track_creator(path, components=["CDC"])
    Run track reconstruction and write out (="harvest") a root file with variables
    useful for validation of the MVA track Quality Estimator.
    cdc_training_target = b2luigi.Parameter()
    validation_output_file_name = "reco_qe_harvesting_validation.root"
    reco_output_file_name = "reco_qe_reconstruction.root"
    teacher_task = RecoTrackQETeacherTask
        Generate list of luigi Tasks that this Task depends on.
            exclude_variables=MasterTask.exclude_variables_cdc,
            exclude_variables=MasterTask.exclude_variables_vxd,
                filename='datafiles/generated_mc_N' + str(self.n_events_testing) + '_' + process + '_test.root'
                bkgfiles_dir=MasterTask.bkgfiles_by_exp[self.experiment_number],
        Add modules for reco tracking with all track quality estimators to basf2 path.
        tracking.add_tracking_reconstruction(
            add_cdcTrack_QI=True,
            add_vxdTrack_QI=True,
            add_recoTrack_QI=True,
            skipGeometryAdding=True,
            skipHitPreparerAdding=False,
        cdc_qe_mva_filter_parameters = {
            "identifier": self.get_input_file_names(
                CDCQETeacherTask.get_weightfile_xml_identifier(
        basf2.set_module_parameters(
            name="TFCDC_TrackQualityEstimator",
            filterParameters=cdc_qe_mva_filter_parameters,
        basf2.set_module_parameters(
            name="VXDQualityEstimatorMVA",
            WeightFileIdentifier=self.get_input_file_names(
                VXDQETeacherTask.get_weightfile_xml_identifier(VXDQETeacherTask, fast_bdt_option=self.fast_bdt_option)
        basf2.set_module_parameters(
            name="TrackQualityEstimatorMVA",
            WeightFileIdentifier=self.get_input_file_names(
                RecoTrackQETeacherTask.get_weightfile_xml_identifier(RecoTrackQETeacherTask, fast_bdt_option=self.fast_bdt_option)
    Base class for evaluating a quality estimator with ``basf2_mva_evaluate.py``
    on a separate test data set.

    Evaluation tasks for VXD, CDC and combined QE can inherit from it.
    git_hash = b2luigi.Parameter(
        default=get_basf2_git_hash()
    n_events_testing = b2luigi.IntParameter()
    n_events_training = b2luigi.IntParameter()
    experiment_number = b2luigi.IntParameter()
    process_type = b2luigi.Parameter(
    training_target = b2luigi.Parameter(
    exclude_variables = b2luigi.ListParameter(
    fast_bdt_option = b2luigi.ListParameter(
        hashed=True, default=[200, 8, 3, 0.1]
        Property defining specific teacher task to require.
        raise NotImplementedError(
            "Evaluation Tasks must define a teacher task to require "
        Property defining the specific ``DataCollectionTask`` to require. Must
        be implemented by the inheriting specific teacher task class.
        raise NotImplementedError(
            "Evaluation Tasks must define a data collection task to require "
        Acronym to distinguish between cdc, vxd and rec(o) MVA
        raise NotImplementedError(
            "Evaluation Tasks must define a task acronym."
        Generate list of luigi Tasks that this Task depends on.
                filename='datafiles/qe_records_N' + str(self.n_events_testing) + '_' + process + '_test_' +
                num_processes=MasterTask.num_processes,
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        weightfile_details = create_fbdt_option_string(self.fast_bdt_option)
        evaluation_pdf_output = self.teacher_task.weightfile_identifier_basename + weightfile_details + ".pdf"
        yield self.add_to_output(evaluation_pdf_output)
    @b2luigi.on_temporary_files
        Run ``basf2_mva_evaluate.py`` subprocess to evaluate QE MVA.

        The MVA weight file created from training on the training data set is
        evaluated on separate test data.
        weightfile_details = create_fbdt_option_string(self.fast_bdt_option)
        evaluation_pdf_output_basename = self.teacher_task.weightfile_identifier_basename + weightfile_details + ".pdf"
        evaluation_pdf_output_path = self.get_output_file_name(evaluation_pdf_output_basename)
            datafiles = 'datafiles/qe_records_N' + str(self.n_events_testing) + '_' + \
                process + '_test_' + self.task_acronym + '.root'
            datafiles = self.get_input_file_names(
                    random_seed=self.process + '_test_' +
            "basf2_mva_evaluate.py",
            self.get_input_file_names(
                self.teacher_task.get_weightfile_xml_identifier(
            evaluation_pdf_output_path,
        log_file_dir = get_log_file_dir(self)
            os.makedirs(log_file_dir, exist_ok=True)
        except FileExistsError:
            print('Directory ' + log_file_dir + ' already exists.')
        stderr_log_file_path = log_file_dir + "stderr"
        stdout_log_file_path = log_file_dir + "stdout"
        with open(stdout_log_file_path, "w") as stdout_file:
            stdout_file.write("stdout output of the command:\n{}\n\n".format(" ".join(cmd)))
        if os.path.exists(stderr_log_file_path):
            os.remove(stderr_log_file_path)
        with open(stdout_log_file_path, "a") as stdout_file:
            with open(stderr_log_file_path, "a") as stderr_file:
                    # redirect the command's stdout (not stdin) into the log file
                    subprocess.run(cmd, check=True, stdout=stdout_file, stderr=stderr_file)
                except subprocess.CalledProcessError as err:
                    stderr_file.write(f"Evaluation failed with error:\n{err}")
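The evaluation step runs an external command and keeps its standard output and standard error in separate log files. The following is a minimal sketch of that logging pattern with the standard library only. The helper name `run_and_log` and the use of the current Python interpreter as the command are assumptions for illustration; the real task calls `basf2_mva_evaluate.py`.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_log(cmd, log_dir):
    """Run `cmd`, appending its stdout/stderr to files inside `log_dir`."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    stdout_path = log_dir / "stdout"
    stderr_path = log_dir / "stderr"
    # Start the stdout log with the command line, drop any stale stderr log.
    stdout_path.write_text(
        "stdout output of the command:\n{}\n\n".format(" ".join(cmd)))
    if stderr_path.exists():
        stderr_path.unlink()
    with open(stdout_path, "a") as stdout_file, open(stderr_path, "a") as stderr_file:
        try:
            subprocess.run(cmd, check=True, stdout=stdout_file, stderr=stderr_file)
        except subprocess.CalledProcessError as err:
            stderr_file.write(f"Evaluation failed with error:\n{err}")
            raise
    return stdout_path, stderr_path


with tempfile.TemporaryDirectory() as tmp:
    out_path, err_path = run_and_log(
        [sys.executable, "-c", "print('evaluation done')"], tmp)
    logged_stdout = out_path.read_text()
    logged_stderr = err_path.read_text()
print(logged_stdout)
```

Re-raising after writing the error keeps the failure visible to the caller while still leaving a trace in the log.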
    Run ``basf2_mva_evaluate.py`` for the VXD quality estimator on separate test data
    teacher_task = VXDQETeacherTask
    data_collection_task = VXDQEDataCollectionTask
    task_acronym = 'vxd'
    Run ``basf2_mva_evaluate.py`` for the CDC quality estimator on separate test data
    teacher_task = CDCQETeacherTask
    data_collection_task = CDCQEDataCollectionTask
    task_acronym = 'cdc'
    Run ``basf2_mva_evaluate.py`` for the final, combined quality estimator on
    teacher_task = RecoTrackQETeacherTask
    data_collection_task = RecoTrackQEDataCollectionTask
    task_acronym = 'rec'
    cdc_training_target = b2luigi.Parameter()
        Generate list of luigi Tasks that this Task depends on.
                filename='datafiles/qe_records_N' + str(self.n_events_testing) + '_' + process + '_test_' +
                self.task_acronym + '.root'
            yield self.data_collection_task(
                num_processes=MasterTask.num_processes,
                cdc_training_target=self.cdc_training_target,
    Create a PDF file with validation plots for a quality estimator produced
    from the ROOT ntuples produced by a harvesting validation task
    n_events_testing = b2luigi.IntParameter()
    n_events_training = b2luigi.IntParameter()
    experiment_number = b2luigi.IntParameter()
    process_type = b2luigi.Parameter(
    exclude_variables = b2luigi.ListParameter(
    fast_bdt_option = b2luigi.ListParameter(
        hashed=True, default=[200, 8, 3, 0.1]
    primaries_only = b2luigi.BoolParameter(
        Specifies related harvesting validation task which produces the ROOT
        files with the data that is plotted by this task.
        raise NotImplementedError("Must define a QI harvesting validation task for which to do the plots")
        Name of the output PDF file containing the validation plots
        return validation_harvest_basename.replace(".root", "_plots.pdf")
        Generate list of luigi Tasks that this Task depends on.
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
    @b2luigi.on_temporary_files
        Use basf2_mva teacher to create MVA weightfile from collected training
        Main process that is dispatched by the ``run`` method that is inherited
        validation_harvest_path = self.get_input_file_names(validation_harvest_basename)[0]
            'is_fake', 'is_clone', 'is_matched', 'quality_indicator',
            'experiment_number', 'run_number', 'event_number', 'pr_store_array_number',
            'pt_estimate', 'z0_estimate', 'd0_estimate', 'tan_lambda_estimate',
            'phi0_estimate', 'pt_truth', 'z0_truth', 'd0_truth', 'tan_lambda_truth',
        pr_df = uproot.open(validation_harvest_path)['pr_tree/pr_tree'].arrays(pr_columns, library='pd')
            'experiment_number',
            'pr_store_array_number',
        mc_df = uproot.open(validation_harvest_path)['mc_tree/mc_tree'].arrays(mc_columns, library='pd')
            mc_df = mc_df[mc_df.is_primary.eq(True)]
        qi_cuts = np.linspace(0., 1, 20, endpoint=False)
        with PdfPages(output_pdf_file_path, keep_empty=False) as pdf:
            titlepage_fig, titlepage_ax = plt.subplots()
            titlepage_ax.axis("off")
            title = f"Quality Estimator validation plots from {self.__class__.__name__}"
            titlepage_ax.set_title(title)
            weightfile_identifier = teacher_task.get_weightfile_xml_identifier(teacher_task, fast_bdt_option=self.fast_bdt_option)
                "Date": datetime.today().strftime("%Y-%m-%d %H:%M"),
                "Created by steering file": os.path.realpath(__file__),
                "Created from data in": validation_harvest_path,
                "Background directory": MasterTask.bkgfiles_by_exp[self.experiment_number],
                "weight file": weightfile_identifier,
            if hasattr(self, 'exclude_variables'):
                meta_data["Excluded variables"] = ", ".join(self.exclude_variables)
            meta_data_string = (format_dictionary(meta_data) +
                                "\n\n(For all MVA training parameters look into the produced weight file)")
            luigi_params = get_serialized_parameters(self)
            luigi_param_string = (f"\n\nb2luigi parameters for {self.__class__.__name__}\n" +
                                  format_dictionary(luigi_params))
            title_page_text = meta_data_string + luigi_param_string
            titlepage_ax.text(0, 1, title_page_text, ha="left", va="top", wrap=True, fontsize=8)
            pdf.savefig(titlepage_fig)
            plt.close(titlepage_fig)
            fake_rates = get_uncertain_means_for_qi_cuts(pr_df, "is_fake", qi_cuts)
            fake_fig, fake_ax = plt.subplots()
            fake_ax.set_title("Fake rate")
            plot_with_errobands(fake_rates, ax=fake_ax)
            fake_ax.set_ylabel("fake rate")
            fake_ax.set_xlabel("quality indicator requirement")
            pdf.savefig(fake_fig, bbox_inches="tight")
            clone_rates = get_uncertain_means_for_qi_cuts(pr_df, "is_clone", qi_cuts)
            clone_fig, clone_ax = plt.subplots()
            clone_ax.set_title("Clone rate")
            plot_with_errobands(clone_rates, ax=clone_ax)
            clone_ax.set_ylabel("clone rate")
            clone_ax.set_xlabel("quality indicator requirement")
            pdf.savefig(clone_fig, bbox_inches="tight")
            plt.close(clone_fig)
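The fake and clone rates above are means of a boolean flag over all tracks that pass a quality indicator (QI) requirement, evaluated on a grid of cuts. The helper `get_uncertain_means_for_qi_cuts` is not shown in this listing, so the sketch below is only an illustration of the idea with plain Python. The function name `rate_for_qi_cuts`, the toy tracks and the binomial error formula are assumptions.

```python
import math


def rate_for_qi_cuts(tracks, flag, qi_cuts):
    """For each cut, return (rate, binomial error) of `flag` among tracks with QI > cut."""
    rates = {}
    for cut in qi_cuts:
        selected = [t for t in tracks if t["quality_indicator"] > cut]
        n = len(selected)
        if n == 0:
            # No track survives the cut: the rate is undefined.
            rates[cut] = (float("nan"), float("nan"))
            continue
        p = sum(1 for t in selected if t[flag]) / n
        rates[cut] = (p, math.sqrt(p * (1.0 - p) / n))
    return rates


# Toy pattern-recognition tracks (hypothetical values).
tracks = [
    {"quality_indicator": 0.95, "is_fake": False},
    {"quality_indicator": 0.80, "is_fake": False},
    {"quality_indicator": 0.40, "is_fake": True},
    {"quality_indicator": 0.10, "is_fake": True},
]
qi_cuts = [i / 20 for i in range(20)]  # 0, 0.05, ..., 0.95
fake_rates = rate_for_qi_cuts(tracks, "is_fake", qi_cuts)
print(fake_rates[0.0], fake_rates[0.5])
```

In this toy sample the fake rate drops from one half to zero once the QI requirement exceeds the QI of the two fake tracks.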
            pr_track_identifiers = ['experiment_number', 'run_number', 'event_number', 'pr_store_array_number']
                left=mc_df, right=pr_df[pr_track_identifiers + ['quality_indicator']],
                on=pr_track_identifiers
            missing_fractions = (
                _my_uncertain_mean(mc_df[
                    mc_df.quality_indicator.isnull() | (mc_df.quality_indicator > qi_cut)]['is_missing'])
                for qi_cut in qi_cuts
            findeff_fig, findeff_ax = plt.subplots()
            findeff_ax.set_title("Finding efficiency")
            finding_efficiencies = 1.0 - upd.Series(data=missing_fractions, index=qi_cuts)
            plot_with_errobands(finding_efficiencies, ax=findeff_ax)
            findeff_ax.set_ylabel("finding efficiency")
            findeff_ax.set_xlabel("quality indicator requirement")
            pdf.savefig(findeff_fig, bbox_inches="tight")
            plt.close(findeff_fig)
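The finding efficiency is obtained by attaching the QI of the matched track to each MC particle and then taking one minus the fraction of missing particles among those with no QI or a QI above the cut. A minimal sketch of this selection with plain dictionaries instead of data frames follows. The `track_id` key, the toy values and the function name are assumptions for illustration.

```python
def finding_efficiency(mc_particles, pr_tracks, qi_cut):
    """1 - mean(is_missing) over MC particles with no QI or QI > qi_cut."""
    # Emulates the left join: particles without a matched track get QI None.
    qi_by_id = {t["track_id"]: t["quality_indicator"] for t in pr_tracks}
    selected = []
    for particle in mc_particles:
        qi = qi_by_id.get(particle["track_id"])
        if qi is None or qi > qi_cut:
            selected.append(particle)
    if not selected:
        return float("nan")
    missing = sum(1 for p in selected if p["is_missing"])
    return 1.0 - missing / len(selected)


# Toy sample: three found particles and one that was never reconstructed.
pr_tracks = [
    {"track_id": 1, "quality_indicator": 0.9},
    {"track_id": 2, "quality_indicator": 0.6},
    {"track_id": 3, "quality_indicator": 0.2},
]
mc_particles = [
    {"track_id": 1, "is_missing": False},
    {"track_id": 2, "is_missing": False},
    {"track_id": 3, "is_missing": False},
    {"track_id": None, "is_missing": True},
]
eff_loose = finding_efficiency(mc_particles, pr_tracks, qi_cut=0.0)
eff_tight = finding_efficiency(mc_particles, pr_tracks, qi_cut=0.5)
print(eff_loose, eff_tight)
```

Note that with this selection a particle whose track fails the cut leaves the sample instead of being counted as missing, which is what the boolean mask in the listing does as well.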
            fake_roc_fig, fake_roc_ax = plt.subplots()
            fake_roc_ax.set_title("Fake rate vs. finding efficiency ROC curve")
            fake_roc_ax.errorbar(x=finding_efficiencies.nominal_value, y=fake_rates.nominal_value,
                                 xerr=finding_efficiencies.std_dev, yerr=fake_rates.std_dev, elinewidth=0.8)
            fake_roc_ax.set_xlabel('finding efficiency')
            fake_roc_ax.set_ylabel('fake rate')
            pdf.savefig(fake_roc_fig, bbox_inches="tight")
            plt.close(fake_roc_fig)
            clone_roc_fig, clone_roc_ax = plt.subplots()
            clone_roc_ax.set_title("Clone rate vs. finding efficiency ROC curve")
            clone_roc_ax.errorbar(x=finding_efficiencies.nominal_value, y=clone_rates.nominal_value,
                                  xerr=finding_efficiencies.std_dev, yerr=clone_rates.std_dev, elinewidth=0.8)
            clone_roc_ax.set_xlabel('finding efficiency')
            clone_roc_ax.set_ylabel('clone rate')
            pdf.savefig(clone_roc_fig, bbox_inches="tight")
            plt.close(clone_roc_fig)
            kinematic_qi_cuts = [0, 0.5, 0.9]
            params = ['d0', 'z0', 'pt', 'tan_lambda', 'phi0']
                "tan_lambda": r"$\tan{\lambda}$",
                "tan_lambda": "rad",
            n_kinematic_bins = 75
                "pt": np.linspace(0, np.percentile(pr_df['pt_truth'].dropna(), 95), n_kinematic_bins),
                "z0": np.linspace(-0.1, 0.1, n_kinematic_bins),
                "d0": np.linspace(0, 0.01, n_kinematic_bins),
                "tan_lambda": np.linspace(-2, 3, n_kinematic_bins),
                "phi0": np.linspace(0, 2 * np.pi, n_kinematic_bins)
            kinematic_qi_cuts = [0, 0.5, 0.8]
            blue, yellow, green = plt.get_cmap("tab10").colors[0:3]
            for param in params:
                fig, axarr = plt.subplots(ncols=len(kinematic_qi_cuts), sharey=True, sharex=True, figsize=(14, 6))
                fig.suptitle(f"{label_by_param[param]} distributions")
                for i, qi in enumerate(kinematic_qi_cuts):
                    ax.set_title(f"QI > {qi}")
                    incut = pr_df[(pr_df['quality_indicator'] > qi)]
                    incut_matched = incut[incut.is_matched.eq(True)]
                    incut_clones = incut[incut.is_clone.eq(True)]
                    incut_fake = incut[incut.is_fake.eq(True)]
                    if any(series.empty for series in (incut, incut_matched, incut_clones, incut_fake)):
                        ax.text(0.5, 0.5, "Not enough data in bin", ha="center", va="center", transform=ax.transAxes)
                    bins = bins_by_param[param]
                    stacked_histogram_series_tuple = (
                        incut_matched[f'{param}_estimate'],
                        incut_clones[f'{param}_estimate'],
                        incut_fake[f'{param}_estimate'],
                    histvals, _, _ = ax.hist(stacked_histogram_series_tuple,
                                             bins=bins, range=(bins.min(), bins.max()),
                                             color=(blue, green, yellow),
                                             label=("matched", "clones", "fakes"))
                    ax.set_xlabel(f'{label_by_param[param]} estimate / ({unit_by_param[param]})')
                    ax.set_ylabel('# tracks')
                axarr[0].legend(loc="upper center", bbox_to_anchor=(0, -0.15))
                pdf.savefig(fig, bbox_inches="tight")
    Create a PDF file with validation plots for the VXDTF2 track quality
    estimator produced from the ROOT ntuples produced by a VXDTF2 track QE
    harvesting validation task
        Harvesting validation task to require, which produces the ROOT files
        with variables to produce the VXD QE validation plots.
            num_processes=MasterTask.num_processes,
    Create a PDF file with validation plots for the CDC track quality estimator
    produced from the ROOT ntuples produced by a CDC track QE harvesting
    training_target = b2luigi.Parameter()
        Harvesting validation task to require, which produces the ROOT files
        with variables to produce the CDC QE validation plots.
            training_target=self.training_target,
            num_processes=MasterTask.num_processes,
    Create a PDF file with validation plots for the reco MVA track quality
    estimator produced from the ROOT ntuples produced by a reco track QE
    harvesting validation task
    cdc_training_target = b2luigi.Parameter()
    def harvesting_validation_task_instance(self):
        Harvesting validation task to require, which produces the ROOT files
        with variables to produce the final MVA track QE validation plots.
            n_events_testing=self.n_events_testing,
            exclude_variables=self.exclude_variables,
            num_processes=MasterTask.num_processes,
    Collect weightfile identifiers from different teacher tasks and merge them
    into a local database for testing.
    n_events_training = b2luigi.IntParameter()
    experiment_number = b2luigi.IntParameter()
    process_type = b2luigi.Parameter(
    cdc_training_target = b2luigi.Parameter()
    fast_bdt_option = b2luigi.ListParameter(
        hashed=True, default=[200, 8, 3, 0.1]
        Required teacher tasks
            exclude_variables=MasterTask.exclude_variables_vxd,
            exclude_variables=MasterTask.exclude_variables_cdc,
            exclude_variables=MasterTask.exclude_variables_rec,
        yield self.add_to_output("localdb.tar")
        Create local database
        current_path = Path.cwd()
        localdb_archive_path = Path(self.get_output_file_name("localdb.tar")).absolute()
        output_dir = localdb_archive_path.parent
            for task in (VXDQETeacherTask, CDCQETeacherTask, RecoTrackQETeacherTask):
                weightfile_xml_identifier_path = os.path.abspath(self.get_input_file_names(
                    task.get_weightfile_xml_identifier(task, fast_bdt_option=self.fast_bdt_option))[0])
                    os.chdir(output_dir)
                        weightfile_xml_identifier_path,
                        task.weightfile_identifier_basename,
                    os.chdir(current_path)
            shutil.make_archive(
                base_name=localdb_archive_path.as_posix().split('.')[0],
                root_dir=output_dir,
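The local database directory is packed into a single tar archive so that the task has one output file. A minimal sketch of that packing step with `shutil.make_archive` follows. The temporary directory, the `database.txt` payload and the `format`/`base_dir` arguments are assumptions, since the corresponding lines are not visible in this listing. The sketch strips the extension with `with_suffix("")`, a variant that, unlike splitting on the first dot, also works when a parent directory name contains a dot.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    output_dir = Path(tmp)
    # Stand-in for the localdb directory created by the upload step.
    localdb = output_dir / "localdb"
    localdb.mkdir()
    (localdb / "database.txt").write_text("hypothetical payload list\n")

    archive_path = output_dir / "localdb.tar"
    # make_archive appends the extension itself, so pass the name without it.
    created = shutil.make_archive(
        base_name=str(archive_path.with_suffix("")),
        format="tar",
        root_dir=output_dir,
        base_dir="localdb",
    )
    with tarfile.open(created) as tar:
        members = sorted(tar.getnames())
    created_name = Path(created).name

print(created_name, members)
```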
        Remove local database and tar archives in output directory
        localdb_archive_path = Path(self.get_output_file_name("localdb.tar"))
        localdb_path = localdb_archive_path.parent / "localdb"
        if localdb_path.exists():
            print(f"Deleting localdb\n{localdb_path}\nwith contents\n ",
                  "\n ".join(f.name for f in localdb_path.iterdir()))
            shutil.rmtree(localdb_path, ignore_errors=False)
        if localdb_archive_path.is_file():
            print(f"Deleting {localdb_archive_path}")
            os.remove(localdb_archive_path)
    def on_failure(self, exception):
        Cleanup: Remove local database to prevent existing outputs when task did not finish successfully
        super().on_failure(exception)
    Wrapper task that needs to finish for b2luigi to finish running this steering file.

    It is done if the outputs of all required subtasks exist. It is thus at the
    top of the luigi task graph. Edit the ``requires`` method to steer which
    tasks and with which parameters you want to run.
    process_type = b2luigi.get_setting(
        "process_type", default='BBBAR'
    n_events_training = b2luigi.get_setting(
        "n_events_training", default=20000
    n_events_testing = b2luigi.get_setting(
        "n_events_testing", default=5000
    n_events_per_task = b2luigi.get_setting(
        "n_events_per_task", default=100
    num_processes = b2luigi.get_setting(
        "basf2_processes_per_worker", default=0
    datafiles = b2luigi.get_setting("datafiles")
    bkgfiles_by_exp = b2luigi.get_setting("bkgfiles_by_exp")
    bkgfiles_by_exp = {int(key): val for (key, val) in bkgfiles_by_exp.items()}
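The dictionary comprehension above converts the keys of `bkgfiles_by_exp` to integers. This matters if the settings come from a JSON file, because JSON object keys are always strings while `experiment_number` is an integer parameter. A small sketch, assuming a JSON settings source and hypothetical directory paths:

```python
import json

# Hypothetical settings content; the paths are placeholders.
settings_json = """
{
    "bkgfiles_by_exp": {
        "0": "/path/to/bkg/exp0",
        "1003": "/path/to/bkg/exp1003"
    }
}
"""
settings = json.loads(settings_json)

bkgfiles_by_exp = settings["bkgfiles_by_exp"]
keys_before = sorted(bkgfiles_by_exp)  # JSON keys arrive as strings
bkgfiles_by_exp = {int(key): val for (key, val) in bkgfiles_by_exp.items()}

experiment_number = 1003  # integer, as an IntParameter would provide
bkg_dir = bkgfiles_by_exp[experiment_number]
print(keys_before, bkg_dir)
```

Without the conversion, the lookup with an integer experiment number would raise a `KeyError`.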
    exclude_variables_cdc = [
        "has_matching_segment",
        "cont_layer_variance",
        "cont_layer_max_vs_last",
        "cont_layer_first_vs_min",
        "cont_layer_occupancy",
        "super_layer_variance",
        "super_layer_max_vs_last",
        "super_layer_first_vs_min",
        "super_layer_occupancy",
        "drift_length_mean",
        "drift_length_variance",
        "norm_drift_length_mean",
        "norm_drift_length_variance",
        "norm_drift_length_max",
        "norm_drift_length_min",
        "norm_drift_length_sum",
    exclude_variables_vxd = [
        'energyLoss_max', 'energyLoss_min', 'energyLoss_mean', 'energyLoss_std', 'energyLoss_sum',
        'size_max', 'size_min', 'size_mean', 'size_std', 'size_sum',
        'seedCharge_max', 'seedCharge_min', 'seedCharge_mean', 'seedCharge_std', 'seedCharge_sum',
        'tripletFit_P_Mag', 'tripletFit_P_Eta', 'tripletFit_P_Phi', 'tripletFit_P_X', 'tripletFit_P_Y', 'tripletFit_P_Z']
    exclude_variables_rec = [
        'N_diff_PXD_SVD_RecoTracks',
        'N_diff_SVD_CDC_RecoTracks',
        'Fit_NFailedPoints',
        'N_TrackPoints_without_KalmanFitterInfo',
        'N_Hits_without_TrackPoint',
        'SVD_CDC_CDCwall_Chi2',
        'SVD_CDC_CDCwall_Pos_diff_Z',
        'SVD_CDC_CDCwall_Pos_diff_Pt',
        'SVD_CDC_CDCwall_Pos_diff_Theta',
        'SVD_CDC_CDCwall_Pos_diff_Phi',
        'SVD_CDC_CDCwall_Pos_diff_Mag',
        'SVD_CDC_CDCwall_Pos_diff_Eta',
        'SVD_CDC_CDCwall_Mom_diff_Z',
        'SVD_CDC_CDCwall_Mom_diff_Pt',
        'SVD_CDC_CDCwall_Mom_diff_Theta',
        'SVD_CDC_CDCwall_Mom_diff_Phi',
        'SVD_CDC_CDCwall_Mom_diff_Mag',
        'SVD_CDC_CDCwall_Mom_diff_Eta',
        'SVD_CDC_POCA_Pos_diff_Z',
        'SVD_CDC_POCA_Pos_diff_Pt',
        'SVD_CDC_POCA_Pos_diff_Theta',
        'SVD_CDC_POCA_Pos_diff_Phi',
        'SVD_CDC_POCA_Pos_diff_Mag',
        'SVD_CDC_POCA_Pos_diff_Eta',
        'SVD_CDC_POCA_Mom_diff_Z',
        'SVD_CDC_POCA_Mom_diff_Pt',
        'SVD_CDC_POCA_Mom_diff_Theta',
        'SVD_CDC_POCA_Mom_diff_Phi',
        'SVD_CDC_POCA_Mom_diff_Mag',
        'SVD_CDC_POCA_Mom_diff_Eta',
        'SVD_FitSuccessful',
        'CDC_FitSuccessful',
        'is_Vzero_Daughter',
        'weight_firstCDCHit',
        'weight_lastSVDHit',
        'smoothedChi2_mean',
        'smoothedChi2_median',
        'smoothedChi2_n_zeros',
        'smoothedChi2_firstCDCHit',
        'smoothedChi2_lastSVDHit']
        Generate list of tasks that needs to be done for luigi to finish running
        cdc_training_targets = [
        fast_bdt_options = []
        fast_bdt_options.append([350, 6, 5, 0.1])
        experiment_numbers = b2luigi.get_setting("experiment_numbers")
        for experiment_number, cdc_training_target, fast_bdt_option in itertools.product(
                experiment_numbers, cdc_training_targets, fast_bdt_options
            if b2luigi.get_setting("test_selected_task", default=False):
                for cut in ['000', '070', '090', '095']:
                        experiment_number=experiment_number,
                        recotrack_option='useCDC_noVXD_deleteCDCQI'+cut,
                        cdc_training_target=cdc_training_target,
                        fast_bdt_option=fast_bdt_option,
                    experiment_number=experiment_number,
                    experiment_number=experiment_number,
                    training_target=cdc_training_target,
                    fast_bdt_option=fast_bdt_option,
                        experiment_number=experiment_number,
                        experiment_number=experiment_number,
                        experiment_number=experiment_number,
                        recotrack_option='deleteCDCQI080',
                        cdc_training_target=cdc_training_target,
                        fast_bdt_option=fast_bdt_option,
                        experiment_number=experiment_number,
                        cdc_training_target=cdc_training_target,
                        fast_bdt_option=fast_bdt_option,
                    if b2luigi.get_setting("run_validation_tasks", default=True):
                            experiment_number=experiment_number,
                            cdc_training_target=cdc_training_target,
                            fast_bdt_option=fast_bdt_option,
                            experiment_number=experiment_number,
                            training_target=cdc_training_target,
                            fast_bdt_option=fast_bdt_option,
                            experiment_number=experiment_number,
                            fast_bdt_option=fast_bdt_option,
                    if b2luigi.get_setting("run_mva_evaluate", default=True):
                            experiment_number=experiment_number,
                            cdc_training_target=cdc_training_target,
                            fast_bdt_option=fast_bdt_option,
                            experiment_number=experiment_number,
                            fast_bdt_option=fast_bdt_option,
                            training_target=cdc_training_target,
                            experiment_number=experiment_number,
                            fast_bdt_option=fast_bdt_option,
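The `requires` method scans every combination of experiment number, CDC training target and FastBDT option with `itertools.product` and yields tasks for each combination. A minimal sketch of that scan follows, with plain dictionaries standing in for the yielded b2luigi tasks. The experiment numbers and the training target name are hypothetical values.

```python
import itertools

experiment_numbers = [0, 1003]            # hypothetical setting values
cdc_training_targets = ["truth"]          # hypothetical target name
fast_bdt_options = [[200, 8, 3, 0.1], [350, 6, 5, 0.1]]


def requested_tasks():
    """Yield one parameter set per combination of the three scan axes."""
    for experiment_number, cdc_training_target, fast_bdt_option in itertools.product(
            experiment_numbers, cdc_training_targets, fast_bdt_options):
        # A real steering file would yield a task instance here.
        yield {
            "experiment_number": experiment_number,
            "cdc_training_target": cdc_training_target,
            "fast_bdt_option": fast_bdt_option,
        }


tasks = list(requested_tasks())
print(len(tasks))
for task in tasks:
    print(task)
```

The number of requested parameter sets is the product of the list lengths, so adding one more hyperparameter option multiplies the workload accordingly.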
if __name__ == "__main__":
    nEventsTestOnData = b2luigi.get_setting("n_events_test_on_data", default=-1)
    if nEventsTestOnData > 0 and 'DATA' in b2luigi.get_setting("process_type", default="BBBAR"):
        from ROOT import Belle2
        environment.setNumberEventsOverride(nEventsTestOnData)
    globaltags = b2luigi.get_setting("globaltags", default=[])
    if len(globaltags) > 0:
        basf2.conditions.reset()
        for gt in globaltags:
            basf2.conditions.prepend_globaltag(gt)
    workers = b2luigi.get_setting("workers", default=1)
    b2luigi.process(MasterTask(), workers=workers)
def get_background_files(folder=None, output_file_info=True)
static Environment & Instance()
Static method to get a reference to the Environment instance.
def get_records_file_name(self, n_events=None, random_seed=None)
Filename of the recorded/collected data for the final QE MVA training.
experiment_number
Experiment number of the conditions database, e.g.
n_events
Number of events to generate.
def get_input_files(self, n_events=None, random_seed=None)
random_seed
Random basf2 seed used by the GenerateSimTask.
def add_tracking_with_quality_estimation(self, path)
training_target
Feature/variable to use as truth label in the quality estimator MVA classifier.
teacher_task
Teacher task to require to provide a quality estimator weightfile for add_tracking_with_quality_estim...
def harvesting_validation_task_instance(self)
filename
filename to check
experiment_number
Experiment number of the conditions database, e.g.
def output_file_name(self, n_events=None, random_seed=None)
Name of the ROOT output file with generated and simulated events.
n_events
Number of events to generate.
bkgfiles_dir
Directory with overlay background root files.
random_seed
Random basf2 seed.
process_type
Define which kind of process shall be used.
TrackQETeacherBaseTask teacher_task(self)
experiment_number
Experiment number of the conditions database, e.g.
fast_bdt_option
Hyperparameter option of the FastBDT algorithm.
n_events_training
Number of events to generate for the training data set.
n_events_testing
Number of events to generate for the test data set.
string validation_output_file_name
Name of the "harvested" ROOT output file with variables that can be used for validation.
None add_tracking_with_quality_estimation(self, basf2.Path path)
exclude_variables
List of collected variables to not use in the training of the QE MVA classifier.
string reco_output_file_name
Name of the output of the RootOutput module with reconstructed events.
process_type
Define which kind of process shall be used.
list exclude_variables_rec
List of variables to exclude for the RecoTrack MVA.
list exclude_variables_vxd
List of variables to exclude for the VXD MVA.
n_events_training
Number of events to generate for the training data set.
n_events_testing
Number of events to generate for the test data set.
list exclude_variables_cdc
List of variables to exclude for the CDC MVA.
num_processes
Number of basf2 processes to use in Basf2PathTasks.
process_type
Define which kind of process shall be used.
experiment_number
Experiment number of the conditions database, e.g.
fast_bdt_option
Hyperparameter option of the FastBDT algorithm.
n_events_training
Number of events to generate for the training data set.
def output_pdf_file_basename(self)
n_events_testing
Number of events to generate for the test data set.
primaries_only
Whether to normalize the track finding efficiencies to primary particles only.
exclude_variables
List of collected variables to not use in the training of the QE MVA classifier.
HarvestingValidationBaseTask harvesting_validation_task_instance(self)
process_type
Define which kind of process shall be used.
cdc_training_target
Feature/variable to use as truth label for the CDC track quality estimator.
experiment_number
Experiment number of the conditions database, e.g.
fast_bdt_option
Hyperparameter option of the FastBDT algorithm.
n_events_training
Number of events to generate for the training data set.
cdc_training_target
Feature/variable to use as truth label for the CDC track quality estimator.
experiment_number
Experiment number of the conditions database, e.g.
fast_bdt_option
Hyperparameter option of the FastBDT algorithm.
recotrack_option
RecoTrack option, use string that is additive: deleteCDCQI0XY (= deletes CDCTracks with CDC-QI below ...
def get_records_file_name(self, n_events=None, random_seed=None, recotrack_option=None)
Filename of the recorded/collected data for the final QE MVA training.
n_events
Number of events to generate.
def get_input_files(self, n_events=None, random_seed=None)
random_seed
Random basf2 seed used by the GenerateSimTask.
cdc_training_target
Feature/variable to use as truth label for the CDC track quality estimator.
teacher_task
Task that is required by the evaluation base class to create the MVA weightfile that needs to be eval...
cdc_training_target
Feature/variable to use as truth label for the CDC track quality estimator.
def add_tracking_with_quality_estimation(self, path)
teacher_task
Teacher task to require to provide a quality estimator weightfile for add_tracking_with_quality_estim...
cdc_training_target
Feature/variable to use as truth label for the CDC track quality estimator.
data_collection_task
Defines DataCollectionTask to require by the base class to collect features for the MVA training.
string random_seed
Random basf2 seed used to create the training data set.
def harvesting_validation_task_instance(self)
experiment_number
Experiment number of the conditions database, e.g.
def output_file_name(self, n_events=None, random_seed=None)
Name of the ROOT output file with generated and simulated events.
n_events
Number of events to generate.
bkgfiles_dir
Directory with overlay background root files.
random_seed
Random basf2 seed.
process_type
Define which kind of process shall be used.
TrackQETeacherBaseTask teacher_task(self)
experiment_number
Experiment number of the conditions database, e.g.
fast_bdt_option
Hyperparameter options for the FastBDT algorithm.
n_events_training
Number of events to generate for the training data set.
n_events_testing
Number of events to generate for the test data set.
exclude_variables
List of collected variables to not use in the training of the QE MVA classifier.
training_target
Feature/variable to use as truth label in the quality estimator MVA classifier.
Basf2PathTask data_collection_task(self)
process_type
Define which kind of process shall be used.
experiment_number
Experiment number of the conditions database, e.g.
fast_bdt_option
Hyperparameter option of the FastBDT algorithm.
def weightfile_identifier_basename(self)
n_events_training
Number of events to generate for the training data set.
def get_weightfile_xml_identifier(self, fast_bdt_option=None, recotrack_option=None)
exclude_variables
List of collected variables to not use in the training of the QE MVA classifier.
training_target
Feature/variable to use as truth label in the quality estimator MVA classifier.
Basf2PathTask data_collection_task(self)
def get_records_file_name(self, n_events=None, random_seed=None)
Filename of the recorded/collected data for the final QE MVA training.
experiment_number
Experiment number of the conditions database, e.g.
n_events
Number of events to generate.
def get_input_files(self, n_events=None, random_seed=None)
random_seed
Random basf2 seed used by the GenerateSimTask.
def add_tracking_with_quality_estimation(self, path)
string validation_output_file_name
Name of the "harvested" ROOT output file with variables that can be used for validation.
string reco_output_file_name
Name of the output of the RootOutput module with reconstructed events.
string tree_name
Name of the TTree in the ROOT file from the data_collection_task that contains the training data for ...
def add_simulation(path, components=None, bkgfiles=None, bkgOverlay=True, forceSetPXDDataReduction=False, usePXDDataReduction=True, cleanupPXDDataReduction=True, generate_2nd_cdc_hits=False, simulateT0jitter=True, isCosmics=False, FilterEvents=False, usePXDGatedMode=False, skipExperimentCheckForBG=False, save_slow_pions_in_mc=False)