Belle II Software development
RecoTrackQEDataCollectionTask Class Reference

Public Member Functions

def get_records_file_name (self, n_events=None, random_seed=None, recotrack_option=None)
 Filename of the recorded/collected data for the final QE MVA training.
 
def get_input_files (self, n_events=None, random_seed=None)
 
def requires (self)
 
def output (self)
 
def create_path (self)
 

Static Public Attributes

b2luigi n_events = b2luigi.IntParameter()
 Number of events to generate.
 
b2luigi experiment_number = b2luigi.IntParameter()
 Experiment number of the conditions database, which e.g. defines the simulation geometry.
 
b2luigi random_seed = b2luigi.Parameter()
 Random basf2 seed used by the GenerateSimTask.
 
b2luigi cdc_training_target = b2luigi.Parameter()
 Feature/variable to use as truth label for the CDC track quality estimator.
 
b2luigi recotrack_option
 RecoTrack option, given as an additive string of flags: deleteCDCQI0XY (deletes CDC tracks with a CDC QI below 0.XY), useCDC (uses the trained CDC weight file stored in datafiles/), useVXD (uses the trained VXD weight file stored in datafiles/), noCDC and noVXD (do not use the CDC or VXD MVA at all).
 
b2luigi fast_bdt_option
 Hyperparameter option of the FastBDT algorithm.
 
str queue = 'l'
 Specify the queue.
 

Detailed Description

Collect variables/features from the reco track reconstruction, including the
fit, and write them to a ROOT file.

These variables serve as labelled training data for the MVA track quality
estimator. They include the classifier outputs of the VXD and CDC quality
estimators (the VXD and CDC quality indicators) combined with fit, merger,
timing and energy-loss information, among others. This task requires the
subdetector quality estimators to be trained.

Definition at line 911 of file combined_quality_estimator_teacher.py.

Member Function Documentation

◆ create_path()

def create_path (   self)
Create basf2 reconstruction path that should mirror the default path
from ``add_tracking_reconstruction()``, but with modules for the VXD QE
and CDC QE application and for collection of variables for the reco
track quality estimator.
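
The module replacement described above can be illustrated without basf2. The following is a minimal sketch that works on plain module-name strings instead of a real basf2 path; the helper name `replace_final_qe` and the example module order are assumptions made for illustration only. It follows the intent stated in the source comments: the original TrackCreator is dropped and re-added directly before the collector.

```python
def replace_final_qe(module_names, target="TrackQualityEstimatorMVA"):
    """Swap the final QE module for TrackCreator + training data collector.

    Hypothetical helper operating on module names (strings), not on a
    real basf2 path.
    """
    new_names = []
    found = False
    for name in module_names:
        if name == target:
            # TrackCreator has to run before the collector
            new_names += ["TrackCreator", "TrackQETrainingDataCollector"]
            found = True
        elif name != "TrackCreator":
            # keep every other module, but skip the original TrackCreator
            new_names.append(name)
    if not found:
        raise KeyError(f"No module {target} found in path")
    return new_names
```

Raising `KeyError` when the target module is missing mirrors the check at the end of `create_path()`.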

Definition at line 1038 of file combined_quality_estimator_teacher.py.

    def create_path(self):
        """
        Create basf2 reconstruction path that should mirror the default path
        from ``add_tracking_reconstruction()``, but with modules for the VXD QE
        and CDC QE application and for collection of variables for the reco
        track quality estimator.
        """
        path = basf2.create_path()
        inputFileNames = self.get_input_files()
        path.add_module(
            "RootInput",
            inputFileNames=inputFileNames,
        )
        path.add_module("Gearbox")

        # First add tracking reconstruction with default quality estimation modules
        mvaCDC = True
        mvaVXD = True
        if 'noCDC' in self.recotrack_option:
            mvaCDC = False
        if 'noVXD' in self.recotrack_option:
            mvaVXD = False
        if 'DATA' in self.random_seed:
            from rawdata import add_unpackers
            add_unpackers(path)
        tracking.add_tracking_reconstruction(path, add_cdcTrack_QI=mvaCDC, add_vxdTrack_QI=mvaVXD, add_recoTrack_QI=True)

        # if data shall be processed check if newly trained mva files are available. Otherwise use default ones (CDB payloads):
        # if useCDC/VXD is specified, use the identifier lying in datafiles/ Otherwise, replace weightfile identifiers from defaults
        # (CDB payloads) to new weightfiles created by this b2luigi script
        if ('DATA' in self.random_seed or 'useCDC' in self.recotrack_option) and 'noCDC' not in self.recotrack_option:
            cdc_identifier = 'datafiles/' + \
                CDCQETeacherTask.get_weightfile_xml_identifier(CDCQETeacherTask, fast_bdt_option=self.fast_bdt_option)
            if os.path.exists(cdc_identifier):
                replace_cdc_qi = True
            elif 'useCDC' in self.recotrack_option:
                raise ValueError(f"CDC QI Identifier not found: {cdc_identifier}")
            else:
                replace_cdc_qi = False
        elif 'noCDC' in self.recotrack_option:
            replace_cdc_qi = False
        else:
            cdc_identifier = self.get_input_file_names(
                CDCQETeacherTask.get_weightfile_xml_identifier(
                    CDCQETeacherTask, fast_bdt_option=self.fast_bdt_option))[0]
            replace_cdc_qi = True
        if ('DATA' in self.random_seed or 'useVXD' in self.recotrack_option) and 'noVXD' not in self.recotrack_option:
            vxd_identifier = 'datafiles/' + \
                VXDQETeacherTask.get_weightfile_xml_identifier(VXDQETeacherTask, fast_bdt_option=self.fast_bdt_option)
            if os.path.exists(vxd_identifier):
                replace_vxd_qi = True
            elif 'useVXD' in self.recotrack_option:
                raise ValueError(f"VXD QI Identifier not found: {vxd_identifier}")
            else:
                replace_vxd_qi = False
        elif 'noVXD' in self.recotrack_option:
            replace_vxd_qi = False
        else:
            vxd_identifier = self.get_input_file_names(
                VXDQETeacherTask.get_weightfile_xml_identifier(
                    VXDQETeacherTask, fast_bdt_option=self.fast_bdt_option))[0]
            replace_vxd_qi = True

        cdc_qe_mva_filter_parameters = None
        # if tracks below a certain CDC QI index shall be deleted online, this needs to be specified in the filter parameters.
        # this is also possible in case of the default (CDB) payloads.
        if 'deleteCDCQI' in self.recotrack_option:
            cut_index = self.recotrack_option.find('deleteCDCQI') + len('deleteCDCQI')
            cut = int(self.recotrack_option[cut_index:cut_index+3])/100.
            if replace_cdc_qi:
                cdc_qe_mva_filter_parameters = {
                    "identifier": cdc_identifier, "cut": cut}
            else:
                cdc_qe_mva_filter_parameters = {
                    "cut": cut}
        elif replace_cdc_qi:
            cdc_qe_mva_filter_parameters = {
                "identifier": cdc_identifier}
        if cdc_qe_mva_filter_parameters is not None:
            # if no cut is specified, the default value is at zero and nothing is deleted.
            basf2.set_module_parameters(
                path,
                name="TFCDC_TrackQualityEstimator",
                filterParameters=cdc_qe_mva_filter_parameters,
                deleteTracks=True,
                resetTakenFlag=True
            )
        if replace_vxd_qi:
            basf2.set_module_parameters(
                path,
                name="VXDQualityEstimatorMVA",
                WeightFileIdentifier=vxd_identifier)

        # Replace final quality estimator module by training data collector module
        track_qe_module_name = "TrackQualityEstimatorMVA"
        module_found = False
        new_path = basf2.create_path()
        for module in path.modules():
            if module.name() != track_qe_module_name:
                # skip the original TrackCreator, it is re-added right before the collector
                if module.name() != 'TrackCreator':
                    new_path.add_module(module)
            else:
                # the TrackCreator needs to be conducted before the Collector such that
                # MDSTTracks are related to RecoTracks and d0 and z0 can be read out
                new_path.add_module(
                    'TrackCreator',
                    pdgCodes=[
                        211,
                        321,
                        2212],
                    recoTrackColName='RecoTracks',
                    trackColName='MDSTTracks')  # , useClosestHitToIP=True, useBFieldAtHit=True)
                new_path.add_module(
                    "TrackQETrainingDataCollector",
                    TrainingDataOutputName=self.get_output_file_name(self.get_records_file_name()),
                    collectEventFeatures=True,
                    SVDPlusCDCStandaloneRecoTracksStoreArrayName="SVDPlusCDCStandaloneRecoTracks",
                )
                module_found = True
        if not module_found:
            raise KeyError(f"No module {track_qe_module_name} found in path")
        path = new_path
        return path
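
The weight-file selection in `create_path()` follows the same branch structure for the CDC and the VXD. The sketch below restates that logic as two pure functions so it can be read and tested without basf2. The helper names, their parameters and the injectable `exists` callback are assumptions introduced for illustration; they are not part of the class.

```python
import os


def resolve_qi_weightfile(random_seed, recotrack_option, subdetector,
                          local_identifier, trained_identifier,
                          exists=os.path.exists):
    """Return (replace_qi, identifier) for one subdetector ('CDC' or 'VXD').

    Hypothetical helper that only mirrors the branches of create_path().
    """
    use_flag = "use" + subdetector
    no_flag = "no" + subdetector
    if ("DATA" in random_seed or use_flag in recotrack_option) \
            and no_flag not in recotrack_option:
        if exists(local_identifier):
            return True, local_identifier
        if use_flag in recotrack_option:
            # an explicitly requested local weight file must exist
            raise ValueError(f"{subdetector} QI Identifier not found: {local_identifier}")
        return False, None  # data without local file: keep default payload
    if no_flag in recotrack_option:
        return False, None
    return True, trained_identifier  # weight file trained in the same run


def build_cdc_filter_parameters(replace_cdc_qi, cdc_identifier, cut=None):
    """Build the filter parameter dict, or None if nothing is overridden."""
    params = {}
    if replace_cdc_qi:
        params["identifier"] = cdc_identifier
    if cut is not None:
        params["cut"] = cut
    return params or None
```

Passing `exists` as an argument is a design choice of this sketch: it lets the file-system check be replaced in tests, whereas the listing calls `os.path.exists` directly.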

◆ get_input_files()

def get_input_files (   self,
  n_events = None,
  random_seed = None 
)
Get the input file names depending on the use case: for already existing
simulation files, look in the corresponding folder (datafiles/); for data, use
the specified file list; for files created in the same run, ask the task that
produced them.

Definition at line 973 of file combined_quality_estimator_teacher.py.

    def get_input_files(self, n_events=None, random_seed=None):
        """
        Get input file names depending on the use case: If they already exist, search in
        the corresponding folders, for data check the specified list and if they are created
        in the same run, check for the task that produced them.
        """
        if n_events is None:
            n_events = self.n_events
        if random_seed is None:
            random_seed = self.random_seed
        if "USESIM" in random_seed:
            if 'USESIMBB' in random_seed:
                random_seed = 'BBBAR_' + random_seed.split("_", 1)[1]
            elif 'USESIMEE' in random_seed:
                random_seed = 'BHABHA_' + random_seed.split("_", 1)[1]
            return ['datafiles/' + GenerateSimTask.output_file_name(GenerateSimTask,
                                                                    n_events=n_events, random_seed=random_seed)]
        elif "DATA" in random_seed:
            return MasterTask.datafiles
        else:
            return self.get_input_file_names(GenerateSimTask.output_file_name(
                GenerateSimTask, n_events=n_events, random_seed=random_seed))
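
The three use cases are selected purely by substrings of `random_seed`. A minimal sketch of that dispatch, with a hypothetical function name, hypothetical return labels and made-up example seeds such as `USESIMBB_1234`, could look like this:

```python
def classify_input(random_seed):
    """Return (source, effective_seed) for a given random_seed string.

    Hypothetical helper; the source labels are invented for this sketch.
    """
    if "USESIM" in random_seed:
        # existing simulation: translate the prefix to the stored file's seed
        if "USESIMBB" in random_seed:
            random_seed = "BBBAR_" + random_seed.split("_", 1)[1]
        elif "USESIMEE" in random_seed:
            random_seed = "BHABHA_" + random_seed.split("_", 1)[1]
        return "existing_simulation", random_seed
    if "DATA" in random_seed:
        return "data_file_list", random_seed
    return "simulation_task_output", random_seed
```

The order of the checks matters in the same way as in the listing: `USESIM` is tested before `DATA`, so a seed containing both is treated as existing simulation.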

◆ get_records_file_name()

def get_records_file_name (   self,
  n_events = None,
  random_seed = None,
  recotrack_option = None 
)

Filename of the recorded/collected data for the final QE MVA training.

Create output file name depending on number of events and production
mode that is specified in the random_seed string.

Definition at line 951 of file combined_quality_estimator_teacher.py.

    def get_records_file_name(self, n_events=None, random_seed=None, recotrack_option=None):
        """
        Create output file name depending on number of events and production
        mode that is specified in the random_seed string.
        """
        if n_events is None:
            n_events = self.n_events
        if random_seed is None:
            random_seed = self.random_seed
        if recotrack_option is None:
            recotrack_option = self.recotrack_option
        if 'rec' not in random_seed:
            random_seed += '_rec'
        if 'DATA' in random_seed:
            return 'qe_records_DATA_rec.root'
        else:
            if 'USESIMBB' in random_seed:
                random_seed = 'BBBAR_' + random_seed.split("_", 1)[1]
            elif 'USESIMEE' in random_seed:
                random_seed = 'BHABHA_' + random_seed.split("_", 1)[1]
            return 'qe_records_N' + str(n_events) + '_' + random_seed + '_' + recotrack_option + '.root'
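
To make the naming scheme concrete, the following standalone sketch reproduces the string logic of the listing as a free function. The function name `records_file_name` and the example argument values are assumptions for illustration.

```python
def records_file_name(n_events, random_seed, recotrack_option):
    """Standalone copy of the naming logic (hypothetical free function)."""
    if "rec" not in random_seed:
        random_seed += "_rec"
    if "DATA" in random_seed:
        # data always maps to one fixed file name
        return "qe_records_DATA_rec.root"
    if "USESIMBB" in random_seed:
        random_seed = "BBBAR_" + random_seed.split("_", 1)[1]
    elif "USESIMEE" in random_seed:
        random_seed = "BHABHA_" + random_seed.split("_", 1)[1]
    return f"qe_records_N{n_events}_{random_seed}_{recotrack_option}.root"


# Example with made-up values:
print(records_file_name(1000, "BBBAR_test", "useCDC"))
# -> qe_records_N1000_BBBAR_test_rec_useCDC.root
```

Note that for data the number of events and the RecoTrack option do not enter the file name.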

◆ output()

def output (   self)
Generate list of output files that the task should produce.
The task is considered finished if and only if the outputs all exist.

Definition at line 1031 of file combined_quality_estimator_teacher.py.

    def output(self):
        """
        Generate list of output files that the task should produce.
        The task is considered finished if and only if the outputs all exist.
        """
        yield self.add_to_output(self.get_records_file_name())
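
The completion rule stated in the docstring can be modelled with plain file paths. This is a simplified sketch, not the luigi or b2luigi implementation: the function names are hypothetical, and treating an empty output list as incomplete is an assumption of this sketch.

```python
import os
import tempfile


def task_is_complete(output_paths):
    """A task counts as finished only if every listed output file exists."""
    paths = list(output_paths)
    # assumption of this sketch: no outputs means "not complete"
    return bool(paths) and all(os.path.exists(p) for p in paths)


def demo():
    """Check completeness before and after the output file is written."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "qe_records_DATA_rec.root")
        before = task_is_complete([target])
        open(target, "w").close()  # stand-in for the real ROOT output
        after = task_is_complete([target])
    return before, after
```

Because completeness depends only on file existence, deleting the records file is enough to make the workflow rerun this task.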

◆ requires()

def requires (   self)
Generate list of luigi Tasks that this Task depends on.

Definition at line 996 of file combined_quality_estimator_teacher.py.

    def requires(self):
        """
        Generate list of luigi Tasks that this Task depends on.
        """
        if "USESIM" in self.random_seed or "DATA" in self.random_seed:
            for filename in self.get_input_files():
                yield CheckExistingFile(
                    filename=filename,
                )
        else:
            yield SplitNMergeSimTask(
                bkgfiles_dir=MasterTask.bkgfiles_by_exp[self.experiment_number],
                random_seed=self.random_seed,
                n_events=self.n_events,
                experiment_number=self.experiment_number,
            )
        if "DATA" not in self.random_seed:
            if 'useCDC' not in self.recotrack_option and 'noCDC' not in self.recotrack_option:
                yield CDCQETeacherTask(
                    n_events_training=MasterTask.n_events_training,
                    experiment_number=self.experiment_number,
                    training_target=self.cdc_training_target,
                    process_type=self.random_seed.split("_", 1)[0],
                    exclude_variables=MasterTask.exclude_variables_cdc,
                    fast_bdt_option=self.fast_bdt_option,
                )
            if 'useVXD' not in self.recotrack_option and 'noVXD' not in self.recotrack_option:
                yield VXDQETeacherTask(
                    n_events_training=MasterTask.n_events_training,
                    experiment_number=self.experiment_number,
                    process_type=self.random_seed.split("_", 1)[0],
                    exclude_variables=MasterTask.exclude_variables_vxd,
                    fast_bdt_option=self.fast_bdt_option,
                )
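
Which upstream tasks are requested depends only on `random_seed` and `recotrack_option`. The sketch below returns the task class names as strings so the decision table can be inspected without b2luigi; the function name is hypothetical, and `CheckExistingFile` is listed once even though the real method yields it once per input file.

```python
def required_task_names(random_seed, recotrack_option):
    """List the names of the tasks requires() would yield (simplified)."""
    names = []
    if "USESIM" in random_seed or "DATA" in random_seed:
        names.append("CheckExistingFile")  # inputs must already exist
    else:
        names.append("SplitNMergeSimTask")  # inputs are simulated in this run
    if "DATA" not in random_seed:
        # teachers are only needed if the weight files are neither
        # taken from datafiles/ (use*) nor disabled (no*)
        if "useCDC" not in recotrack_option and "noCDC" not in recotrack_option:
            names.append("CDCQETeacherTask")
        if "useVXD" not in recotrack_option and "noVXD" not in recotrack_option:
            names.append("VXDQETeacherTask")
    return names
```

For data, no teacher task is ever requested, which matches `create_path()` falling back to the default payloads when no local weight file is found.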

Member Data Documentation

◆ cdc_training_target

b2luigi cdc_training_target = b2luigi.Parameter()
static

Feature/variable to use as truth label for the CDC track quality estimator.

Definition at line 932 of file combined_quality_estimator_teacher.py.

◆ experiment_number

b2luigi experiment_number = b2luigi.IntParameter()
static

Experiment number of the conditions database, which e.g. defines the simulation geometry.

Definition at line 927 of file combined_quality_estimator_teacher.py.

◆ fast_bdt_option

b2luigi fast_bdt_option
static
Initial value:
= b2luigi.ListParameter(
)

Hyperparameter option of the FastBDT algorithm.

Defaults to the FastBDT default values.

Definition at line 942 of file combined_quality_estimator_teacher.py.

◆ n_events

b2luigi n_events = b2luigi.IntParameter()
static

Number of events to generate.

Definition at line 925 of file combined_quality_estimator_teacher.py.

◆ queue

str queue = 'l'
static

Specify the queue.

E.g. choose between 'l' (long), 's' (short) or 'sx' (short, extra RAM).

Definition at line 948 of file combined_quality_estimator_teacher.py.

◆ random_seed

b2luigi random_seed = b2luigi.Parameter()
static

Random basf2 seed used by the GenerateSimTask.

It is also used to read off the production process, which keeps the b2luigi output clearly organized.

Definition at line 930 of file combined_quality_estimator_teacher.py.

◆ recotrack_option

b2luigi recotrack_option
static
Initial value:
= b2luigi.Parameter(
)

RecoTrack option, given as an additive string of flags: deleteCDCQI0XY (deletes CDC tracks with a CDC QI below 0.XY), useCDC (uses the trained CDC weight file stored in datafiles/), useVXD (uses the trained VXD weight file stored in datafiles/), noCDC and noVXD (do not use the CDC or VXD MVA at all).

Definition at line 936 of file combined_quality_estimator_teacher.py.
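
Since the option is a single additive string, the individual flags are detected by substring search and the cut is read from the three digits following `deleteCDCQI`. A minimal parsing sketch follows; the function name is hypothetical, and the underscore used to join flags in the example is an assumption, as the source only states that the string is additive.

```python
def parse_recotrack_option(option):
    """Split an additive RecoTrack option string into flags and a CDC QI cut."""
    flags = {name: name in option
             for name in ("useCDC", "useVXD", "noCDC", "noVXD")}
    cut = None
    key = "deleteCDCQI"
    if key in option:
        start = option.find(key) + len(key)
        # three digits "0XY" are interpreted as the cut value 0.XY
        cut = int(option[start:start + 3]) / 100.0
    return flags, cut


# Example with a made-up option string:
flags, cut = parse_recotrack_option("useCDC_noVXD_deleteCDCQI080")
print(flags["useCDC"], flags["noVXD"], cut)
```

If fewer than three digits follow `deleteCDCQI`, the `int()` conversion may fail or yield an unintended cut, so the three-digit form should be used consistently.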


The documentation for this class was generated from the following file:
combined_quality_estimator_teacher.py