Belle II Software development
TrainDataSaver Class Reference

Public Member Functions

def __init__ (self, output_file, flag_file)
 
def initialize (self)
 
def event (self)
 
def terminate (self)
 

Public Attributes

 output_file
 Filename to save training data to.
 
 flag_list
 Passing events (event numbers) read from the flag file.
 
 fast_mode
 Whether to use fast mode (``True``) or advanced mode (``False``).
 
 eventInfo
 Event metadata (``EventMetaData``) from the data store.
 
 eventExtraInfo
 Event extra info (``EventExtraInfo``) from the data store.
 
 df_dict
 Pandas dataframe to save particle features.
 

Detailed Description

Save MCParticles to a pandas DataFrame.

Arguments:
    output_file (str): Filename to save training data to.
        A name ending in ``.parquet`` selects fast mode, which produces the final parquet file for training.
        Any other name (normally ending in ``.h5``) selects advanced mode, which produces a temporary h5 file for further preprocessing.
    flag_file (str): Filename of the flag file indicating passing events.

Returns:
    None
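The mode is decided purely by the output filename: the constructor sets ``fast_mode = output_file.endswith(".parquet")``, so every other suffix, not only ``.h5``, falls through to advanced mode. A minimal sketch of that rule follows; ``select_mode`` is a hypothetical helper for illustration and is not part of the module.

```python
def select_mode(output_file: str) -> str:
    """Return "fast" or "advanced" using the same suffix rule as TrainDataSaver.

    Hypothetical helper for illustration only: the module itself stores just the
    boolean ``fast_mode = output_file.endswith(".parquet")``.
    """
    fast_mode = output_file.endswith(".parquet")
    return "fast" if fast_mode else "advanced"


if __name__ == "__main__":
    for name in ["train.parquet", "train.h5", "train.PARQUET"]:
        print(name, "->", select_mode(name))
```

Because the check is a plain, case-sensitive suffix match, a name such as ``train.PARQUET`` would select advanced mode.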

Definition at line 68 of file NN_trainer_module.py.

Constructor & Destructor Documentation

◆ __init__()

def __init__ (self, output_file, flag_file)
Initialize the TrainDataSaver module.

:param output_file: Filename to save training data to.
:param flag_file: Filename of the flag file indicating passing events.

Definition at line 82 of file NN_trainer_module.py.

def __init__(self, output_file, flag_file):
    """
    Initialize the TrainDataSaver module.

    :param output_file: Filename to save training data to.
    :param flag_file: Filename of the flag file indicating passing events.
    """
    super().__init__()

    self.output_file = output_file

    self.flag_list = ak.from_parquet(flag_file)

    self.fast_mode = output_file.endswith(".parquet")

    # delete output file if it already exists, since we will append later
    if os.path.exists(output_file):
        os.remove(output_file)
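The comment in the constructor gives the reason for deleting an existing output file: the advanced-mode writer appends (``mode='a'``, ``append=True``), so a file left over from an earlier run would otherwise keep its old rows. The sketch below reproduces that setup step with the standard library only. ``prepare_output`` and ``append_rows`` are hypothetical names, and plain text lines stand in for the HDF5 table the real module appends to.

```python
import os
import tempfile
from typing import Iterable


def prepare_output(output_file: str) -> bool:
    """Mimic the setup in TrainDataSaver.__init__ and return the fast_mode flag.

    Hypothetical helper: derives the mode from the suffix and deletes a stale file.
    """
    fast_mode = output_file.endswith(".parquet")
    # Remove a leftover file so that later appends start from an empty file.
    if os.path.exists(output_file):
        os.remove(output_file)
    return fast_mode


def append_rows(output_file: str, rows: Iterable[str]) -> None:
    """Append text rows; a stand-in for the append-mode HDF5 write (hypothetical)."""
    with open(output_file, "a", encoding="utf-8") as handle:
        for row in rows:
            handle.write(row + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train.h5")
        append_rows(path, ["stale row from an earlier run"])
        prepare_output(path)  # deletes the stale file
        append_rows(path, ["row 1", "row 2"])
        with open(path, encoding="utf-8") as handle:
            print(handle.read().splitlines())  # only the two new rows remain
```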
104

Member Function Documentation

◆ event()

def event (self)
Process each event and append its particle information to the ``df_dict`` dataframe.

Definition at line 116 of file NN_trainer_module.py.

def event(self):
    """
    Process each event and append event information to the dictionary.
    """
    evtNum = self.eventInfo.getEvent()
    self.df_dict = pd.concat([
        self.df_dict,
        load_particle_list(mcplist=Belle2.PyStoreArray("MCParticles"), evtNum=evtNum, label=(evtNum in self.flag_list))
    ])
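Each call to ``event`` concatenates one event's particle table onto ``df_dict`` and labels it by checking whether the current event number is in ``flag_list``. Because ``pd.concat`` builds a new frame that copies the accumulated rows on every call, the total cost tends to grow roughly quadratically with the number of events; a common alternative is to collect per-event pieces in a list and combine them once at the end. The standard-library sketch below shows that pattern under stated assumptions: ``EventCollector`` is hypothetical, the particle dictionaries (with a made-up ``PDG`` key) stand in for the output of ``load_particle_list``, whose format is not documented here, and a Python ``set`` replaces the awkward array for the flag lookup.

```python
from typing import Dict, Iterable, List


class EventCollector:
    """Hypothetical stand-in for TrainDataSaver's event loop (no basf2 needed)."""

    def __init__(self, flagged_events: Iterable[int]):
        # A set gives constant-time membership tests for "did this event pass?"
        self.flagged_events = set(flagged_events)
        # Per-event chunks are collected here and combined only once, in terminate().
        self.chunks: List[List[Dict]] = []

    def event(self, evt_num: int, particles: List[Dict]) -> None:
        label = evt_num in self.flagged_events
        # Tag every particle row with its event number and the event-level label.
        rows = [dict(p, evtNum=evt_num, label=label) for p in particles]
        self.chunks.append(rows)

    def terminate(self) -> List[Dict]:
        # Flatten all chunks in one pass instead of re-copying after every event.
        table = [row for chunk in self.chunks for row in chunk]
        self.chunks = []  # free memory, as the real module resets df_dict
        return table


if __name__ == "__main__":
    collector = EventCollector(flagged_events=[2])
    collector.event(1, [{"PDG": 11}, {"PDG": -11}])
    collector.event(2, [{"PDG": 22}])
    for row in collector.terminate():
        print(row)
```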

◆ initialize()

def initialize (self)
Before any events are processed, attach the ``EventMetaData`` and ``EventExtraInfo`` objects from the data store and create the empty dataframe that will hold particle features.

Definition at line 105 of file NN_trainer_module.py.

def initialize(self):
    """
    Initialize the data store and the dictionary to save particle features before processing events.
    """

    self.eventInfo = Belle2.PyStoreObj('EventMetaData')

    self.eventExtraInfo = Belle2.PyStoreObj('EventExtraInfo')

    self.df_dict = pd.DataFrame()
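basf2 calls a Python module's ``initialize`` once before the event loop, ``event`` once per event, and ``terminate`` once at the end, which is why ``initialize`` only sets up data-store handles and an empty dataframe. The sketch below imitates that call order with a hypothetical ``run_path`` driver and a toy module so that it runs without basf2; it omits run-level hooks and is not the basf2 event processor.

```python
from typing import List


class ToyModule:
    """Hypothetical module with the same hook names as a basf2 Python module."""

    def __init__(self):
        self.calls: List[str] = []
        self.n_events = 0

    def initialize(self):
        # Set up per-job state once, before any event is seen.
        self.calls.append("initialize")
        self.n_events = 0

    def event(self):
        self.n_events += 1
        self.calls.append("event")

    def terminate(self):
        self.calls.append("terminate")


def run_path(module, n_events: int) -> None:
    """Hypothetical driver: call the hooks in the order used for one job."""
    module.initialize()
    for _ in range(n_events):
        module.event()
    module.terminate()


if __name__ == "__main__":
    mod = ToyModule()
    run_path(mod, 3)
    print(mod.calls)
```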

◆ terminate()

def terminate (self)
Write the collected events to disk in one of two ways, then free memory.

In fast mode, the dataframe containing particle-level information and skim labels is preprocessed
and saved as a parquet file that is ready for NN training.

In advanced mode, the dataframe is appended to an HDF5 (``.h5``) file, where it waits to be combined with
event-level information before preprocessing.

Definition at line 126 of file NN_trainer_module.py.

def terminate(self):
    """
    Append events on disk in either of the two different ways and free memory.

    In fast mode, the dataframe containing particle-level information and skim labels is preprocessed
    and saved as a parquet file which is ready for NN training.

    In advanced mode, the dataframe is saved as a h5 file and waits for combination with event-level information
    before preprocessing.
    """
    if self.fast_mode:
        ak.to_parquet(preprocessed(self.df_dict), self.output_file)
    else:
        self.df_dict.to_hdf(self.output_file, key='mc_information', mode='a', format='table', append=True)
    self.df_dict = pd.DataFrame()
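``terminate`` writes the whole table at once in fast mode, while advanced mode appends to a table in the HDF5 file (``mode='a'``, ``append=True``). The sketch below mirrors that split using CSV from the standard library in place of parquet and HDF5. ``write_output`` and the column names are hypothetical, and the preprocessing step applied on the fast path (``preprocessed``) is left out because its behaviour is not documented on this page.

```python
import csv
import os
import tempfile
from typing import Dict, List


def write_output(output_file: str, rows: List[Dict], fast_mode: bool) -> None:
    """Hypothetical CSV analogue of TrainDataSaver.terminate.

    Fast mode writes the whole table at once; advanced mode appends to the file.
    """
    if not rows:
        return  # nothing to write (the real module has no such guard)
    fieldnames = list(rows[0])
    if fast_mode:
        # Fast mode: write the final file in one go, replacing any previous content.
        with open(output_file, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    else:
        # Advanced mode: append rows; write the header only for a new or empty file.
        is_new = not os.path.exists(output_file) or os.path.getsize(output_file) == 0
        with open(output_file, "a", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            if is_new:
                writer.writeheader()
            writer.writerows(rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mc_information.csv")
        write_output(path, [{"evtNum": 1, "label": True}], fast_mode=False)
        write_output(path, [{"evtNum": 2, "label": False}], fast_mode=False)
        with open(path, newline="", encoding="utf-8") as handle:
            print(list(csv.DictReader(handle)))
```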

Member Data Documentation

◆ df_dict

df_dict

Pandas dataframe to save particle features.

Definition at line 114 of file NN_trainer_module.py.

◆ eventExtraInfo

eventExtraInfo

Event extra info (``EventExtraInfo``) from the data store.

Definition at line 112 of file NN_trainer_module.py.

◆ eventInfo

eventInfo

Event metadata (``EventMetaData``) from the data store.

Definition at line 110 of file NN_trainer_module.py.

◆ fast_mode

fast_mode

Whether to use fast mode (``True``) or advanced mode (``False``).

Definition at line 99 of file NN_trainer_module.py.

◆ flag_list

flag_list

Passing events (event numbers) read from the flag file.

Definition at line 97 of file NN_trainer_module.py.

◆ output_file

output_file

Filename to save training data to.

Definition at line 95 of file NN_trainer_module.py.

