Belle II Software development
data_production Class Reference

Public Member Functions

def __init__ (self, in_dir, out_dir, job_id, save_vars=None)
 
def process (self)
 
def process_b2script (self, num_events=2500)
 
def merge_files (self)
 
def clean_up (self)
 

Public Attributes

 data
 Input root file generated before skimming.
 
 flag
 Filename of the flag file indicating passing events.
 
 out_temp
 Temporary directory to keep intermediate files for advanced mode.
 
 temp_file
 Intermediate files.
 
 out_file
 Final output Parquet file.
 
 save_vars
 Variables to save for different event levels.
 

Detailed Description

Process data for training and save to Parquet file. Two modes are provided:
Fast mode: save_vars set to None, produce the dataset with only the necessary information for the training.
Advanced mode: save_vars set to a dictionary of event-level variables,
run through hard-coded b2 steering code in self.process_b2script to produce the required particle lists
and save the required variables, can be used for event-level cuts or evaluations of the NN performance.

Arguments:
    in_dir (str): Input directory.
    out_dir (str): Output directory.
    job_id (int): Job ID for batch processing.
    save_vars (dict): Event-level variables to save for different particles.
        By default None for fast mode.
        In the example script, it has Y4S and B keys for the corresponding particle lists.

Returns:
    None

Definition at line 143 of file NN_trainer_module.py.

Constructor & Destructor Documentation

◆ __init__()

def __init__ (   self,
  in_dir,
  out_dir,
  job_id,
  save_vars = None 
)
Initialize the data_production object.

:param in_dir: Input directory.
:param out_dir: Output directory.
:param job_id: Job ID for batch processing.
:param save_vars: Event-level variables to save for different particles.
By default None for fast mode.
In the example script having Y4S and B keys for the corresponding particle list.

Definition at line 163 of file NN_trainer_module.py.

163 def __init__(self, in_dir, out_dir, job_id, save_vars=None):
164 """
165 Initialize the data_production object.
166
167 :param in_dir: Input directory.
168 :param out_dir: Output directory.
169 :param job_id: Job ID for batch processing.
170 :param save_vars: Event-level variables to save for different particles.
171 By default None for fast mode.
172 In the example script having Y4S and B keys for the corresponding particle list.
173 """
174 dataName = '_submdst'
175 flagName = '_flag'
176
177 self.data = f'{in_dir}{dataName}{job_id}.root'
178
179 self.flag = f'{in_dir}{flagName}{job_id}.parquet'
180 if save_vars is not None:
181
182 self.out_temp = f'{out_dir}_temp{job_id}/'
183 os.makedirs(out_dir, exist_ok=True)
184 os.makedirs(self.out_temp, exist_ok=True)
185
186 self.temp_file = {
187 'MC': f'{self.out_temp}mc.h5',
188 'Y4S': f'{self.out_temp}y4s.h5',
189 'B': f'{self.out_temp}b.h5'
190 }
191
192 self.out_file = f'{out_dir}preprocessed{job_id}.parquet'
193
194 self.save_vars = save_vars
195

Member Function Documentation

◆ clean_up()

def clean_up (   self)
Clean up temporary files.

Definition at line 269 of file NN_trainer_module.py.

269 def clean_up(self):
270 """
271 Clean up temporary files.
272 """
273 # uncomment if needed for batch job
274 # os.remove(self.data)
275 os.remove(self.flag)
276 if self.save_vars is not None:
277 shutil.rmtree(self.out_temp)

◆ merge_files()

def merge_files (   self)
Merge file of particle-level information (MC) with those of event-level information (Y4S, B).
Preprocess and save to disk as Parquet file in form of Awkward Array.

Definition at line 257 of file NN_trainer_module.py.

257 def merge_files(self):
258 """
259 Merge file of particle-level information (MC) with those of event-level information (Y4S, B).
260 Preprocess and save to disk as Parquet file in form of Awkward Array.
261 """
262 df = pd.read_hdf(self.temp_file['MC'], key='mc_information')
263 df_y4s = pd.read_hdf(self.temp_file['Y4S'], key='Upsilon(4S):mc')
264 df_b = pd.read_hdf(self.temp_file['B'], key='B0:generic')
265 df_merged = df_y4s.merge(df_b.drop(axis=1, labels=['icand', 'ncand']), how="left")
266 decorr_df = df_merged.rename({'evt': 'evtNum'}, axis=1)
267 ak.to_parquet(preprocessed(df, decorr_df), self.out_file)
268

◆ process()

def process (   self)
Process the b2 steering file and the data generation.

Definition at line 196 of file NN_trainer_module.py.

196 def process(self):
197 """
198 Process the b2 steering file and the data generation.
199 """
200 self.process_b2script()
201 if self.save_vars is not None:
202 self.merge_files()
203

◆ process_b2script()

def process_b2script (   self,
  num_events = 2500 
)
Skimming process with TrainDataSaver module.

:param num_events: Maximum number of events to process.

Definition at line 204 of file NN_trainer_module.py.

204 def process_b2script(self, num_events=2500):
205 """
206 Skimming process with TrainDataSaver module.
207
208 :param num_events: Maximum number of events to process.
209 """
210 path = ma.create_path()
211
212 ma.inputMdst(environmentType='default', filename=self.data, path=path)
213 ma.buildEventShape(path=path)
214 ma.buildEventKinematics(path=path)
215
216 # process with advanced mode
217 if self.save_vars is not None:
218 TrainDataSaver_module = TrainDataSaver(
219 output_file=self.temp_file['MC'],
220 flag_file=self.flag,
221 )
222 path.add_module(TrainDataSaver_module)
223 ma.fillParticleListFromMC('Upsilon(4S):mc', '', path=path)
224 v2hdf5_y4s = VariablesToHDF5(
225 'Upsilon(4S):mc',
226 self.save_vars['Y4S'],
227 filename=self.temp_file['Y4S'],
228 )
229 path.add_module(v2hdf5_y4s)
230
231 fei_skim = feiHadronicB0(udstOutput=False, analysisGlobaltag=ma.getAnalysisGlobaltag())
232 fei_skim(path=path)
233 fei_skim.postskim_path.add_module(
234 "BestCandidateSelection",
235 particleList="B0:generic",
236 variable="extraInfo(SignalProbability)",
237 outputVariable="rank_signalprob",
238 numBest=1,
239 )
240 # Key of saved table is the name of particle list
241 v2hdf5_b = VariablesToHDF5(
242 'B0:generic',
243 self.save_vars['B'],
244 filename=self.temp_file['B'],
245 )
246 fei_skim.postskim_path.add_module(v2hdf5_b)
247 # process with fast mode
248 else:
249 TrainDataSaver_module = TrainDataSaver(
250 output_file=self.out_file,
251 flag_file=self.flag,
252 )
253 path.add_module(TrainDataSaver_module)
254 b2.process(path, max_event=num_events)
255 print(b2.statistics)
256

Member Data Documentation

◆ data

data

Input root file generated before skimming.

Definition at line 177 of file NN_trainer_module.py.

◆ flag

flag

Filename of the flag file indicating passing events.

Definition at line 179 of file NN_trainer_module.py.

◆ out_file

out_file

Final output Parquet file.

Definition at line 192 of file NN_trainer_module.py.

◆ out_temp

out_temp

Temporary directory to keep intermediate files for advanced mode.

Definition at line 182 of file NN_trainer_module.py.

◆ save_vars

save_vars

Variables to save for different event levels.

Definition at line 194 of file NN_trainer_module.py.

◆ temp_file

temp_file

Intermediate files.

Definition at line 186 of file NN_trainer_module.py.


The documentation for this class was generated from the following file: