Public Member Functions
	__init__ (self, run_class, use_jupyter=True)

	train (self)

	evaluate_tracking (self)

	evaluate_classification (self)

Public Attributes
	run_class = run_class
	cached copy of the run class

	use_jupyter = use_jupyter
	cached flag to use jupyter notebook

str	recording_file_name = self.run_class.recording_module + ".root"
	cached name of the output file

	file_name_path
	cached path without extension of the output file

str	training_file_name = self.file_name_path + "Training" + ext
	cached path with extension of the training-output file

str	test_file_name = self.file_name_path + "Testing" + ext
	cached path with extension of the testing-output file

str	identifier_name = "FastBDT.weights.xml"
	cached identifier

str	evaluation_file_name = self.identifier_name + ".pdf"
	cached name of the output PDF file

str	expert_file_name = self.file_name_path + "TestingExport" + ext
	cached path with extension of the testing-export file

	weight_data_location
	cached path of the weight input data

	recording_module

	recording_parameter

Protected Member Functions
	_call_training_routine (self)

	_write_train_and_test_files (self)

	_create_records_file (self)

	_call_expert_routine (self)

	_call_evaluation_routine (self)

Detailed Description

Class for training and analysing a tracking module that has an MVA filter in it.

Works best if you are in a Jupyter notebook.

You need to supply a run_class that includes all settings needed to train and
execute the module. This class will be mixed in with the normal trackfindingcdc
run classes, so you can add settings (e.g. tracking_coverage) as usual.

One example is:

class TestClass:
    # This module will be trained
    recording_module = "FilterBasedVXDCDCTrackMerger"
    # This is the name of the parameter of this module, which will be set to "mva" etc.
    recording_parameter = "filter"

    # These mva cuts will be tested during evaluation.
    evaluation_cuts = [0.1, 0.2, ...]

    tracking_coverage = {
        'UsePXDHits': True,
        'UseSVDHits': True,
        'UseCDCHits': True,
    }

    # Some options, which will control the run classes
    fit_tracks = False
    generator_module = "EvtGenInput"

    # This will be added to the "normal" path, to record the training data (you do not have to set the module to
    # recording, as this is done automatically).
    def add_recording_modules(self, path):
        mctrackfinder = path.add_module('TrackFinderMCTruthRecoTracks',
                                RecoTracksStoreArrayName='MCRecoTracks',
                                WhichParticles=[])

        path.add_module('MCRecoTracksMatcher', mcRecoTracksStoreArrayName="MCRecoTracks",
                        prRecoTracksStoreArrayName="CDCRecoTracks", UseCDCHits=True, UsePXDHits=False, UseSVDHits=False)
        path.add_module('MCRecoTracksMatcher', mcRecoTracksStoreArrayName="MCRecoTracks",
                        prRecoTracksStoreArrayName="VXDRecoTracks", UseCDCHits=False, UsePXDHits=True, UseSVDHits=True)

        # Merge CDC and VXD tracks
        path.add_module('FilterBasedVXDCDCTrackMerger',
                        extrapolate=False,
                        CDCRecoTrackStoreArrayName="CDCRecoTracks",
                        VXDRecoTrackStoreArrayName="VXDRecoTracks",
                        MergedRecoTrackStoreArrayName="RecoTracks")

        return path

    # This will be added to the "normal" path, to evaluate the mva cuts. In most cases, this is the same as the
    # add_recording_modules (as the module parameters will be set automatically), but maybe you need
    # more here...
    def add_validation_modules(self, path):
        mctrackfinder = path.add_module('TrackFinderMCTruthRecoTracks',
                                RecoTracksStoreArrayName='MCRecoTracks',
                                WhichParticles=[])

        # Merge CDC and VXD tracks
        path.add_module('FilterBasedVXDCDCTrackMerger',
                        extrapolate=True,
                        CDCRecoTrackStoreArrayName="CDCRecoTracks",
                        VXDRecoTrackStoreArrayName="VXDRecoTracks",
                        MergedRecoTrackStoreArrayName="PrefitRecoTracks")

        path.add_module("SetupGenfitExtrapolation")

        path.add_module("DAFRecoFitter", recoTracksStoreArrayName="PrefitRecoTracks")

        path.add_module("TrackCreator", recoTrackColName="PrefitRecoTracks")

        path.add_module("FittedTracksStorer", inputRecoTracksStoreArrayName="PrefitRecoTracks",
                        outputRecoTracksStoreArrayName="RecoTracks")

        # We need to include the matching ourselves, as we have already a matching algorithm in place
        path.add_module('MCRecoTracksMatcher', mcRecoTracksStoreArrayName="MCRecoTracks",
                        prRecoTracksStoreArrayName="RecoTracks", UseCDCHits=True, UsePXDHits=True, UseSVDHits=True)

        return path

Definition at line 49 of file analyse.py.

Constructor & Destructor Documentation

◆ __init__()

__init__	(	self,
		run_class,
		use_jupyter = True )


Constructor

Definition at line 132 of file analyse.py.

    def __init__(self, run_class, use_jupyter=True):
        """Constructor"""
 
        ## cached copy of the run class
        self.run_class = run_class
        ## cached flag to use jupyter notebook
        self.use_jupyter = use_jupyter
 
        ## cached name of the output file
        self.recording_file_name = self.run_class.recording_module + ".root"
 
        ## cached path without extension of the output file
        self.file_name_path, ext = os.path.splitext(self.recording_file_name)
 
        ## cached path with extension of the training-output file
        self.training_file_name = self.file_name_path + "Training" + ext
        ## cached path with extension of the testing-output file
        self.test_file_name = self.file_name_path + "Testing" + ext
 
        ## cached identifier
        self.identifier_name = "FastBDT.weights.xml"
        ## cached name of the output PDF file
        self.evaluation_file_name = self.identifier_name + ".pdf"
 
        ## cached path with extension of the testing-export file
        self.expert_file_name = self.file_name_path + "TestingExport" + ext
 
        ## cached path of the weight input data
        self.weight_data_location = Belle2.FileSystem.findFile(os.path.join("tracking/data",
                                                                            self.run_class.weight_data_location))
 

Member Function Documentation

◆ _call_evaluation_routine()

_call_evaluation_routine ( self )

protected


Call the mva evaluation routine

Definition at line 301 of file analyse.py.

    def _call_evaluation_routine(self):
        """Call the mva evaluation routine"""
        try:
            check_output(["basf2_mva_evaluate.py",
                          "--identifiers", self.identifier_name, self.weight_data_location,
                          "--train_datafiles", self.training_file_name,
                          "--datafiles", self.test_file_name,
                          "--treename", "records",
                          "--outputfile", self.evaluation_file_name],
                         stderr=STDOUT)
        except CalledProcessError as e:
            raise RuntimeError(e.output)
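The _call_* helpers share one pattern: run an external tool with check_output and turn a CalledProcessError into a RuntimeError that carries the captured output. The sketch below shows that pattern in isolation. It assumes nothing about the basf2 tools and uses the Python interpreter as a stand-in for a failing command; the helper name run_tool is hypothetical.

```python
import sys
from subprocess import STDOUT, CalledProcessError, check_output


def run_tool(command):
    """Run a command; raise RuntimeError with its output on failure (hypothetical helper)."""
    try:
        # stderr=STDOUT folds error messages into the captured output
        return check_output(command, stderr=STDOUT)
    except CalledProcessError as e:
        raise RuntimeError(e.output)


# Stand-in for an external tool that writes to stderr and exits with an error
failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(1)"]

try:
    run_tool(failing)
except RuntimeError as error:
    message = error.args[0]  # captured output as bytes
```

Without stderr=STDOUT, as in _call_expert_routine and _call_training_routine, e.output holds only what the tool wrote to standard output, so messages printed to standard error do not end up in the RuntimeError.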

◆ _call_expert_routine()

_call_expert_routine ( self )

protected


Call the mva expert

Definition at line 290 of file analyse.py.

    def _call_expert_routine(self):
        """Call the mva expert"""
        try:
            check_output(["basf2_mva_expert",
                          "--identifiers", self.identifier_name, self.weight_data_location,
                          "--datafiles", self.test_file_name,
                          "--outputfile", self.expert_file_name,
                          "--treename", "records"])
        except CalledProcessError as e:
            raise RuntimeError(e.output)
 

◆ _call_training_routine()

_call_training_routine ( self )

protected


Call the mva training routine in the train file

Definition at line 238 of file analyse.py.

    def _call_training_routine(self):
        """Call the mva training routine in the train file"""
        try:
            check_output(["trackfindingcdc_teacher", self.training_file_name])
        except CalledProcessError as e:
            raise RuntimeError(e.output)
 

◆ _create_records_file()

_create_records_file ( self )

protected

Create a path using the settings of the run_class and process it.
This will create a ROOT file with the recorded data.

Definition at line 258 of file analyse.py.

    def _create_records_file(self):
        """
        Create a path using the settings of the run_class and process it.
        This will create a ROOT file with the recorded data.
        """
        recording_file_name = self.recording_file_name
 
        class RecordRun(self.run_class, ReadOrGenerateEventsRun):
 
            def create_path(self):
                path = ReadOrGenerateEventsRun.create_path(self)
 
                self.add_recording_modules(path)
 
                adjust_module(path, self.recording_module,
                              **{self.recording_parameter + "Parameters": {"rootFileName": recording_file_name},
                                 self.recording_parameter: "recording"})
 
                return path
 
        run = RecordRun()
        path = run.create_path()
 
        if self.use_jupyter:
            calculation = handler.process(path)
            calculation.start()
            calculation.wait_for_end()
 
            return calculation
        else:
            run.execute()
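The adjust_module call above builds its keyword names at runtime from recording_parameter, which is why the parameters are passed through dictionary unpacking. A minimal sketch of what that unpacking produces follows. fake_adjust_module is a hypothetical stand-in that only reports the keywords it receives; it is not the real adjust_module.

```python
def fake_adjust_module(path, module_name, **parameters):
    """Hypothetical stand-in for adjust_module: report what would be set."""
    return module_name, parameters


recording_parameter = "filter"
recording_file_name = "FilterBasedVXDCDCTrackMerger.root"

module_name, parameters = fake_adjust_module(
    None,
    "FilterBasedVXDCDCTrackMerger",
    # The keys are computed at runtime, so they are passed via ** unpacking
    **{recording_parameter + "Parameters": {"rootFileName": recording_file_name},
       recording_parameter: "recording"})

print(parameters)
```

With recording_parameter = "filter", the module would receive the two parameters filter and filterParameters.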
 

◆ _write_train_and_test_files()

_write_train_and_test_files ( self )

protected


Split the recorded file randomly into a training and a test sample of roughly equal size and write both back to disk.

Definition at line 245 of file analyse.py.

    def _write_train_and_test_files(self):
        """Split the recorded file into two halves: training and test file and write it back"""
        # TODO: This seems to reorder the columns...
        df = uproot.concatenate(self.recording_file_name, library='pd')
        mask = np.random.rand(len(df)) < 0.5
        training_sample = df[mask]
        test_sample = df[~mask]
 
        with uproot.recreate(self.training_file_name) as outfile:
            outfile["records"] = training_sample
        with uproot.recreate(self.test_file_name) as outfile:
            outfile["records"] = test_sample
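The split relies on a random boolean mask, so the two samples are disjoint and complementary but only approximately equal in size. The sketch below illustrates this with a plain NumPy array standing in for the recorded rows. It uses a seeded default_rng generator for reproducibility, which is an assumption of this example; the original calls np.random.rand without a seed.

```python
import numpy as np

# Seeded generator for reproducibility (assumption; the original is unseeded)
rng = np.random.default_rng(seed=42)

records = np.arange(1000)              # stand-in for the rows of the recorded file
mask = rng.random(len(records)) < 0.5  # True with probability 0.5 per row

training_sample = records[mask]        # rows where the mask is True
test_sample = records[~mask]           # the complementary rows

print(len(training_sample), len(test_sample))
```

Every row ends up in exactly one of the two samples, whatever the random draw.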
 

◆ evaluate_classification()

evaluate_classification ( self )

Evaluate the classification power on the test data set and produce a PDF.

Definition at line 215 of file analyse.py.

    def evaluate_classification(self):
        """
        Evaluate the classification power on the test data set and produce a PDF.
        """
        if not os.path.exists(self.expert_file_name) or not os.path.exists(self.evaluation_file_name):
            self._call_evaluation_routine()
            self._call_expert_routine()
 
        df = uproot.concatenate(
            self.expert_file_name,
            library='pd').merge(
            uproot.concatenate(
                self.test_file_name,
                library='pd'),
            left_index=True,
            right_index=True)
 
        if self.use_jupyter:
            from IPython.display import display
            display(PDF(self.evaluation_file_name, size=(800, 800)))
 
        return df
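The returned data frame pairs each row of the expert output with the row of the test file at the same index. A minimal sketch of such an index-based merge is given below, assuming that both files list the same records in the same order. The column names prediction, truth and pt are hypothetical.

```python
import pandas as pd

# Hypothetical columns: expert output on one side, recorded variables on the other
expert = pd.DataFrame({"prediction": [0.9, 0.2, 0.7]})
test = pd.DataFrame({"truth": [1, 0, 1], "pt": [1.5, 0.3, 2.1]})

# Rows are paired by index, not by the value of any column
df = expert.merge(test, left_index=True, right_index=True)

print(list(df.columns))
```

If the two inputs had different lengths, the default inner join would silently keep only the indices present in both.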
 

◆ evaluate_tracking()

evaluate_tracking ( self )

Use the trained weight file and run the path again with different MVA cuts. The validation is done with the
normal tracking validation modules.

Definition at line 173 of file analyse.py.

    def evaluate_tracking(self):
        """
        Use the trained weight file and call the path again using different mva cuts. Validation using the
        normal tracking validation modules.
        """
        copy(self.identifier_name, self.weight_data_location)
 
        try:
            os.mkdir("results")
        except FileExistsError:
            pass
 
        def create_path(mva_cut):
            class ValidationRun(self.run_class, TrackingValidationRun):
 
                def finder_module(self, path):
                    self.add_validation_modules(path)
 
                    if mva_cut != 999:
                        adjust_module(path, self.recording_module,
                                      **{self.recording_parameter + "Parameters": {"cut": mva_cut},
                                         self.recording_parameter: "mva"})
                    else:
                        adjust_module(path, self.recording_module, **{self.recording_parameter: "truth"})
 
                output_file_name = f"results/validation_{mva_cut}.root"
 
            run = ValidationRun()
 
            if not os.path.exists(run.output_file_name):
                return {"path": run.create_path()}
            else:
                return {"path": None}
 
        assert self.use_jupyter
 
        calculations = handler.process_parameter_space(create_path, mva_cut=self.run_class.evaluation_cuts + [999])
        calculations.start()
        calculations.wait_for_end()
 
        return calculations
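The parameter scan runs once per entry of evaluation_cuts plus one extra run with the value 999, which switches the filter to "truth" instead of "mva". The following sketch reproduces only that selection logic and the output file naming, without any basf2 dependency. The helper name select_filter_settings and the example cut values are hypothetical.

```python
def select_filter_settings(mva_cut, recording_parameter="filter", truth_sentinel=999):
    """Return the parameters that would be passed to adjust_module (hypothetical helper)."""
    if mva_cut != truth_sentinel:
        return {recording_parameter + "Parameters": {"cut": mva_cut},
                recording_parameter: "mva"}
    # The sentinel value selects the truth filter as a reference
    return {recording_parameter: "truth"}


evaluation_cuts = [0.1, 0.2]  # example values
for mva_cut in evaluation_cuts + [999]:
    output_file_name = f"results/validation_{mva_cut}.root"
    print(output_file_name, select_filter_settings(mva_cut))
```

Since a run is skipped when its output file already exists, changing evaluation_cuts only triggers runs for cuts that have no file in results/ yet.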
 

◆ train()

train ( self )


Record a training file, split it in two parts and call the training method of the mva package

Definition at line 163 of file analyse.py.

    def train(self):
        """Record a training file, split it in two parts and call the training method of the mva package"""
        if not os.path.exists(self.recording_file_name):
            self._create_records_file()
 
        if not os.path.exists(self.training_file_name) or not os.path.exists(self.test_file_name):
            self._write_train_and_test_files()
 
        self._call_training_routine()
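train() skips the recording and splitting steps when their output files already exist, while the training routine itself always runs. The sketch below shows this skip-if-present pattern on a temporary file. The helper name run_if_missing and the file name are hypothetical.

```python
import os
import tempfile


def run_if_missing(outputs, step):
    """Run step() only if at least one expected output is missing (hypothetical helper)."""
    if not all(os.path.exists(name) for name in outputs):
        step()
        return True
    return False


with tempfile.TemporaryDirectory() as directory:
    target = os.path.join(directory, "records.root")

    def record():
        # Stand-in for the expensive recording step
        with open(target, "w") as outfile:
            outfile.write("recorded")

    first = run_if_missing([target], record)   # file missing, step runs
    second = run_if_missing([target], record)  # file present, step skipped
```

To force a fresh recording or a new split, the corresponding files have to be deleted first.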
 

Member Data Documentation

◆ evaluation_file_name

evaluation_file_name = self.identifier_name + ".pdf"

cached name of the output PDF file

Definition at line 154 of file analyse.py.

◆ expert_file_name

expert_file_name = self.file_name_path + "TestingExport" + ext

cached path with extension of the testing-export file

Definition at line 157 of file analyse.py.

◆ file_name_path

file_name_path

cached path without extension of the output file

Definition at line 144 of file analyse.py.

◆ identifier_name

identifier_name = "FastBDT.weights.xml"

cached identifier

Definition at line 152 of file analyse.py.

◆ recording_file_name

recording_file_name = self.run_class.recording_module + ".root"

cached name of the output file

Definition at line 141 of file analyse.py.

◆ recording_module

recording_module

Definition at line 192 of file analyse.py.

◆ recording_parameter

recording_parameter

Definition at line 194 of file analyse.py.

◆ run_class

run_class = run_class

cached copy of the run class

Definition at line 136 of file analyse.py.

◆ test_file_name

test_file_name = self.file_name_path + "Testing" + ext

cached path with extension of the testing-output file

Definition at line 149 of file analyse.py.

◆ training_file_name

training_file_name = self.file_name_path + "Training" + ext

cached path with extension of the training-output file

Definition at line 147 of file analyse.py.

◆ use_jupyter

use_jupyter = use_jupyter

cached flag to use jupyter notebook

Definition at line 138 of file analyse.py.

◆ weight_data_location

weight_data_location

Initial value:

= Belle2.FileSystem.findFile(os.path.join("tracking/data",
                                          self.run_class.weight_data_location))

cached path of the weight input data

Definition at line 160 of file analyse.py.

The documentation for this class was generated from the following file:

tracking/trackFindingCDC/scripts/trackfindingcdc/mva/analyse.py
