Belle II Software development
Dataset Class Reference (abstract)

Abstract base class of all Datasets given to the MVA interface. The current event can always be accessed via the public members of this class.

#include <Dataset.h>

Inheritance diagram for Dataset:
CombinedDataset, MultiDataset, ROOTDataset, RegressionDataSet, ReweightingDataset, SPlotDataset, SidebandDataset, SingleDataset, SubDataset

Public Member Functions

 Dataset (const GeneralOptions &general_options)
 Constructs a new dataset given the general options.
 
virtual ~Dataset ()=default
 Virtual default destructor.
 
 Dataset (const Dataset &)=delete
 Specify no copy constructor.
 
Dataset & operator= (const Dataset &)=delete
 Specify no assignment operator.
 
virtual unsigned int getNumberOfFeatures () const =0
 Returns the number of features in this dataset.
 
virtual unsigned int getNumberOfSpectators () const =0
 Returns the number of spectators in this dataset.
 
virtual unsigned int getNumberOfEvents () const =0
 Returns the number of events in this dataset.
 
virtual void loadEvent (unsigned int iEvent)=0
 Load the event number iEvent.
 
virtual float getSignalFraction ()
 Returns the signal fraction of the whole sample.
 
virtual unsigned int getFeatureIndex (const std::string &feature)
 Return index of feature with the given name.
 
virtual unsigned int getSpectatorIndex (const std::string &spectator)
 Return index of spectator with the given name.
 
virtual std::vector< float > getFeature (unsigned int iFeature)
 Returns all values of one feature in a std::vector<float>
 
virtual std::vector< float > getSpectator (unsigned int iSpectator)
 Returns all values of one spectator in a std::vector<float>
 
virtual std::vector< float > getWeights ()
 Returns all weights.
 
virtual std::vector< float > getTargets ()
 Returns all targets.
 
virtual std::vector< bool > getSignals ()
 Returns the isSignal flag of every event.
 

Public Attributes

GeneralOptions m_general_options
 GeneralOptions passed to this dataset.
 
std::vector< float > m_input
 Contains all feature values of the currently loaded event.
 
std::vector< float > m_spectators
 Contains all spectator values of the currently loaded event.
 
float m_weight
 Contains the weight of the currently loaded event.
 
float m_target
 Contains the target value of the currently loaded event.
 
bool m_isSignal
 Defines if the currently loaded event is signal or background.
 

Detailed Description

Abstract base class of all Datasets given to the MVA interface. The current event can always be accessed via the public members of this class.

Definition at line 33 of file Dataset.h.
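
As a rough illustration of the access pattern described above, the following self-contained sketch uses simplified stand-ins for GeneralOptions and Dataset (they are not the real Options.h and Dataset.h, which contain more members) together with a hypothetical in-memory subclass named VectorDataset. Setting the target from the signal flag is an assumption of this sketch, not something the documentation states.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for GeneralOptions: only the two name lists are kept.
struct GeneralOptions {
  std::vector<std::string> m_variables;
  std::vector<std::string> m_spectators;
};

// Simplified stand-in for Dataset, reduced to the members used below.
class Dataset {
public:
  explicit Dataset(const GeneralOptions& general_options) : m_general_options(general_options)
  {
    m_input.resize(m_general_options.m_variables.size(), 0);
    m_spectators.resize(m_general_options.m_spectators.size(), 0);
  }
  virtual ~Dataset() = default;
  Dataset(const Dataset&) = delete;
  Dataset& operator=(const Dataset&) = delete;

  virtual unsigned int getNumberOfFeatures() const = 0;
  virtual unsigned int getNumberOfEvents() const = 0;
  virtual void loadEvent(unsigned int iEvent) = 0;

  GeneralOptions m_general_options;
  std::vector<float> m_input;
  std::vector<float> m_spectators;
  float m_weight = 1.0;
  float m_target = 0.0;
  bool m_isSignal = false;
};

// Hypothetical subclass that serves events from rows held in memory.
class VectorDataset : public Dataset {
public:
  struct Row {
    std::vector<float> features;
    float weight;
    bool isSignal;
  };

  VectorDataset(const GeneralOptions& options, std::vector<Row> rows)
    : Dataset(options), m_rows(std::move(rows)) {}

  unsigned int getNumberOfFeatures() const override { return m_input.size(); }
  unsigned int getNumberOfEvents() const override { return m_rows.size(); }

  // Copy one stored row into the public "current event" members.
  void loadEvent(unsigned int iEvent) override
  {
    assert(iEvent < m_rows.size());
    const Row& row = m_rows[iEvent];
    m_input = row.features;
    m_weight = row.weight;
    m_isSignal = row.isSignal;
    m_target = row.isSignal ? 1.0f : 0.0f; // assumption made for this sketch
  }

private:
  std::vector<Row> m_rows;
};

// Sum the first feature over all events by loading them one at a time.
float sumFirstFeature(Dataset& dataset)
{
  float sum = 0;
  for (unsigned int i = 0; i < dataset.getNumberOfEvents(); ++i) {
    dataset.loadEvent(i);
    sum += dataset.m_input[0];
  }
  return sum;
}
```

After each loadEvent call, m_input, m_weight, m_target and m_isSignal describe only that event, so a caller that needs values from an earlier event has to copy them before loading the next one.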

Constructor & Destructor Documentation

◆ Dataset()

Dataset ( const GeneralOptions &  general_options)
explicit

Constructs a new dataset given the general options.

Parameters
general_options    the general options, which define e.g. the number of variables

Definition at line 26 of file Dataset.cc.

    Dataset::Dataset(const GeneralOptions& general_options) : m_general_options(general_options)
    {
      m_input.resize(m_general_options.m_variables.size(), 0);
      m_spectators.resize(m_general_options.m_spectators.size(), 0);
      m_target = 0.0;
      m_weight = 1.0;
      m_isSignal = false;
    }
std::vector< std::string > GeneralOptions::m_variables
Vector of all variables (branch names) used in the training.
Definition: Options.h:86
std::vector< std::string > GeneralOptions::m_spectators
Vector of all spectators (branch names) used in the training.
Definition: Options.h:87

Member Function Documentation

◆ getFeature()

std::vector< float > getFeature ( unsigned int  iFeature)
virtual

Returns all values of one feature in a std::vector<float>

Parameters
iFeature    the position of the feature to return

Reimplemented in SingleDataset, SubDataset, CombinedDataset, ROOTDataset, RegressionDataSet, ReweightingDataset, and SidebandDataset.

Definition at line 74 of file Dataset.cc.

    std::vector<float> Dataset::getFeature(unsigned int iFeature)
    {
      std::vector<float> result(getNumberOfEvents());
      for (unsigned int iEvent = 0; iEvent < getNumberOfEvents(); ++iEvent) {
        loadEvent(iEvent);
        result[iEvent] = m_input[iFeature];
      }
      return result;
    }
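
The default implementation above loads every event once per call, so reading k features through k separate calls loads each event k times (subclasses that reimplement getFeature may behave differently). As a sketch of one alternative, the hypothetical helper below gathers several columns in a single pass. It is written as a template over any type that offers getNumberOfEvents(), loadEvent() and m_input; ToyDataset is a stand-in that exists only to exercise it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: gather several feature columns in one pass over the events.
// DatasetLike is any type offering getNumberOfEvents(), loadEvent() and m_input.
template <class DatasetLike>
std::vector<std::vector<float>> collectFeatures(DatasetLike& dataset,
                                                const std::vector<unsigned int>& featureIndices)
{
  const unsigned int nEvents = dataset.getNumberOfEvents();
  std::vector<std::vector<float>> columns(featureIndices.size(), std::vector<float>(nEvents));
  for (unsigned int iEvent = 0; iEvent < nEvents; ++iEvent) {
    dataset.loadEvent(iEvent); // one load per event, shared by all columns
    for (std::size_t c = 0; c < featureIndices.size(); ++c) {
      columns[c][iEvent] = dataset.m_input[featureIndices[c]];
    }
  }
  return columns;
}

// Tiny stand-in used only to exercise the helper; it counts loadEvent calls.
struct ToyDataset {
  std::vector<std::vector<float>> rows;
  std::vector<float> m_input;
  unsigned int loads = 0;

  unsigned int getNumberOfEvents() const { return rows.size(); }
  void loadEvent(unsigned int iEvent)
  {
    m_input = rows[iEvent];
    ++loads;
  }
};
```

With three events and two requested features, the helper calls loadEvent three times, whereas two separate getFeature calls in the default implementation would call it six times.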

◆ getFeatureIndex()

unsigned int getFeatureIndex ( const std::string &  feature)
virtual

Return index of feature with the given name.

Parameters
feature    name of the feature

Definition at line 50 of file Dataset.cc.

    unsigned int Dataset::getFeatureIndex(const std::string& feature)
    {
      auto it = std::find(m_general_options.m_variables.begin(), m_general_options.m_variables.end(), feature);
      if (it == m_general_options.m_variables.end()) {
        B2ERROR("Unknown feature named " << feature);
        return 0;
      }
      return std::distance(m_general_options.m_variables.begin(), it);
    }
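
Note that the listing above returns 0 for an unknown name, and 0 is also the valid index of the first feature, so the return value alone does not tell the two cases apart. The sketch below shows one way a caller could check a name beforehand. findIndex is a hypothetical free function, not part of Dataset, and it requires C++17 for std::optional.

```cpp
#include <algorithm>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not part of Dataset): look up a name and report
// "not found" explicitly instead of falling back to index 0.
std::optional<unsigned int> findIndex(const std::vector<std::string>& names, const std::string& name)
{
  auto it = std::find(names.begin(), names.end(), name);
  if (it == names.end()) {
    return std::nullopt;
  }
  return static_cast<unsigned int>(std::distance(names.begin(), it));
}
```

A caller could pass m_general_options.m_variables (or m_general_options.m_spectators) as the names argument and test has_value() before using the index.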

◆ getNumberOfEvents()

virtual unsigned int getNumberOfEvents ( ) const
pure virtual

Returns the number of events in this dataset.

Implemented in SingleDataset, MultiDataset, SubDataset, CombinedDataset, ROOTDataset, RegressionDataSet, ReweightingDataset, SidebandDataset, and SPlotDataset.

◆ getNumberOfFeatures()

virtual unsigned int getNumberOfFeatures ( ) const
pure virtual

Returns the number of features in this dataset.

Implemented in SingleDataset, MultiDataset, SubDataset, CombinedDataset, ROOTDataset, RegressionDataSet, ReweightingDataset, SidebandDataset, and SPlotDataset.

◆ getNumberOfSpectators()

virtual unsigned int getNumberOfSpectators ( ) const
pure virtual

Returns the number of spectators in this dataset.

Implemented in SingleDataset, MultiDataset, SubDataset, CombinedDataset, ROOTDataset, RegressionDataSet, ReweightingDataset, SidebandDataset, and SPlotDataset.

◆ getSignalFraction()

float getSignalFraction ( )
virtual

Returns the signal fraction of the whole sample.

Reimplemented in SPlotDataset.

Definition at line 35 of file Dataset.cc.

    float Dataset::getSignalFraction()
    {
      double signal_weight_sum = 0;
      double weight_sum = 0;
      for (unsigned int i = 0; i < getNumberOfEvents(); ++i) {
        loadEvent(i);
        weight_sum += m_weight;
        if (m_isSignal)
          signal_weight_sum += m_weight;
      }
      return signal_weight_sum / weight_sum;
    }
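
The value computed above is a weighted fraction: the sum of the weights of all signal events divided by the sum of all weights. For weights 1, 1, 2 with the first and third event flagged as signal, this gives (1 + 2) / (1 + 1 + 2) = 0.75. The standalone sketch below reproduces that arithmetic on plain vectors. signalFraction is a hypothetical free function, and its guard against a zero total weight is an addition of this sketch that the listing above does not have.

```cpp
#include <cstddef>
#include <vector>

// Weighted signal fraction: sum of signal weights divided by sum of all weights.
// Hypothetical free function operating on plain vectors instead of a Dataset.
float signalFraction(const std::vector<float>& weights, const std::vector<bool>& isSignal)
{
  double signal_weight_sum = 0;
  double weight_sum = 0;
  for (std::size_t i = 0; i < weights.size() && i < isSignal.size(); ++i) {
    weight_sum += weights[i];
    if (isSignal[i])
      signal_weight_sum += weights[i];
  }
  if (weight_sum == 0)
    return 0.0f; // guard added in this sketch to avoid dividing by zero
  return static_cast<float>(signal_weight_sum / weight_sum);
}
```

The inputs could come from getWeights() and getSignals(), which return one entry per event.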

◆ getSignals()

std::vector< bool > getSignals ( )
virtual

Returns the isSignal flag of every event.

Reimplemented in ReweightingDataset.

Definition at line 122 of file Dataset.cc.

    std::vector<bool> Dataset::getSignals()
    {
      std::vector<bool> result(getNumberOfEvents());
      for (unsigned int iEvent = 0; iEvent < getNumberOfEvents(); ++iEvent) {
        loadEvent(iEvent);
        result[iEvent] = m_isSignal;
      }
      return result;
    }
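
Because the result is a std::vector<bool>, which the standard library specializes so that element access yields a proxy object instead of a bool&, it is usually simplest to consume it through standard algorithms. A minimal sketch, using a hypothetical helper named countSignal:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Count the entries flagged as signal in a vector such as the one getSignals() returns.
std::size_t countSignal(const std::vector<bool>& isSignal)
{
  return static_cast<std::size_t>(std::count(isSignal.begin(), isSignal.end(), true));
}
```

Dividing this count by the number of events gives the unweighted signal fraction, which differs from getSignalFraction() whenever the event weights are not all equal.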

◆ getSpectator()

std::vector< float > getSpectator ( unsigned int  iSpectator)
virtual

Returns all values of one spectator in a std::vector<float>

Parameters
iSpectator    the position of the spectator to return

Reimplemented in SingleDataset, SubDataset, CombinedDataset, ROOTDataset, RegressionDataSet, ReweightingDataset, and SidebandDataset.

Definition at line 86 of file Dataset.cc.

    std::vector<float> Dataset::getSpectator(unsigned int iSpectator)
    {
      std::vector<float> result(getNumberOfEvents());
      for (unsigned int iEvent = 0; iEvent < getNumberOfEvents(); ++iEvent) {
        loadEvent(iEvent);
        result[iEvent] = m_spectators[iSpectator];
      }
      return result;
    }

◆ getSpectatorIndex()

unsigned int getSpectatorIndex ( const std::string &  spectator)
virtual

Return index of spectator with the given name.

Parameters
spectator    name of the spectator

Definition at line 62 of file Dataset.cc.

    unsigned int Dataset::getSpectatorIndex(const std::string& spectator)
    {
      auto it = std::find(m_general_options.m_spectators.begin(), m_general_options.m_spectators.end(), spectator);
      if (it == m_general_options.m_spectators.end()) {
        B2ERROR("Unknown spectator named " << spectator);
        return 0;
      }
      return std::distance(m_general_options.m_spectators.begin(), it);
    }

◆ getTargets()

std::vector< float > getTargets ( )
virtual

Returns all targets.

Reimplemented in RegressionDataSet, and ReweightingDataset.

Definition at line 110 of file Dataset.cc.

    std::vector<float> Dataset::getTargets()
    {
      std::vector<float> result(getNumberOfEvents());
      for (unsigned int iEvent = 0; iEvent < getNumberOfEvents(); ++iEvent) {
        loadEvent(iEvent);
        result[iEvent] = m_target;
      }
      return result;
    }

◆ getWeights()

std::vector< float > getWeights ( )
virtual

Returns all weights.

Reimplemented in ROOTDataset, RegressionDataSet, and ReweightingDataset.

Definition at line 98 of file Dataset.cc.

    std::vector<float> Dataset::getWeights()
    {
      std::vector<float> result(getNumberOfEvents());
      for (unsigned int iEvent = 0; iEvent < getNumberOfEvents(); ++iEvent) {
        loadEvent(iEvent);
        result[iEvent] = m_weight;
      }
      return result;
    }

◆ loadEvent()

virtual void loadEvent ( unsigned int  iEvent)
pure virtual

Load the event number iEvent.

Parameters
iEvent    event number to load

Implemented in ROOTDataset, ReweightingDataset, SidebandDataset, SPlotDataset, MultiDataset, SubDataset, CombinedDataset, RegressionDataSet, and SingleDataset.

Member Data Documentation

◆ m_general_options

GeneralOptions m_general_options

GeneralOptions passed to this dataset.

Definition at line 122 of file Dataset.h.

◆ m_input

std::vector<float> m_input

Contains all feature values of the currently loaded event.

Definition at line 123 of file Dataset.h.

◆ m_isSignal

bool m_isSignal

Defines if the currently loaded event is signal or background.

Definition at line 127 of file Dataset.h.

◆ m_spectators

std::vector<float> m_spectators

Contains all spectator values of the currently loaded event.

Definition at line 124 of file Dataset.h.

◆ m_target

float m_target

Contains the target value of the currently loaded event.

Definition at line 126 of file Dataset.h.

◆ m_weight

float m_weight

Contains the weight of the currently loaded event.

Definition at line 125 of file Dataset.h.


The documentation for this class was generated from the following files: