Belle II Software development
SubDataset Class Reference

Wraps another Dataset and provides a view to a subset of its features and events.

#include <Dataset.h>

Inherits Dataset.

Public Member Functions

 SubDataset (const GeneralOptions &general_options, const std::vector< bool > &events, Dataset &dataset)
 Constructs a new SubDataset holding a reference to the wrapped Dataset.
 
virtual unsigned int getNumberOfFeatures () const override
 Returns the number of features in this dataset, i.e. the size of the given subset of variables.
 
virtual unsigned int getNumberOfSpectators () const override
 Returns the number of spectators in this dataset, i.e. the size of the given subset of spectators.
 
virtual unsigned int getNumberOfEvents () const override
 Returns the number of events in the wrapped dataset.
 
virtual void loadEvent (unsigned int iEvent) override
 Load the event number iEvent from the wrapped dataset.
 
virtual std::vector< float > getFeature (unsigned int iFeature) override
 Returns all values of one feature in a std::vector<float> of the wrapped dataset.
 
virtual std::vector< float > getSpectator (unsigned int iSpectator) override
 Returns all values of one spectator in a std::vector<float> of the wrapped dataset.
 
virtual float getSignalFraction ()
 Returns the signal fraction of the whole sample.
 
virtual unsigned int getFeatureIndex (const std::string &feature)
 Return index of feature with the given name.
 
virtual unsigned int getSpectatorIndex (const std::string &spectator)
 Return index of spectator with the given name.
 
virtual std::vector< float > getWeights ()
 Returns all weights.
 
virtual std::vector< float > getTargets ()
 Returns all targets.
 
virtual std::vector< bool > getSignals ()
 Returns the isSignal flag of every event.
 

Public Attributes

GeneralOptions m_general_options
 GeneralOptions passed to this dataset.
 
std::vector< float > m_input
 Contains all feature values of the currently loaded event.
 
std::vector< float > m_spectators
 Contains all spectator values of the currently loaded event.
 
float m_weight
 Contains the weight of the currently loaded event.
 
float m_target
 Contains the target value of the currently loaded event.
 
bool m_isSignal
 Defines if the currently loaded event is signal or background.
 

Private Attributes

bool m_use_event_indices = false
 Use only a subset of the wrapped dataset events.
 
std::vector< unsigned int > m_feature_indices
 Mapping from the position of a feature in the given subset to its position in the wrapped dataset.
 
std::vector< unsigned int > m_spectator_indices
 Mapping from the position of a spectator in the given subset to its position in the wrapped dataset.
 
std::vector< unsigned int > m_event_indices
 Mapping from the position of an event in the given subset to its position in the wrapped dataset.
 
Dataset & m_dataset
 Reference to the wrapped dataset.
 

Detailed Description

Wraps another Dataset and provides a view to a subset of its features and events.

Used by the Combination method, which can combine multiple methods with possibly different variables.

Definition at line 234 of file Dataset.h.

Constructor & Destructor Documentation

◆ SubDataset()

SubDataset ( const GeneralOptions &  general_options,
const std::vector< bool > &  events,
Dataset &  dataset 
)

Constructs a new SubDataset holding a reference to the wrapped Dataset.

Parameters
general_options    which defines, e.g., a subset of variables of the original dataset
events             subset of events which are provided by this Dataset
dataset            reference to the wrapped Dataset

Definition at line 186 of file Dataset.cc.

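SubDataset::SubDataset(const GeneralOptions& general_options, const std::vector<bool>& events, Dataset& dataset)
  : Dataset(general_options), m_dataset(dataset)
{

  // Map every requested variable to its position in the wrapped dataset.
  for (auto& v : m_general_options.m_variables) {
    auto it = std::find(m_dataset.m_general_options.m_variables.begin(), m_dataset.m_general_options.m_variables.end(), v);
    if (it == m_dataset.m_general_options.m_variables.end()) {
      B2ERROR("Couldn't find variable " << v << " in GeneralOptions");
      throw std::runtime_error("Couldn't find variable " + v + " in GeneralOptions");
    }
    // Source line 196 is absent from the extracted listing; restored from context (assumption).
    m_feature_indices.push_back(it - m_dataset.m_general_options.m_variables.begin());
  }

  // Map every requested spectator to its position in the wrapped dataset.
  for (auto& v : m_general_options.m_spectators) {
    // Source line 200 is absent from the extracted listing; restored from context (assumption).
    auto it = std::find(m_dataset.m_general_options.m_spectators.begin(), m_dataset.m_general_options.m_spectators.end(), v);
    if (it == m_dataset.m_general_options.m_spectators.end()) {
      B2ERROR("Couldn't find spectator " << v << " in GeneralOptions");
      throw std::runtime_error("Couldn't find spectator " + v + " in GeneralOptions");
    }
    // Source line 205 is absent from the extracted listing; restored from context (assumption).
    m_spectator_indices.push_back(it - m_dataset.m_general_options.m_spectators.begin());
  }

  // An empty events vector means that all events of the wrapped dataset are used.
  if (events.size() > 0)
    m_use_event_indices = true;

  // Source line 211 is absent from the extracted listing; restored from context (assumption).
  if (m_use_event_indices) {
    m_event_indices.resize(dataset.getNumberOfEvents());
    unsigned int n_events = 0;
    for (unsigned int iEvent = 0; iEvent < dataset.getNumberOfEvents(); ++iEvent) {
      if (events.size() == 0 or events[iEvent]) {
        m_event_indices[n_events] = iEvent;
        n_events++;
      }
    }
    m_event_indices.resize(n_events);
  }

}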
Dataset(const GeneralOptions &general_options)
Constructs a new dataset given the general options.
Definition: Dataset.cc:26
std::vector< std::string > m_variables
Vector of all variables (branch names) used in the training.
Definition: Options.h:86
std::vector< std::string > m_spectators
Vector of all spectators (branch names) used in the training.
Definition: Options.h:87
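
The two lookups performed by the constructor (names to positions, and a boolean event mask to a list of positions) can be sketched in isolation. This is a minimal illustration rather than the Belle II implementation: the helper names mapNamesToIndices and maskToIndices are hypothetical, and plain vectors stand in for GeneralOptions and Dataset.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: for each requested name, find its position in `available`.
// Throws if a requested name does not exist, as the constructor does.
std::vector<unsigned int> mapNamesToIndices(const std::vector<std::string>& requested,
                                            const std::vector<std::string>& available)
{
  std::vector<unsigned int> indices;
  indices.reserve(requested.size());
  for (const auto& name : requested) {
    auto it = std::find(available.begin(), available.end(), name);
    if (it == available.end())
      throw std::runtime_error("Couldn't find variable " + name);
    indices.push_back(static_cast<unsigned int>(std::distance(available.begin(), it)));
  }
  return indices;
}

// Hypothetical helper: turn a boolean mask into the list of selected positions.
std::vector<unsigned int> maskToIndices(const std::vector<bool>& mask)
{
  std::vector<unsigned int> indices;
  for (unsigned int i = 0; i < mask.size(); ++i) {
    if (mask[i])
      indices.push_back(i);
  }
  assert(indices.size() <= mask.size());
  return indices;
}
```

Note that the constructor indexes events with every position of the wrapped dataset, so a non-empty mask is presumably expected to have exactly one entry per wrapped event.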

Member Function Documentation

◆ getFeature()

std::vector< float > getFeature ( unsigned int  iFeature)
override virtual

Returns all values of one feature in a std::vector<float> of the wrapped dataset.

Parameters
iFeature    the position of the feature to return in the given subset

Reimplemented from Dataset.

Definition at line 245 of file Dataset.cc.

std::vector<float> SubDataset::getFeature(unsigned int iFeature)
{

  auto v = m_dataset.getFeature(m_feature_indices[iFeature]);
  if (not m_use_event_indices)
    return v;
  std::vector<float> result(m_event_indices.size());
  for (unsigned int iEvent = 0; iEvent < getNumberOfEvents(); ++iEvent) {
    result[iEvent] = v[m_event_indices[iEvent]];
  }
  return result;

}
virtual std::vector< float > getFeature(unsigned int iFeature)
Returns all values of one feature in a std::vector<float>
Definition: Dataset.cc:74
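
The event selection in getFeature() is a plain gather: the full column is fetched from the wrapped dataset, then only the entries named by the event indices are kept, in subset order. A minimal sketch of that step, with a hypothetical free function gatherByIndex standing in for the member function:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: result[i] = column[indices[i]] for every position i of the subset.
std::vector<float> gatherByIndex(const std::vector<float>& column,
                                 const std::vector<unsigned int>& indices)
{
  std::vector<float> result(indices.size());
  for (std::size_t i = 0; i < indices.size(); ++i) {
    assert(indices[i] < column.size());  // every index must point into the wrapped column
    result[i] = column[indices[i]];
  }
  return result;
}
```

getSpectator() follows the same pattern with the spectator index mapping.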

◆ getFeatureIndex()

unsigned int getFeatureIndex ( const std::string &  feature)
virtual inherited

Return index of feature with the given name.

Parameters
feature    name of the feature

Definition at line 50 of file Dataset.cc.

unsigned int Dataset::getFeatureIndex(const std::string& feature)
{

  auto it = std::find(m_general_options.m_variables.begin(), m_general_options.m_variables.end(), feature);
  if (it == m_general_options.m_variables.end()) {
    B2ERROR("Unknown feature named " << feature);
    return 0;
  }
  return std::distance(m_general_options.m_variables.begin(), it);

}
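
As the listing shows, an unknown name is reported through B2ERROR and the function then returns 0, which is also the valid index of the first feature, so the return value alone does not tell a caller whether the lookup succeeded. The sketch below shows one possible alternative that makes the failure explicit. It is not part of the Belle II API: findIndex is a hypothetical name, and std::optional requires C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

// Hypothetical alternative: an empty optional marks "name not found",
// so index 0 always refers to the first entry.
std::optional<unsigned int> findIndex(const std::vector<std::string>& names,
                                      const std::string& name)
{
  auto it = std::find(names.begin(), names.end(), name);
  if (it == names.end())
    return std::nullopt;
  return static_cast<unsigned int>(std::distance(names.begin(), it));
}
```

A caller would then test has_value() before using the index instead of relying on the log output.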

◆ getNumberOfEvents()

virtual unsigned int getNumberOfEvents ( ) const
inline override virtual

Returns the number of events in the wrapped dataset.

Implements Dataset.

Definition at line 258 of file Dataset.h.

virtual unsigned int getNumberOfEvents() const =0
Returns the number of events in this dataset.

◆ getNumberOfFeatures()

virtual unsigned int getNumberOfFeatures ( ) const
inline override virtual

Returns the number of features in this dataset, i.e. the size of the given subset of variables.

Implements Dataset.

Definition at line 248 of file Dataset.h.

virtual unsigned int getNumberOfFeatures() const override { return m_feature_indices.size(); }

◆ getNumberOfSpectators()

virtual unsigned int getNumberOfSpectators ( ) const
inline override virtual

Returns the number of spectators in this dataset, i.e. the size of the given subset of spectators.

Implements Dataset.

Definition at line 253 of file Dataset.h.

virtual unsigned int getNumberOfSpectators() const override { return m_spectator_indices.size(); }

◆ getSignalFraction()

float getSignalFraction ( )
virtual inherited

Returns the signal fraction of the whole sample.

Reimplemented in SPlotDataset.

Definition at line 35 of file Dataset.cc.

float Dataset::getSignalFraction()
{

  double signal_weight_sum = 0;
  double weight_sum = 0;
  for (unsigned int i = 0; i < getNumberOfEvents(); ++i) {
    loadEvent(i);
    weight_sum += m_weight;
    if (m_isSignal)
      signal_weight_sum += m_weight;
  }
  return signal_weight_sum / weight_sum;

}
virtual void loadEvent(unsigned int iEvent)=0
Load the event number iEvent.
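
The value computed by getSignalFraction() is the weighted ratio: the sum of the weights of all signal events divided by the sum of the weights of all events. A small self-contained sketch of the same arithmetic, with a hypothetical free function signalFraction operating on plain vectors instead of a Dataset:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: weighted signal fraction of a sample given as two parallel vectors.
float signalFraction(const std::vector<float>& weights, const std::vector<bool>& isSignal)
{
  assert(weights.size() == isSignal.size());
  double signal_weight_sum = 0;
  double weight_sum = 0;
  for (std::size_t i = 0; i < weights.size(); ++i) {
    weight_sum += weights[i];
    if (isSignal[i])
      signal_weight_sum += weights[i];
  }
  // No guard against weight_sum == 0, matching the listing above.
  return static_cast<float>(signal_weight_sum / weight_sum);
}
```

For weights {1, 2, 1} with the first and last event marked as signal, the result is 2 / 4 = 0.5. With an empty sample or a vanishing weight sum the division is 0 / 0, so callers may want to check for that case first.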

◆ getSignals()

std::vector< bool > getSignals ( )
virtual inherited

Returns the isSignal flag of every event.

Reimplemented in ReweightingDataset.

Definition at line 122 of file Dataset.cc.

std::vector<bool> Dataset::getSignals()
{

  std::vector<bool> result(getNumberOfEvents());
  for (unsigned int iEvent = 0; iEvent < getNumberOfEvents(); ++iEvent) {
    loadEvent(iEvent);
    result[iEvent] = m_isSignal;
  }
  return result;

}

◆ getSpectator()

std::vector< float > getSpectator ( unsigned int  iSpectator)
override virtual

Returns all values of one spectator in a std::vector<float> of the wrapped dataset.

Parameters
iSpectator    the position of the spectator to return in the given subset

Reimplemented from Dataset.

Definition at line 259 of file Dataset.cc.

std::vector<float> SubDataset::getSpectator(unsigned int iSpectator)
{

  auto v = m_dataset.getSpectator(m_spectator_indices[iSpectator]);
  if (not m_use_event_indices)
    return v;
  std::vector<float> result(m_event_indices.size());
  for (unsigned int iEvent = 0; iEvent < getNumberOfEvents(); ++iEvent) {
    result[iEvent] = v[m_event_indices[iEvent]];
  }
  return result;

}
virtual std::vector< float > getSpectator(unsigned int iSpectator)
Returns all values of one spectator in a std::vector<float>
Definition: Dataset.cc:86

◆ getSpectatorIndex()

unsigned int getSpectatorIndex ( const std::string &  spectator)
virtual inherited

Return index of spectator with the given name.

Parameters
spectator    name of the spectator

Definition at line 62 of file Dataset.cc.

unsigned int Dataset::getSpectatorIndex(const std::string& spectator)
{

  auto it = std::find(m_general_options.m_spectators.begin(), m_general_options.m_spectators.end(), spectator);
  if (it == m_general_options.m_spectators.end()) {
    B2ERROR("Unknown spectator named " << spectator);
    return 0;
  }
  return std::distance(m_general_options.m_spectators.begin(), it);

}

◆ getTargets()

std::vector< float > getTargets ( )
virtual inherited

Returns all targets.

Reimplemented in RegressionDataSet, and ReweightingDataset.

Definition at line 110 of file Dataset.cc.

std::vector<float> Dataset::getTargets()
{

  std::vector<float> result(getNumberOfEvents());
  for (unsigned int iEvent = 0; iEvent < getNumberOfEvents(); ++iEvent) {
    loadEvent(iEvent);
    result[iEvent] = m_target;
  }
  return result;

}

◆ getWeights()

std::vector< float > getWeights ( )
virtual inherited

Returns all weights.

Reimplemented in ROOTDataset, RegressionDataSet, and ReweightingDataset.

Definition at line 98 of file Dataset.cc.

std::vector<float> Dataset::getWeights()
{

  std::vector<float> result(getNumberOfEvents());
  for (unsigned int iEvent = 0; iEvent < getNumberOfEvents(); ++iEvent) {
    loadEvent(iEvent);
    result[iEvent] = m_weight;
  }
  return result;

}
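
getWeights(), getTargets() and getSignals() share one pattern: load every event in turn and copy a single per-event member into the result. A consequence visible in the listings is that these getters change the "currently loaded event", which ends up being the last one. The toy class below illustrates this. ToyDataset, storedWeights and lastLoaded are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a dataset that only stores weights.
struct ToyDataset {
  std::vector<float> storedWeights;  // backing store, one weight per event
  float m_weight = 0;                // weight of the currently loaded event
  unsigned int lastLoaded = 0;       // bookkeeping for the illustration only

  unsigned int getNumberOfEvents() const
  {
    return static_cast<unsigned int>(storedWeights.size());
  }

  void loadEvent(unsigned int iEvent)
  {
    assert(iEvent < storedWeights.size());
    m_weight = storedWeights[iEvent];
    lastLoaded = iEvent;
  }

  // Same loop shape as in the listing: load each event, then read the member.
  std::vector<float> getWeights()
  {
    std::vector<float> result(getNumberOfEvents());
    for (unsigned int iEvent = 0; iEvent < getNumberOfEvents(); ++iEvent) {
      loadEvent(iEvent);
      result[iEvent] = m_weight;
    }
    return result;
  }
};
```

Code that relies on the per-event members after calling one of these getters should therefore call loadEvent() again for the event it needs.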

◆ loadEvent()

void loadEvent ( unsigned int  iEvent)
override virtual

Load the event number iEvent from the wrapped dataset.

Parameters
iEvent    event number to load

Implements Dataset.

Definition at line 225 of file Dataset.cc.

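void SubDataset::loadEvent(unsigned int iEvent)
{
  unsigned int index = iEvent;
  // Source line 228 is absent from the extracted listing; restored from context (assumption).
  if (m_use_event_indices)
    index = m_event_indices[iEvent];
  m_dataset.loadEvent(index);
  // Source lines 231-233 are absent from the extracted listing; restored from context (assumption).
  m_target = m_dataset.m_target;
  m_weight = m_dataset.m_weight;
  m_isSignal = m_dataset.m_isSignal;

  // Copy only the selected features, in subset order.
  for (unsigned int iFeature = 0; iFeature < m_input.size(); ++iFeature) {
    m_input[iFeature] = m_dataset.m_input[m_feature_indices[iFeature]];
  }

  // Copy only the selected spectators, in subset order.
  for (unsigned int iSpectator = 0; iSpectator < m_spectators.size(); ++iSpectator) {
    m_spectators[iSpectator] = m_dataset.m_spectators[m_spectator_indices[iSpectator]];
  }

}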
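
Per event, the view amounts to a projection: the wrapped dataset loads the full event, and the subset copies only the selected features in its own order. The sketch below illustrates that projection. EventRecord and projectEvent are hypothetical names, and copying weight, target and signal flag unchanged is an assumption about the behaviour, made because the inherited getters read those members after loadEvent().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-event members of a dataset.
struct EventRecord {
  std::vector<float> input;
  float weight;
  float target;
  bool isSignal;
};

// Build the subset's view of one event from the wrapped dataset's full event.
EventRecord projectEvent(const EventRecord& wrapped,
                         const std::vector<unsigned int>& featureIndices)
{
  EventRecord view;
  view.weight = wrapped.weight;      // assumed to be passed through unchanged
  view.target = wrapped.target;      // assumed to be passed through unchanged
  view.isSignal = wrapped.isSignal;  // assumed to be passed through unchanged
  view.input.resize(featureIndices.size());
  for (std::size_t i = 0; i < featureIndices.size(); ++i) {
    assert(featureIndices[i] < wrapped.input.size());
    view.input[i] = wrapped.input[featureIndices[i]];
  }
  return view;
}
```

With wrapped features {10, 20, 30} and the index mapping {2, 0}, the view holds {30, 10}.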

Member Data Documentation

◆ m_dataset

Dataset& m_dataset
private

Reference to the wrapped dataset.

Definition at line 286 of file Dataset.h.

◆ m_event_indices

std::vector<unsigned int> m_event_indices
private

Mapping from the position of an event in the given subset to its position in the wrapped dataset.

Definition at line 285 of file Dataset.h.

◆ m_feature_indices

std::vector<unsigned int> m_feature_indices
private

Mapping from the position of a feature in the given subset to its position in the wrapped dataset.

Definition at line 281 of file Dataset.h.

◆ m_general_options

GeneralOptions m_general_options
inherited

GeneralOptions passed to this dataset.

Definition at line 122 of file Dataset.h.

◆ m_input

std::vector<float> m_input
inherited

Contains all feature values of the currently loaded event.

Definition at line 123 of file Dataset.h.

◆ m_isSignal

bool m_isSignal
inherited

Defines if the currently loaded event is signal or background.

Definition at line 127 of file Dataset.h.

◆ m_spectator_indices

std::vector<unsigned int> m_spectator_indices
private

Mapping from the position of a spectator in the given subset to its position in the wrapped dataset.

Definition at line 283 of file Dataset.h.

◆ m_spectators

std::vector<float> m_spectators
inherited

Contains all spectator values of the currently loaded event.

Definition at line 124 of file Dataset.h.

◆ m_target

float m_target
inherited

Contains the target value of the currently loaded event.

Definition at line 126 of file Dataset.h.

◆ m_use_event_indices

bool m_use_event_indices = false
private

Use only a subset of the wrapped dataset events.

Definition at line 279 of file Dataset.h.

◆ m_weight

float m_weight
inherited

Contains the weight of the currently loaded event.

Definition at line 125 of file Dataset.h.


The documentation for this class was generated from the following files: