Belle II Software development
SidebandDataset Class Reference

Dataset for sideband subtraction. Wraps a dataset and provides each data-point with a new weight.

#include <DataDriven.h>

Inherits Dataset.

Public Member Functions

 SidebandDataset (const GeneralOptions &general_options, Dataset &dataset, Dataset &mc_dataset, const std::string &sideband_variable)
 Constructs a new SidebandDataset.
 
virtual unsigned int getNumberOfFeatures () const override
 Returns the number of features in this dataset.
 
virtual unsigned int getNumberOfSpectators () const override
 Returns the number of spectators in this dataset.
 
virtual unsigned int getNumberOfEvents () const override
 Returns the number of events in this dataset.
 
virtual std::vector< float > getFeature (unsigned int iFeature) override
 Returns all values of one feature in a std::vector<float>
 
virtual std::vector< float > getSpectator (unsigned int iSpectator) override
 Returns all values of one spectator in a std::vector<float>
 
virtual void loadEvent (unsigned int event) override
 Loads the event with the given number.
 
virtual float getSignalFraction ()
 Returns the signal fraction of the whole sample.
 
virtual unsigned int getFeatureIndex (const std::string &feature)
 Return index of feature with the given name.
 
virtual unsigned int getSpectatorIndex (const std::string &spectator)
 Return index of spectator with the given name.
 
virtual std::vector< float > getWeights ()
 Returns all weights.
 
virtual std::vector< float > getTargets ()
 Returns all targets.
 
virtual std::vector< bool > getSignals ()
 Returns the isSignal flag of every event.
 

Public Attributes

GeneralOptions m_general_options
 GeneralOptions passed to this dataset.
 
std::vector< float > m_input
 Contains all feature values of the currently loaded event.
 
std::vector< float > m_spectators
 Contains all spectators values of the currently loaded event.
 
float m_weight
 Contains the weight of the currently loaded event.
 
float m_target
 Contains the target value of the currently loaded event.
 
bool m_isSignal
 Defines if the currently loaded event is signal or background.
 

Private Attributes

Dataset & m_dataset
 Wrapped dataset.
 
int m_spectator_index
 spectator containing the sideband variable
 
double m_signal_weight
 the weight for signal events
 
double m_background_weight
 the weight for background events
 
double m_negative_signal_weight
 the weight for negative signal events
 

Detailed Description

Dataset for sideband subtraction. Wraps a dataset and provides each data-point with a new weight.

Definition at line 104 of file DataDriven.h.

Constructor & Destructor Documentation

◆ SidebandDataset()

SidebandDataset ( const GeneralOptions &  general_options,
Dataset &  dataset,
Dataset &  mc_dataset,
const std::string &  sideband_variable 
)

Constructs a new SidebandDataset.

Parameters
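general_options    shared options defining the dataset (variables, ...)
dataset            dataset containing the data-points
mc_dataset         dataset containing the MC events
sideband_variable  spectator variable defining the sideband regions (1 = signal region, 2 = background region, 3 = negative signal region)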

Definition at line 38 of file DataDriven.cc.

  : Dataset(general_options), m_dataset(dataset)
{

  m_spectator_index = dataset.getSpectatorIndex(sideband_variable);
  int mc_spectator_index = mc_dataset.getSpectatorIndex(sideband_variable);

  double total_signal_mc = 0.0;
  double total_mc = 0.0;
  double sum_signal_sr = 0.0;
  double sum_sr = 0.0;
  double sum_signal_br = 0.0;
  double sum_br = 0.0;
  double sum_signal_nr = 0.0;
  double sum_nr = 0.0;

  for (unsigned int iEvent = 0; iEvent < mc_dataset.getNumberOfEvents(); ++iEvent) {
    mc_dataset.loadEvent(iEvent);
    if (mc_dataset.m_isSignal)
      total_signal_mc += mc_dataset.m_weight;
    total_mc += mc_dataset.m_weight;
    if (mc_dataset.m_spectators[mc_spectator_index] == 1.0) {
      if (mc_dataset.m_isSignal)
        sum_signal_sr += mc_dataset.m_weight;
      sum_sr += mc_dataset.m_weight;
    } else if (mc_dataset.m_spectators[mc_spectator_index] == 2.0) {
      if (mc_dataset.m_isSignal)
        sum_signal_br += mc_dataset.m_weight;
      sum_br += mc_dataset.m_weight;
    } else if (mc_dataset.m_spectators[mc_spectator_index] == 3.0) {
      if (mc_dataset.m_isSignal)
        sum_signal_nr += mc_dataset.m_weight;
      sum_nr += mc_dataset.m_weight;
    }
  }

  double total_data = 0.0;
  double sum_data_sr = 0.0;
  double sum_data_br = 0.0;
  double sum_data_nr = 0.0;

  for (unsigned int iEvent = 0; iEvent < dataset.getNumberOfEvents(); ++iEvent) {
    dataset.loadEvent(iEvent);
    total_data += dataset.m_weight;
    if (dataset.m_spectators[m_spectator_index] == 1.0) {
      sum_data_sr += dataset.m_weight;
    } else if (dataset.m_spectators[m_spectator_index] == 2.0) {
      sum_data_br += dataset.m_weight;
    } else if (dataset.m_spectators[m_spectator_index] == 3.0) {
      sum_data_nr += dataset.m_weight;
    }
  }

  if (sum_signal_br / sum_br > 0.01) {
    B2WARNING("The background region you defined in the sideband subtraction contains more than 1% signal");
  }
  if (sum_signal_nr / sum_nr > 0.01) {
    B2WARNING("The negative signal region you defined in the sideband subtraction contains more than 1% signal");
  }

  if (sum_data_sr - sum_signal_sr < 0) {
    B2ERROR("There is less data in the signal region than the expected amount of signal events in the signal region estimated from MC.");
  }

  if (total_data - total_signal_mc < 0) {
    B2ERROR("There is less data than the expected amount of signal events estimated from MC.");
  }

  // We assume the number of signal events is correctly described in mc
  // Everything else (like the background) we take from the data sample

  // So Signal events in the signal region receive weight 1
  m_signal_weight = 1.0;

  // The background is scaled so that it corresponds to the total background in the whole sample
  m_background_weight = (total_data - total_signal_mc) / sum_data_br;

  // The negative signal is scaled so it corresponds to the expected background in the signal region:
  // Background Events in Signal Region in Data = Total Events in Signal Region In Data - Signal Events in Signal Region in MC
  m_negative_signal_weight = - (sum_data_sr - sum_signal_sr) / sum_data_nr;

  B2INFO("Data " << total_data << " " << sum_data_sr << " " << sum_data_br << " " << sum_data_nr);
  B2INFO("MC " << total_mc << " " << sum_sr << " " << sum_br << " " << sum_nr);
  B2INFO("MC (signal)" << total_signal_mc << " " << sum_signal_sr << " " << sum_signal_br << " " << sum_signal_nr);
  B2INFO("Sideband Subtraction: Signal Weight " << m_signal_weight << " Background Weight " << m_background_weight <<
         " Negative Signal Weight " << m_negative_signal_weight);

}
Dataset(const GeneralOptions &general_options)
Constructs a new dataset given the general options.
Definition: Dataset.cc:26
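
The weight formulas in the constructor can be checked with a small standalone calculation. The following is a minimal sketch, not part of the Belle II code: RegionSums, SidebandWeights and computeSidebandWeights are hypothetical names that mirror the constructor's local sums, and the guard against empty regions is an added assumption that the constructor listing does not contain.

```cpp
#include <stdexcept>

// Weighted event sums per region. The struct and its field names are
// hypothetical; they mirror the local variables of the constructor.
struct RegionSums {
  double total;              // all events
  double signal_region;      // sideband variable == 1
  double background_region;  // sideband variable == 2
  double negative_region;    // sideband variable == 3
};

// The three weights the constructor stores in its private members.
struct SidebandWeights {
  double signal;
  double background;
  double negative_signal;
};

// data:      sums over the data sample
// mc_signal: sums over the MC events flagged as signal
SidebandWeights computeSidebandWeights(const RegionSums& data, const RegionSums& mc_signal)
{
  // Assumption: refuse empty regions instead of dividing by zero.
  if (data.background_region <= 0.0 || data.negative_region <= 0.0)
    throw std::invalid_argument("empty background or negative signal region");

  SidebandWeights w;
  // Signal-region events keep their weight.
  w.signal = 1.0;
  // Background region is scaled to the total background of the whole sample.
  w.background = (data.total - mc_signal.total) / data.background_region;
  // Negative signal region is scaled to the expected background in the signal region.
  w.negative_signal = -(data.signal_region - mc_signal.signal_region) / data.negative_region;
  return w;
}
```

For example, with 1000 data events in total, 400 in the signal region, 300 in the background region and 200 in the negative signal region, and 100 MC signal events that all lie in the signal region, the sketch gives a background weight of 3 and a negative signal weight of -1.5.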

Member Function Documentation

◆ getFeature()

virtual std::vector< float > getFeature ( unsigned int  iFeature)
inline override virtual

Returns all values of one feature in a std::vector<float>

Parameters
iFeature  the position of the feature to return

Reimplemented from Dataset.

Definition at line 135 of file DataDriven.h.

{ return m_dataset.getFeature(iFeature); }
virtual std::vector< float > getFeature(unsigned int iFeature)
Returns all values of one feature in a std::vector<float>
Definition: Dataset.cc:74

◆ getFeatureIndex()

unsigned int getFeatureIndex ( const std::string &  feature)
virtual inherited

Return index of feature with the given name.

Parameters
feature  name of the feature

Definition at line 50 of file Dataset.cc.

{

  auto it = std::find(m_general_options.m_variables.begin(), m_general_options.m_variables.end(), feature);
  if (it == m_general_options.m_variables.end()) {
    B2ERROR("Unknown feature named " << feature);
    return 0;
  }
  return std::distance(m_general_options.m_variables.begin(), it);

}
std::vector< std::string > m_variables
Vector of all variables (branch names) used in the training.
Definition: Options.h:86
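
Because the lookup returns 0 for an unknown name, a caller cannot tell "not found" apart from a valid first entry by the return value alone. The following minimal sketch shows the same std::find / std::distance lookup with an explicit "not found" result. The helper findIndex is hypothetical and not part of Dataset, and it assumes a C++17 compiler for std::optional.

```cpp
#include <algorithm>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: returns the position of `name` in `names`,
// or std::nullopt if the name is not present.
std::optional<unsigned int> findIndex(const std::vector<std::string>& names, const std::string& name)
{
  auto it = std::find(names.begin(), names.end(), name);
  if (it == names.end())
    return std::nullopt;
  return static_cast<unsigned int>(std::distance(names.begin(), it));
}
```

A caller would test the result with has_value() before using the index, instead of relying on a logged error.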

◆ getNumberOfEvents()

virtual unsigned int getNumberOfEvents ( ) const
inline override virtual

Returns the number of events in this dataset.

Implements Dataset.

Definition at line 129 of file DataDriven.h.

{ return m_dataset.getNumberOfEvents(); };
virtual unsigned int getNumberOfEvents() const =0
Returns the number of events in this dataset.

◆ getNumberOfFeatures()

virtual unsigned int getNumberOfFeatures ( ) const
inline override virtual

Returns the number of features in this dataset.

Implements Dataset.

Definition at line 119 of file DataDriven.h.

{ return m_dataset.getNumberOfFeatures(); }
virtual unsigned int getNumberOfFeatures() const =0
Returns the number of features in this dataset.

◆ getNumberOfSpectators()

virtual unsigned int getNumberOfSpectators ( ) const
inline override virtual

Returns the number of spectators in this dataset.

Implements Dataset.

Definition at line 124 of file DataDriven.h.

virtual unsigned int getNumberOfSpectators() const =0
Returns the number of spectators in this dataset.

◆ getSignalFraction()

float getSignalFraction ( )
virtual inherited

Returns the signal fraction of the whole sample.

Reimplemented in SPlotDataset.

Definition at line 35 of file Dataset.cc.

{

  double signal_weight_sum = 0;
  double weight_sum = 0;
  for (unsigned int i = 0; i < getNumberOfEvents(); ++i) {
    loadEvent(i);
    weight_sum += m_weight;
    if (m_isSignal)
      signal_weight_sum += m_weight;
  }
  return signal_weight_sum / weight_sum;

}
virtual void loadEvent(unsigned int iEvent)=0
Load the event number iEvent.
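
The signal fraction is a weighted ratio: the summed weight of signal events divided by the summed weight of all events. The following minimal sketch computes the same ratio on plain vectors. The free function signalFraction is hypothetical, and returning 0 for a vanishing weight sum is an added assumption. It may matter for a sample with negative weights, where the total can be small or zero.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical free function: weights[i] and is_signal[i] describe event i.
float signalFraction(const std::vector<float>& weights, const std::vector<bool>& is_signal)
{
  double signal_weight_sum = 0.0;
  double weight_sum = 0.0;
  for (std::size_t i = 0; i < weights.size(); ++i) {
    weight_sum += weights[i];
    if (is_signal[i])
      signal_weight_sum += weights[i];
  }
  // Assumption: report 0 instead of dividing by a zero weight sum.
  if (weight_sum == 0.0)
    return 0.0f;
  return static_cast<float>(signal_weight_sum / weight_sum);
}
```

With weights 1, 1, 2 and signal flags true, false, true, the signal weight is 3 out of a total of 4, so the fraction is 0.75.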

◆ getSignals()

std::vector< bool > getSignals ( )
virtual inherited

Returns the isSignal flag of every event.

Reimplemented in ReweightingDataset.

Definition at line 122 of file Dataset.cc.

{

  std::vector<bool> result(getNumberOfEvents());
  for (unsigned int iEvent = 0; iEvent < getNumberOfEvents(); ++iEvent) {
    loadEvent(iEvent);
    result[iEvent] = m_isSignal;
  }
  return result;

}

◆ getSpectator()

virtual std::vector< float > getSpectator ( unsigned int  iSpectator)
inline override virtual

Returns all values of one spectator in a std::vector<float>

Parameters
iSpectator  the position of the spectator to return

Reimplemented from Dataset.

Definition at line 141 of file DataDriven.h.

{ return m_dataset.getSpectator(iSpectator); }
virtual std::vector< float > getSpectator(unsigned int iSpectator)
Returns all values of one spectator in a std::vector<float>
Definition: Dataset.cc:86

◆ getSpectatorIndex()

unsigned int getSpectatorIndex ( const std::string &  spectator)
virtual inherited

Return index of spectator with the given name.

Parameters
spectator  name of the spectator

Definition at line 62 of file Dataset.cc.

{

  auto it = std::find(m_general_options.m_spectators.begin(), m_general_options.m_spectators.end(), spectator);
  if (it == m_general_options.m_spectators.end()) {
    B2ERROR("Unknown spectator named " << spectator);
    return 0;
  }
  return std::distance(m_general_options.m_spectators.begin(), it);

}
std::vector< std::string > m_spectators
Vector of all spectators (branch names) used in the training.
Definition: Options.h:87

◆ getTargets()

std::vector< float > getTargets ( )
virtual inherited

Returns all targets.

Reimplemented in RegressionDataSet, and ReweightingDataset.

Definition at line 110 of file Dataset.cc.

{

  std::vector<float> result(getNumberOfEvents());
  for (unsigned int iEvent = 0; iEvent < getNumberOfEvents(); ++iEvent) {
    loadEvent(iEvent);
    result[iEvent] = m_target;
  }
  return result;

}

◆ getWeights()

std::vector< float > getWeights ( )
virtual inherited

Returns all weights.

Reimplemented in ROOTDataset, RegressionDataSet, and ReweightingDataset.

Definition at line 98 of file Dataset.cc.

{

  std::vector<float> result(getNumberOfEvents());
  for (unsigned int iEvent = 0; iEvent < getNumberOfEvents(); ++iEvent) {
    loadEvent(iEvent);
    result[iEvent] = m_weight;
  }
  return result;

}
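
getWeights(), getTargets() and getSignals() share one pattern: load each event in turn and copy one member into the result. A side effect is that the currently loaded event changes, so after the call the members describe the last event. The following minimal sketch illustrates this with a stand-in class. ToyDataset and its m_stored member are hypothetical and do not exist in the Belle II code.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in for a Dataset: loadEvent() copies one stored weight
// into m_weight, and getWeights() replays every event to collect them.
class ToyDataset {
public:
  explicit ToyDataset(std::vector<float> weights) : m_stored(std::move(weights)) {}

  unsigned int getNumberOfEvents() const { return static_cast<unsigned int>(m_stored.size()); }

  // Makes event iEvent the currently loaded event.
  void loadEvent(unsigned int iEvent) { m_weight = m_stored[iEvent]; }

  // Same loop as in the listing above.
  std::vector<float> getWeights()
  {
    std::vector<float> result(getNumberOfEvents());
    for (unsigned int iEvent = 0; iEvent < getNumberOfEvents(); ++iEvent) {
      loadEvent(iEvent);
      result[iEvent] = m_weight;
    }
    return result;
  }

  float m_weight = 0.0f;  // weight of the currently loaded event

private:
  std::vector<float> m_stored;  // one weight per event
};
```

Code that relies on a particular event being loaded would therefore need to call loadEvent() again after collecting the weights.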

◆ loadEvent()

void loadEvent ( unsigned int  event)
override virtual

Loads the event with the given number.

Parameters
event  event number to load

Implements Dataset.

Definition at line 127 of file DataDriven.cc.

{
  m_dataset.loadEvent(event);
  // (lines 130-132 of DataDriven.cc are not shown in this listing)
  if (m_spectators[m_spectator_index] == 1.0) {
    m_isSignal = true;
    m_target = 1.0;
    // (line 136 is not shown in this listing)
  } else if (m_spectators[m_spectator_index] == 2.0) {
    m_isSignal = false;
    m_target = 0.0;
    // (line 140 is not shown in this listing)
  } else if (m_spectators[m_spectator_index] == 3.0) {
    m_isSignal = true;
    m_target = 1.0;
    // (line 144 is not shown in this listing)
  } else {
    m_isSignal = false;
    m_target = 0.0;
    m_weight = 0.0;
  }
}
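
The labelling in loadEvent() depends only on the value of the sideband variable: 1 and 3 are labelled as signal, 2 as background, and every other value is labelled as background with weight 0. The following minimal sketch shows that mapping as a pure function. RegionLabel and labelRegion are hypothetical names, and the sketch does not model how the per-region weights are applied, because the listing above does not show that step.

```cpp
// Labels derived from the sideband variable. Struct and function names
// are hypothetical and only restate the branches of loadEvent().
struct RegionLabel {
  bool is_signal;  // value assigned to m_isSignal
  float target;    // value assigned to m_target
  bool in_region;  // false means loadEvent() sets m_weight to 0
};

RegionLabel labelRegion(float sideband_value)
{
  if (sideband_value == 1.0f) return {true, 1.0f, true};    // signal region
  if (sideband_value == 2.0f) return {false, 0.0f, true};   // background region
  if (sideband_value == 3.0f) return {true, 1.0f, true};    // negative signal region
  return {false, 0.0f, false};                              // outside all regions
}
```

The comparison is exact, so the sideband variable has to hold exactly 1, 2 or 3 for an event to be used. Any other value, for example 1.5, removes the event by giving it weight 0.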

Member Data Documentation

◆ m_background_weight

double m_background_weight
private

the weight for background events

Definition at line 153 of file DataDriven.h.

◆ m_dataset

Dataset& m_dataset
private

Wrapped dataset.

Definition at line 150 of file DataDriven.h.

◆ m_general_options

GeneralOptions m_general_options
inherited

GeneralOptions passed to this dataset.

Definition at line 122 of file Dataset.h.

◆ m_input

std::vector<float> m_input
inherited

Contains all feature values of the currently loaded event.

Definition at line 123 of file Dataset.h.

◆ m_isSignal

bool m_isSignal
inherited

Defines if the currently loaded event is signal or background.

Definition at line 127 of file Dataset.h.

◆ m_negative_signal_weight

double m_negative_signal_weight
private

the weight for negative signal events

Definition at line 154 of file DataDriven.h.

◆ m_signal_weight

double m_signal_weight
private

the weight for signal events

Definition at line 152 of file DataDriven.h.

◆ m_spectator_index

int m_spectator_index
private

spectator containing the sideband variable

Definition at line 151 of file DataDriven.h.

◆ m_spectators

std::vector<float> m_spectators
inherited

Contains all spectators values of the currently loaded event.

Definition at line 124 of file Dataset.h.

◆ m_target

float m_target
inherited

Contains the target value of the currently loaded event.

Definition at line 126 of file Dataset.h.

◆ m_weight

float m_weight
inherited

Contains the weight of the currently loaded event.

Definition at line 125 of file Dataset.h.


The documentation for this class was generated from the following files: