Belle II Software light-2406-ragdoll
PythonTeacher Class Reference

Teacher for the Python MVA method. More...

#include <Python.h>

Inheritance diagram for PythonTeacher:
Collaboration diagram for PythonTeacher:

Public Member Functions

 PythonTeacher (const GeneralOptions &general_options, const PythonOptions &specific_options)
 Constructs a new teacher using the GeneralOptions and specific options of this training.
 
virtual Weightfile train (Dataset &training_data) const override
 Train an MVA method on the given dataset and return a Weightfile.
 

Protected Attributes

GeneralOptions m_general_options
 GeneralOptions containing all shared options.
 

Private Attributes

PythonOptions m_specific_options
 Method specific options.
 

Detailed Description

Teacher for the Python MVA method.

Definition at line 90 of file Python.h.
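The teacher drives a user-replaceable Python module through five hooks: get_model, begin_fit, partial_fit, end_fit and feature_importance, with code from m_steering_file exec'd on top of the loaded framework module. The following is a minimal sketch of such a module; the hook names and call order are taken from the train() listing below, while the argument names and the bookkeeping inside are illustrative only. On the C++ side X/S/y/w arrive as float32 NumPy arrays; plain sequences are used here to stay self-contained.

```python
# Hypothetical minimal steering module for the Python MVA method.
# Hook names and call order follow PythonTeacher::train; everything
# else is an illustrative assumption.

def get_model(number_of_features, number_of_spectators,
              number_of_events, training_fraction, parameters):
    # Called once; `parameters` is the JSON-decoded m_config string.
    return {"n_features": number_of_features, "count": 0}

def begin_fit(model, X_v, S_v, y_v, w_v, n_batches):
    # Receives the validation sample; the return value becomes the
    # `state` threaded through every later call.
    model["n_batches"] = n_batches
    return model

def partial_fit(state, X, S, y, w, iteration, batch):
    # Called once per mini batch; returning False stops training early.
    state["count"] += len(X)
    return True

def end_fit(state):
    # Whatever is returned here is pickled into "Python_Weightfile".
    return {"events_seen": state["count"]}

def feature_importance(state):
    # Return one float per feature, or [] to store no importances.
    return []
```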

Constructor & Destructor Documentation

◆ PythonTeacher()

PythonTeacher ( const GeneralOptions & general_options,
const PythonOptions & specific_options 
)

Constructs a new teacher using the GeneralOptions and specific options of this training.

Parameters
general_options    defining all shared options
specific_options    defining all method-specific options

Definition at line 151 of file Python.cc.

152 : Teacher(general_options),
153   m_specific_options(specific_options)
154 {
155   PythonInitializerSingleton::GetInstance();
156 }

Member Function Documentation

◆ train()

Weightfile train ( Dataset & training_data ) const
override virtual

Train an MVA method on the given dataset and return a Weightfile.

Parameters
training_data    used to train the method

Implements Teacher.

Definition at line 159 of file Python.cc.

160 {
161
162 Weightfile weightfile;
163 std::string custom_weightfile = weightfile.generateFileName();
164 std::string custom_steeringfile = weightfile.generateFileName();
165
166 uint64_t numberOfFeatures = training_data.getNumberOfFeatures();
167 uint64_t numberOfSpectators = training_data.getNumberOfSpectators();
168 uint64_t numberOfEvents = training_data.getNumberOfEvents();
169
170  if (m_specific_options.m_training_fraction <= 0.0 or m_specific_options.m_training_fraction > 1.0) {
171 B2ERROR("Please provide a positive training fraction");
172 throw std::runtime_error("Please provide a training fraction between (0.0,1.0]");
173 }
174
175 auto numberOfTrainingEvents = static_cast<uint64_t>(numberOfEvents * 100 * m_specific_options.m_training_fraction);
176 numberOfTrainingEvents = numberOfTrainingEvents / 100 + (numberOfTrainingEvents % 100 != 0);
177 auto numberOfValidationEvents = numberOfEvents - numberOfTrainingEvents;
178
179 uint64_t batch_size = m_specific_options.m_mini_batch_size;
180 if (batch_size == 0) {
181 batch_size = numberOfTrainingEvents;
182 }
183
184 if (batch_size > numberOfTrainingEvents) {
185 B2WARNING("Mini batch size (" << batch_size << ") is larger than the number of training events (" << numberOfTrainingEvents << ")"\
186 " The batch size has been set equal to the number of training events.");
187 batch_size = numberOfTrainingEvents;
188 };
189
190 auto X = std::unique_ptr<float[]>(new float[batch_size * numberOfFeatures]);
191 auto S = std::unique_ptr<float[]>(new float[batch_size * numberOfSpectators]);
192 auto y = std::unique_ptr<float[]>(new float[batch_size]);
193 auto w = std::unique_ptr<float[]>(new float[batch_size]);
194 npy_intp dimensions_X[2] = {static_cast<npy_intp>(batch_size), static_cast<npy_intp>(numberOfFeatures)};
195 npy_intp dimensions_S[2] = {static_cast<npy_intp>(batch_size), static_cast<npy_intp>(numberOfSpectators)};
196 npy_intp dimensions_y[2] = {static_cast<npy_intp>(batch_size), 1};
197 npy_intp dimensions_w[2] = {static_cast<npy_intp>(batch_size), 1};
198
199 auto X_v = std::unique_ptr<float[]>(new float[numberOfValidationEvents * numberOfFeatures]);
200 auto S_v = std::unique_ptr<float[]>(new float[numberOfValidationEvents * numberOfSpectators]);
201 auto y_v = std::unique_ptr<float[]>(new float[numberOfValidationEvents]);
202 auto w_v = std::unique_ptr<float[]>(new float[numberOfValidationEvents]);
203 npy_intp dimensions_X_v[2] = {static_cast<npy_intp>(numberOfValidationEvents), static_cast<npy_intp>(numberOfFeatures)};
204 npy_intp dimensions_S_v[2] = {static_cast<npy_intp>(numberOfValidationEvents), static_cast<npy_intp>(numberOfSpectators)};
205 npy_intp dimensions_y_v[2] = {static_cast<npy_intp>(numberOfValidationEvents), 1};
206 npy_intp dimensions_w_v[2] = {static_cast<npy_intp>(numberOfValidationEvents), 1};
207
208 std::string steering_file_source_code;
209  if (m_specific_options.m_steering_file != "") {
210  std::string filename = FileSystem::findFile(m_specific_options.m_steering_file);
211 std::ifstream steering_file(filename);
212 if (not steering_file) {
213 throw std::runtime_error(std::string("Couldn't open file ") + filename);
214 }
215 steering_file.seekg(0, std::ios::end);
216 steering_file_source_code.resize(steering_file.tellg());
217 steering_file.seekg(0, std::ios::beg);
218 steering_file.read(&steering_file_source_code[0], steering_file_source_code.size());
219 }
220
221 std::vector<float> means(numberOfFeatures, 0.0);
222 std::vector<float> stds(numberOfFeatures, 0.0);
223
224  if (m_specific_options.m_normalize) {
225 // Stable calculation of mean and variance with weights
226 // see https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
227 auto weights = training_data.getWeights();
228 for (uint64_t iFeature = 0; iFeature < numberOfFeatures; ++iFeature) {
229 double wSum = 0.0;
230 double mean = 0.0;
231 double running_std = 0.0;
232 auto feature = training_data.getFeature(iFeature);
233 for (uint64_t i = 0; i < weights.size(); ++i) {
234 wSum += weights[i];
235 double meanOld = mean;
236 mean += (weights[i] / wSum) * (feature[i] - meanOld);
237 running_std += weights[i] * (feature[i] - meanOld) * (feature[i] - mean);
238 }
239 means[iFeature] = mean;
240 stds[iFeature] = std::sqrt(running_std / (wSum - 1));
241 }
242 }
243
244 try {
245 // Load python modules
246 auto json = boost::python::import("json");
247 auto builtins = boost::python::import("builtins");
248 auto inspect = boost::python::import("inspect");
249
250 // Load framework
251 auto framework = boost::python::import((std::string("basf2_mva_python_interface.") + m_specific_options.m_framework).c_str());
252 // Overwrite framework with user-defined code from the steering file
253 builtins.attr("exec")(steering_file_source_code.c_str(), boost::python::object(framework.attr("__dict__")));
254
255 // Call get_model with the parameters provided by the user
256 auto parameters = json.attr("loads")(m_specific_options.m_config.c_str());
257 auto model = framework.attr("get_model")(numberOfFeatures, numberOfSpectators,
258 numberOfEvents, m_specific_options.m_training_fraction, parameters);
259
260 // Call begin_fit with validation sample
261 for (uint64_t iEvent = 0; iEvent < numberOfValidationEvents; ++iEvent) {
262 training_data.loadEvent(iEvent);
263  if (m_specific_options.m_normalize) {
264 for (uint64_t iFeature = 0; iFeature < numberOfFeatures; ++iFeature)
265 X_v[iEvent * numberOfFeatures + iFeature] = (training_data.m_input[iFeature] - means[iFeature]) / stds[iFeature];
266 } else {
267 for (uint64_t iFeature = 0; iFeature < numberOfFeatures; ++iFeature)
268 X_v[iEvent * numberOfFeatures + iFeature] = training_data.m_input[iFeature];
269 }
270 for (uint64_t iSpectator = 0; iSpectator < numberOfSpectators; ++iSpectator)
271 S_v[iEvent * numberOfSpectators + iSpectator] = training_data.m_spectators[iSpectator];
272 y_v[iEvent] = training_data.m_target;
273 w_v[iEvent] = training_data.m_weight;
274 }
275
276 auto ndarray_X_v = boost::python::handle<>(PyArray_SimpleNewFromData(2, dimensions_X_v, NPY_FLOAT32, X_v.get()));
277 auto ndarray_S_v = boost::python::handle<>(PyArray_SimpleNewFromData(2, dimensions_S_v, NPY_FLOAT32, S_v.get()));
278 auto ndarray_y_v = boost::python::handle<>(PyArray_SimpleNewFromData(2, dimensions_y_v, NPY_FLOAT32, y_v.get()));
279 auto ndarray_w_v = boost::python::handle<>(PyArray_SimpleNewFromData(2, dimensions_w_v, NPY_FLOAT32, w_v.get()));
280
281 uint64_t nBatches = std::floor(numberOfTrainingEvents / batch_size);
282
283 auto state = framework.attr("begin_fit")(model, ndarray_X_v, ndarray_S_v, ndarray_y_v, ndarray_w_v, nBatches);
284
285 bool continue_loop = true;
286
287 std::vector<uint64_t> iteration_index_vector(numberOfTrainingEvents);
288 std::iota(std::begin(iteration_index_vector), std::end(iteration_index_vector), 0);
289
290 for (uint64_t iIteration = 0; (iIteration < m_specific_options.m_nIterations or m_specific_options.m_nIterations == 0)
291 and continue_loop; ++iIteration) {
292
293 // shuffle the indices on each iteration to get randomised batches
294 if (iIteration > 0) std::shuffle(std::begin(iteration_index_vector), std::end(iteration_index_vector), TRandomWrapper());
295
296 for (uint64_t iBatch = 0; iBatch < nBatches and continue_loop; ++iBatch) {
297
298 // Release Global Interpreter Lock in python to allow multithreading while reading root files
299 // also see: https://docs.python.org/3.5/c-api/init.html
300 PyThreadState* m_thread_state = PyEval_SaveThread();
301 for (uint64_t iEvent = 0; iEvent < batch_size; ++iEvent) {
302 training_data.loadEvent(iteration_index_vector.at(iEvent + iBatch * batch_size) + numberOfValidationEvents);
303  if (m_specific_options.m_normalize) {
304 for (uint64_t iFeature = 0; iFeature < numberOfFeatures; ++iFeature)
305 X[iEvent * numberOfFeatures + iFeature] = (training_data.m_input[iFeature] - means[iFeature]) / stds[iFeature];
306 } else {
307 for (uint64_t iFeature = 0; iFeature < numberOfFeatures; ++iFeature)
308 X[iEvent * numberOfFeatures + iFeature] = training_data.m_input[iFeature];
309 }
310 for (uint64_t iSpectator = 0; iSpectator < numberOfSpectators; ++iSpectator)
311 S[iEvent * numberOfSpectators + iSpectator] = training_data.m_spectators[iSpectator];
312 y[iEvent] = training_data.m_target;
313 w[iEvent] = training_data.m_weight;
314 }
315
316 // Maybe slow, create ndarrays outside of loop?
317 auto ndarray_X = boost::python::handle<>(PyArray_SimpleNewFromData(2, dimensions_X, NPY_FLOAT32, X.get()));
318 auto ndarray_S = boost::python::handle<>(PyArray_SimpleNewFromData(2, dimensions_S, NPY_FLOAT32, S.get()));
319 auto ndarray_y = boost::python::handle<>(PyArray_SimpleNewFromData(2, dimensions_y, NPY_FLOAT32, y.get()));
320 auto ndarray_w = boost::python::handle<>(PyArray_SimpleNewFromData(2, dimensions_w, NPY_FLOAT32, w.get()));
321
322 // Reactivate Global Interpreter Lock to safely execute python code
323 PyEval_RestoreThread(m_thread_state);
324 auto r = framework.attr("partial_fit")(state, ndarray_X, ndarray_S, ndarray_y,
325 ndarray_w, iIteration, iBatch);
326 boost::python::extract<bool> proxy(r);
327 if (proxy.check())
328 continue_loop = static_cast<bool>(proxy);
329 }
330 }
331
332 auto result = framework.attr("end_fit")(state);
333
334 auto pickle = boost::python::import("pickle");
335 auto file = builtins.attr("open")(custom_weightfile.c_str(), "wb");
336 pickle.attr("dump")(result, file);
337
338 auto steeringfile = builtins.attr("open")(custom_steeringfile.c_str(), "wb");
339 pickle.attr("dump")(steering_file_source_code.c_str(), steeringfile);
340
341 auto importances = framework.attr("feature_importance")(state);
342 if (len(importances) == 0) {
343 B2INFO("Python method returned empty feature importance. There won't be any information about the feature importance in the weightfile.");
344 } else if (numberOfFeatures != static_cast<uint64_t>(len(importances))) {
345 B2WARNING("Python method didn't return the correct number of importance values. The importances are ignored.");
346 } else {
347 std::map<std::string, float> feature_importances;
348 for (uint64_t iFeature = 0; iFeature < numberOfFeatures; ++iFeature) {
349 boost::python::extract<float> proxy(importances[iFeature]);
350 if (proxy.check()) {
351 feature_importances[m_general_options.m_variables[iFeature]] = static_cast<float>(proxy);
352 } else {
353 B2WARNING("Failed to convert importance output of the method to a float, using 0 instead");
354 feature_importances[m_general_options.m_variables[iFeature]] = 0.0;
355 }
356 }
357 weightfile.addFeatureImportance(feature_importances);
358 }
359
360 } catch (...) {
361 PyErr_Print();
362 PyErr_Clear();
363 B2ERROR("Failed calling train in PythonTeacher");
364 throw std::runtime_error(std::string("Failed calling train in PythonTeacher"));
365 }
366
367 weightfile.addOptions(m_general_options);
368 weightfile.addOptions(m_specific_options);
369 weightfile.addFile("Python_Weightfile", custom_weightfile);
370 weightfile.addFile("Python_Steeringfile", custom_steeringfile);
371 weightfile.addSignalFraction(training_data.getSignalFraction());
372  if (m_specific_options.m_normalize) {
373 weightfile.addVector("Python_Means", means);
374 weightfile.addVector("Python_Stds", stds);
375 }
376
377 return weightfile;
378
379 }
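The event bookkeeping at listing lines 175-181 and 281 is easy to misread: the training count is a ceiling of numberOfEvents * m_training_fraction evaluated with two-decimal integer arithmetic, the validation sample consists of the first numberOfValidationEvents entries of the dataset, and any incomplete trailing batch is dropped. A Python re-derivation of that arithmetic, for illustration only (the function name is made up; only the formulas come from the listing):

```python
# Re-derivation of the event partitioning in PythonTeacher::train
# (listing lines 175-181 and 281).  Not part of the Belle II API.
def partition(number_of_events, training_fraction, mini_batch_size):
    # Ceiling of number_of_events * training_fraction, computed with
    # the same two-decimal integer trick as the C++ code.
    scaled = int(number_of_events * 100 * training_fraction)
    training_events = scaled // 100 + (scaled % 100 != 0)
    validation_events = number_of_events - training_events
    # A batch size of 0 means "one batch with all training events";
    # an oversized batch is clamped (with a B2WARNING in the C++).
    batch_size = mini_batch_size or training_events
    batch_size = min(batch_size, training_events)
    # Incomplete trailing batches are dropped.
    n_batches = training_events // batch_size
    return training_events, validation_events, batch_size, n_batches
```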
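When m_normalize is set, the loop at listing lines 227-240 accumulates a weighted mean and standard deviation per feature with an incremental update (the Wikipedia reference cited in the listing); the results are stored in the weight file as "Python_Means" and "Python_Stds". A standalone Python transcription of that loop, for one feature, purely as illustration:

```python
# Python transcription of the weighted running mean/variance loop in
# PythonTeacher::train (West's incremental algorithm).  Illustrative
# only; not part of the Belle II API.
import math

def weighted_mean_std(values, weights):
    w_sum = 0.0      # running sum of weights
    mean = 0.0       # running weighted mean
    running = 0.0    # running weighted sum of squared deviations
    for x, w in zip(values, weights):
        w_sum += w
        mean_old = mean
        mean += (w / w_sum) * (x - mean_old)
        running += w * (x - mean_old) * (x - mean)
    # Same (w_sum - 1) normalization as the C++ code.
    return mean, math.sqrt(running / (w_sum - 1.0))
```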

Member Data Documentation

◆ m_general_options

GeneralOptions m_general_options
protectedinherited

GeneralOptions containing all shared options.

Definition at line 49 of file Teacher.h.

◆ m_specific_options

PythonOptions m_specific_options
private

Method specific options.

Definition at line 107 of file Python.h.


The documentation for this class was generated from the following files:
Python.h
Python.cc