Belle II Software development
MinMaxCollector< DataType > Class Template Reference

A container for collecting data, where min- and max-quantiles near q(0) and q(1) are to be found.

#include <MinMaxCollector.h>

Public Member Functions

 BELLE2_DEFINE_EXCEPTION (Quantile_out_of_bounds, "The quantileCut (%1%) you gave is illegal (only allowed between 0-1)")
 exception shall be thrown if the given quantile is not between 0-1 and therefore not normalized
 
 BELLE2_DEFINE_EXCEPTION (Illegal_quantile, "The quantiles you asked for (%1% and %2%) are not within the collected range of data (0-%3% and %4%-1) to prevent this happening, you have to pass a bigger value for QuantileCut when constructing a MinMaxCollector-Instance!")
 exception shall be thrown if the requested quantiles are not within the ranges collected
 
 BELLE2_DEFINE_EXCEPTION (Request_in_empty_Container, "Data of an empty container was requested!")
 exception shall be thrown if data is requested from an empty container
 
 MinMaxCollector (DataType quantileCut=0.025, unsigned warmUpThreshold=10)
 constructor.
 
std::pair< DataType, DataType > getMinMax (DataType minQuantile=0., DataType maxQuantile=1.) const
 returns the cuts (min, max) corresponding to the given pair of quantiles.
 
void insert (DataType newVal)
 convenience function, forwards to append.
 
void push_back (DataType newVal)
 convenience function, forwards to append.
 
void append (DataType newVal)
 append new value
 
void merge (const MinMaxCollector< DataType > &other)
 merges the values collected by the other collector into this one
 
unsigned totalSize () const
 returns the combined size of the containers storing the values
 
unsigned size () const
 returns the size of the larger of the two internal containers (a rough measure of the collected data)
 
bool empty () const
 returns if internal containers are empty
 
void clear ()
 deletes all values collected so far and resets to constructor-settings.
 
unsigned sampleSize () const
 returns actual sampleSize
 
void print (bool printFull=false) const
 print an overview of the entries collected.
 
std::string getName (bool printFull=false) const
 return a string of an overview of the entries collected.
 

Protected Member Functions

bool addMin (DataType newVal, bool allow2Grow)
 add entry to minContainer if it fits in
 
bool addMax (DataType newVal, bool allow2Grow)
 add entry to maxContainer if it fits in
 
bool addEntry (DataType newVal, bool isMinContainer, bool allow2Grow)
 add entry to container if it fits in
 
void sortIn (DataType newVal, bool isMinContainer)
 add newVal to appropriate container
 
unsigned getIndex (DataType aQuantile) const
 the correct access-index for given quantile will be determined
 
bool checkVectorSize (const std::deque< DataType > &container)
 returns true if the given container is allowed to grow
 

Protected Attributes

std::deque< DataType > m_smallestValues
 collects the smallest values that have occurred so far
 
std::deque< DataType > m_biggestValues
 collects the biggest values that have occurred so far
 
unsigned m_sampleSize
 counts number of values added so far
 
DataType m_quantileCut
 sets the threshold for storing data.
 
unsigned m_warmUpThreshold
 sets a threshold for warm-up.
 

Friends

std::ostream & operator<< (std::ostream &out, const MinMaxCollector &mmCol)
 overloaded '<<' stream operator.
 

Detailed Description

template<class DataType>
class Belle2::MinMaxCollector< DataType >

A container for collecting data, where min- and max-quantiles near q(0) and q(1) are to be found.

The collector can only approximate the actual quantiles and is sensitive to sorted data, so random samples have to be used. It is always exact for min and max, but the nearer the requested quantiles are to the given quantileCut, the more sensitive it is to sorted data. For random input, the quantiles always stay very near to the real ones (if they are not the correct ones already).

Prerequisites for DataType:

  • has to be one supported by std::numeric_limits

Definition at line 42 of file MinMaxCollector.h.

Constructor & Destructor Documentation

◆ MinMaxCollector()

MinMaxCollector ( DataType  quantileCut = 0.025,
unsigned  warmUpThreshold = 10 
)
inline

constructor.

quantileCut determines the fraction of the sample to be stored at each end; the constructor throws Quantile_out_of_bounds if it lies outside 0-0.5. warmUpThreshold trades some overhead at small sample sizes for more accuracy; for bigger sample sizes both the overhead and the advantage of the warm-up vanish, since the results then converge to the real ones anyway.

Definition at line 166 of file MinMaxCollector.h.

:
  m_sampleSize(0),
  m_quantileCut(quantileCut),
  m_warmUpThreshold(warmUpThreshold)
{
  if (0 > quantileCut or 0.5 < quantileCut) { throw (Quantile_out_of_bounds() << quantileCut); }
}

Member Function Documentation

◆ addEntry()

bool addEntry ( DataType  newVal,
bool  isMinContainer,
bool  allow2Grow 
)
inlineprotected

add entry to container if it fits in

Definition at line 74 of file MinMaxCollector.h.

{
  if (isMinContainer) {
    if (m_smallestValues.empty() or allow2Grow) { sortIn(newVal, true); return true; }
    if (m_smallestValues.back() > newVal) {
      m_smallestValues.pop_back();
      sortIn(newVal, true);
      return true;
    }
  } else {
    if (m_biggestValues.empty() or allow2Grow) { sortIn(newVal, false); return true; }
    if (m_biggestValues.front() < newVal) {
      m_biggestValues.pop_front();
      sortIn(newVal, false);
      return true;
    }
  }
  return false;
}
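
The replacement rule for the min container can be sketched in isolation as follows. This is a minimal stand-alone illustration, not the Belle II code: the free function addToMinTail and its parameter names are hypothetical, and double stands in for DataType. The max container would work the same way with front()/pop_front() and the comparison reversed.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>

// Hypothetical stand-alone version of the min-container branch of addEntry().
// Returns true if newVal was stored in the container.
inline bool addToMinTail(std::deque<double>& smallest, double newVal, bool allowGrow)
{
  // A non-empty container that may not grow only accepts values
  // below its current largest entry, which is then dropped.
  if (!smallest.empty() && !allowGrow) {
    if (!(smallest.back() > newVal)) return false;
    smallest.pop_back();
  }
  smallest.push_back(newVal);
  std::sort(smallest.begin(), smallest.end()); // same strategy as sortIn()
  return true;
}
```

With a full container {1, 3, 5}, the value 4 replaces 5, while 9 is rejected unless growing is allowed.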

◆ addMax()

bool addMax ( DataType  newVal,
bool  allow2Grow 
)
inlineprotected

add entry to maxContainer if it fits in

Definition at line 69 of file MinMaxCollector.h.

{ return addEntry(newVal, false, allow2Grow); }

◆ addMin()

bool addMin ( DataType  newVal,
bool  allow2Grow 
)
inlineprotected

add entry to minContainer if it fits in

Definition at line 64 of file MinMaxCollector.h.

{ return addEntry(newVal, true, allow2Grow); }

◆ append()

void append ( DataType  newVal)
inline

append new value

Definition at line 217 of file MinMaxCollector.h.

{
  m_sampleSize++;

  if (m_sampleSize < m_warmUpThreshold) { // to shorten warm-up-phase
    m_smallestValues.push_back(newVal);
    std::sort(m_smallestValues.begin(), m_smallestValues.end());
    m_biggestValues.push_back(newVal);
    std::sort(m_biggestValues.begin(), m_biggestValues.end());
    return;
  }

  /*bool wasAdded =*/ addMin(newVal, checkVectorSize(m_smallestValues));
  /*if (wasAdded == false)*/ addMax(newVal, checkVectorSize(m_biggestValues));
}

◆ BELLE2_DEFINE_EXCEPTION()

BELLE2_DEFINE_EXCEPTION ( Quantile_out_of_bounds  ,
"The quantileCut (%1%) you gave is illegal (only allowed between 0-1)"   
)

exception shall be thrown if the given quantile is not between 0-1 and therefore not normalized

◆ checkVectorSize()

bool checkVectorSize ( const std::deque< DataType > &  container)
inlineprotected

returns true, if vector is allowed to grow

Definition at line 123 of file MinMaxCollector.h.

{
  // want to allow growing with some extra margin to prevent issues
  unsigned newCalcThreshold = unsigned(ceil(float(m_sampleSize) * float(m_quantileCut) + ceil(0.01 * float(m_sampleSize))));

  if (newCalcThreshold > container.size() /*+1*/) { return true; }
  return false;
}
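
The threshold arithmetic can be checked by hand with a small stand-alone copy of the expression. This is only a sketch: the free functions growthThreshold and mayGrow are hypothetical names introduced for illustration, and quantileCut is taken as double. For a sample size of 150 and a quantileCut of 0.25 the sum is 37.5 + ceil(1.5) = 39.5, which is rounded up to 40.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical free-function copy of the threshold expression in checkVectorSize():
// the quantileCut fraction of the sample plus a margin of 1 % of the sample.
inline unsigned growthThreshold(unsigned sampleSize, double quantileCut)
{
  return unsigned(std::ceil(float(sampleSize) * float(quantileCut)
                            + std::ceil(0.01 * float(sampleSize))));
}

// A container may grow while it holds fewer entries than the threshold.
inline bool mayGrow(unsigned sampleSize, double quantileCut, unsigned containerSize)
{
  return growthThreshold(sampleSize, quantileCut) > containerSize;
}
```

The 1 % margin means a container may hold slightly more than quantileCut * sampleSize entries.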

◆ clear()

void clear ( )
inline

deletes all values collected so far and resets to constructor-settings.

Definition at line 271 of file MinMaxCollector.h.

{
  m_sampleSize = 0;
  m_smallestValues.clear();
  m_biggestValues.clear();
}

◆ empty()

bool empty ( ) const
inline

returns if internal containers are empty

Definition at line 266 of file MinMaxCollector.h.

{ return (m_smallestValues.empty() and m_biggestValues.empty()); }

◆ getIndex()

unsigned getIndex ( DataType  aQuantile) const
inlineprotected

the correct access-index for given quantile will be determined

Definition at line 116 of file MinMaxCollector.h.

{ return unsigned(double(m_sampleSize/*-1*/) * double(aQuantile) + 0.5); }
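
Adding 0.5 before the conversion to unsigned rounds the product to the nearest integer, with halves rounded up. A minimal sketch of that arithmetic, assuming a hypothetical free function quantileIndex that takes the sample size as a parameter instead of reading m_sampleSize:

```cpp
#include <cassert>

// Hypothetical free-function copy of the expression in getIndex().
// The conversion to unsigned truncates, so adding 0.5 first rounds to nearest.
inline unsigned quantileIndex(unsigned sampleSize, double quantile)
{
  return unsigned(double(sampleSize) * quantile + 0.5);
}
```

For example, a sample size of 100 and a quantile of 0.25 give 25.5, which truncates to index 25; a quantile of 0 always maps to index 0.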

◆ getMinMax()

std::pair< DataType, DataType > getMinMax ( DataType  minQuantile = 0.,
DataType  maxQuantile = 1. 
) const
inline

returns the cuts (min, max) corresponding to the given pair of quantiles.

Definition at line 182 of file MinMaxCollector.h.

{
  if (m_biggestValues.empty() or m_smallestValues.empty()) { throw (Request_in_empty_Container()); }
  if (0 > minQuantile or 1 < minQuantile) { throw (Quantile_out_of_bounds() << minQuantile); }
  if (0 > maxQuantile or 1 < maxQuantile) { throw (Quantile_out_of_bounds() << maxQuantile); }
  if (minQuantile > m_quantileCut or maxQuantile < (1. - m_quantileCut)) {
    throw (Illegal_quantile() << minQuantile << maxQuantile << m_quantileCut << 1. - m_quantileCut);
  }

  unsigned minIndex = getIndex(minQuantile);
  // B2INFO("minIndex: " << minIndex);
  unsigned maxIndex = getIndex(1. - maxQuantile);
  // B2INFO("maxIndex: " << maxIndex);
  unsigned finalMaxIndex = ((int(m_biggestValues.size()) - int(maxIndex) - 1) < 0) ? 0 : m_biggestValues.size() - 1 - maxIndex;

  if (minIndex > (m_smallestValues.size() - 1)) { B2ERROR("minIndex " << minIndex << " calculated for minQuantile " << minQuantile << " bigger than nSmallestValues " << m_smallestValues.size() << "!"); }
  if (finalMaxIndex > (m_biggestValues.size() - 1)) { B2ERROR("maxIndex " << maxIndex << " calculated for maxQuantile " << maxQuantile << " bigger than nBiggestValues " << m_biggestValues.size() << "!"); }
  return {m_smallestValues.at(minIndex), m_biggestValues.at(finalMaxIndex)};
}
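
The max container is sorted in ascending order, so the index computed for 1 - maxQuantile is counted back from its last entry and clamped at the front. The following sketch isolates that mapping; the helper name finalMaxIndex is borrowed from the local variable above, but the free function itself is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring the finalMaxIndex expression in getMinMax().
// maxIndex counts entries back from the end of the sorted max container.
inline unsigned finalMaxIndex(std::size_t containerSize, unsigned maxIndex)
{
  // Clamp to the front instead of letting the unsigned subtraction wrap around.
  if (int(containerSize) - int(maxIndex) - 1 < 0) return 0;
  return unsigned(containerSize) - 1 - maxIndex;
}
```

With four stored values, maxIndex 0 selects the last entry (index 3), maxIndex 1 the one before it, and any maxIndex of 4 or more falls back to index 0.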

◆ getName()

std::string getName ( bool  printFull = false) const
inline

return a string of an overview of the entries collected.

If printFull is true, all stored values are printed; if false, each container is summarized by its first two entries, its mean and its last two entries.

Definition at line 290 of file MinMaxCollector.h.

{
  unsigned nSmallest = m_smallestValues.size();
  unsigned nBiggest = m_biggestValues.size();
  using namespace std;
  string out = "MinMaxCollector with sampleSize " + to_string(m_sampleSize) +
               " and quantileCut " + to_string(m_quantileCut) +
               " has nSmallestValues/nBiggestValues: " + to_string(nSmallest) +
               "/" + to_string(nBiggest) + "\n";

  if (!printFull) out += "The 5 values each describe for the valueContainer pos[0], pos[1], pos[mean], pos[max-1], pos[max]\n";
  out += "SmallestValues: ";
  // only want to print full vector if there are not many entries in it:
  if (printFull or size() < 6) {
    for (DataType entry : m_smallestValues) { out += to_string(entry) + ", "; }
    out += "\n" + string("BiggestValues: ");
    for (DataType entry : m_biggestValues) { out += to_string(entry) + ", "; }
    out += "\n";
  } else {
    DataType smallestTotal = 0, biggestTotal = 0, smallestMean, biggestMean;
    for (DataType entry : m_smallestValues) { smallestTotal += entry; }
    smallestMean = smallestTotal / DataType(nSmallest);
    out += to_string(m_smallestValues.at(0))
           + ", " + to_string(m_smallestValues.at(1))
           + ", mean: " + to_string(smallestMean)
           + ", " + to_string(m_smallestValues.at(nSmallest - 2))
           + ", " + to_string(m_smallestValues.at(nSmallest - 1));
    for (DataType entry : m_biggestValues) { biggestTotal += entry; }
    biggestMean = biggestTotal / DataType(nBiggest);
    out += "\n" + string("BiggestValues: ");
    out += to_string(m_biggestValues.at(0))
           + ", " + to_string(m_biggestValues.at(1))
           + ", mean: " + to_string(biggestMean)
           + ", " + to_string(m_biggestValues.at(nBiggest - 2))
           + ", " + to_string(m_biggestValues.at(nBiggest - 1));
  }
  return out;
}

◆ insert()

void insert ( DataType  newVal)
inline

convenience function, forwards to append.

Definition at line 205 of file MinMaxCollector.h.

{ append(std::move(newVal));}

◆ merge()

void merge ( const MinMaxCollector< DataType > &  other)
inline

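merges the values collected by the other collector into this one. The larger of the two quantileCut values is kept and the sample sizes are added.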

Definition at line 236 of file MinMaxCollector.h.

{
  if (other.m_quantileCut != m_quantileCut) {
    B2WARNING("MinMaxCollector::merge: other collector has differing size in quantileCut. If this is not the purpose, this could indicate unintended behavior!");
  }
  if (other.m_quantileCut > m_quantileCut) {
    m_quantileCut = other.m_quantileCut;
  }
  m_smallestValues.insert(m_smallestValues.end(), other.m_smallestValues.begin(), other.m_smallestValues.end());
  std::sort(m_smallestValues.begin(), m_smallestValues.end());
  m_biggestValues.insert(m_biggestValues.end(), other.m_biggestValues.begin(), other.m_biggestValues.end());
  std::sort(m_biggestValues.begin(), m_biggestValues.end());
  m_sampleSize += other.m_sampleSize;
}
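
Both containers are already sorted before they are concatenated, so the re-sorting step could in principle be done with std::inplace_merge instead of a full std::sort. The sketch below shows that alternative for a single container; it is not what merge() does, the helper mergeSorted is hypothetical, and double stands in for DataType.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <iterator>

// Hypothetical helper: append `other` to `mine` and restore ascending order.
// Assumes both deques are sorted on entry, as the collector's containers are.
inline void mergeSorted(std::deque<double>& mine, const std::deque<double>& other)
{
  // Remember the boundary as a count, since insert() invalidates deque iterators.
  const auto oldSize = std::distance(mine.begin(), mine.end());
  mine.insert(mine.end(), other.begin(), other.end());
  std::inplace_merge(mine.begin(), mine.begin() + oldSize, mine.end());
}
```

As in merge(), no values are dropped: the merged container holds every entry of both inputs.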

◆ print()

void print ( bool  printFull = false) const
inline

print an overview of the entries collected.

If printFull is true, all stored values are printed; if false, each container is summarized by its first two entries, its mean and its last two entries.

Definition at line 284 of file MinMaxCollector.h.

{ B2INFO(getName(printFull)); }

◆ push_back()

void push_back ( DataType  newVal)
inline

convenience function, forwards to append.

Definition at line 211 of file MinMaxCollector.h.

{ append(std::move(newVal));}

◆ sampleSize()

unsigned sampleSize ( ) const
inline

returns actual sampleSize

Definition at line 279 of file MinMaxCollector.h.

{ return m_sampleSize; }

◆ size()

unsigned size ( ) const
inline

returns the size of the larger of the two internal containers (a rough measure of the collected data)

Definition at line 260 of file MinMaxCollector.h.

{ return m_smallestValues.size() > m_biggestValues.size() ? m_smallestValues.size() : m_biggestValues.size(); }

◆ sortIn()

void sortIn ( DataType  newVal,
bool  isMinContainer 
)
inlineprotected

add newVal to appropriate container

TODO

replace deque with vector and implement individual sort functions for smallest- and biggestValues.

Definition at line 97 of file MinMaxCollector.h.

{
  if (isMinContainer) {
    m_smallestValues.push_back(newVal);
    std::sort(m_smallestValues.begin(), m_smallestValues.end());
    return;
  }
  m_biggestValues.push_back(newVal);
  std::sort(m_biggestValues.begin(), m_biggestValues.end());
  return;
}
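
One possible direction for the TODO is to insert each value directly at its sorted position rather than appending and re-sorting the whole container. The sketch below uses std::upper_bound for that; it is an illustration under the assumption that the container is already sorted, not the planned or actual implementation, and the helper insertSorted is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>

// Hypothetical alternative to push_back + std::sort: binary-search the
// position of newVal and insert it there, keeping the container sorted.
inline void insertSorted(std::deque<double>& container, double newVal)
{
  container.insert(std::upper_bound(container.begin(), container.end(), newVal), newVal);
}
```

The result is the same ordering as push_back followed by std::sort, but only the insertion position has to be found.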

◆ totalSize()

unsigned totalSize ( ) const
inline

returns the combined size of the containers storing the values

Definition at line 254 of file MinMaxCollector.h.

{ return m_smallestValues.size() + m_biggestValues.size(); }

Friends And Related Function Documentation

◆ operator<<

std::ostream & operator<< ( std::ostream &  out,
const MinMaxCollector< DataType > &  mmCol 
)
friend

overloaded '<<' stream operator.

Prints the MinMaxCollector to the stream using the string returned by getName().

Definition at line 177 of file MinMaxCollector.h.

{ out << mmCol.getName(); return out; }

Member Data Documentation

◆ m_biggestValues

std::deque<DataType> m_biggestValues
protected

collects biggest values occurred so far

Definition at line 48 of file MinMaxCollector.h.

◆ m_quantileCut

DataType m_quantileCut
protected

sets the threshold for storing data.

The min- and max-containers each store the smallest/biggest quantileCut * 100 percent of the total sample.

Definition at line 55 of file MinMaxCollector.h.

◆ m_sampleSize

unsigned m_sampleSize
protected

counts number of values added so far

Definition at line 51 of file MinMaxCollector.h.

◆ m_smallestValues

std::deque<DataType> m_smallestValues
protected

collects smallest values occurred so far

Definition at line 45 of file MinMaxCollector.h.

◆ m_warmUpThreshold

unsigned m_warmUpThreshold
protected

sets a threshold for warm-up.

The higher the value, the more accurate the quantiles will be, but the higher the overhead for small sample sizes.

Definition at line 59 of file MinMaxCollector.h.


The documentation for this class was generated from the following file: