Belle II Software light-2406-ragdoll
Downloader Class Reference (final)

Simple class to encapsulate libcurl as used by the ConditionsDatabase.

#include <Downloader.h>

Public Member Functions

 Downloader ()=default
 Create a new payload downloader.
 
 ~Downloader ()
 Destructor.
 
bool startSession ()
 Start a new curl session if none is active at the moment.
 
void finishSession ()
 Finish an existing curl session if any is active at the moment.
 
ScopeGuard ensureSession ()
 Make sure there's an active session and return a ScopeGuard object that closes the session on destruction in case a new session was created.
 
unsigned int getConnectionTimeout () const
 Get the timeout to wait for connections in seconds, 0 means the built in curl default.
 
unsigned int getStalledTimeout () const
 Get the timeout to wait for stalled connections (<10KB/s), 0 means no timeout.
 
unsigned int getMaxRetries () const
 Get the number of retries to perform when downloading fails with HTTP response code >=300 (404 Not Found is never retried), 0 means no retries.
 
unsigned int getBackoffFactor () const
 Get the backoff factor for retries in seconds.
 
void setConnectionTimeout (unsigned int timeout)
 Set the timeout to wait for connections in seconds, 0 means built in curl default.
 
void setStalledTimeout (unsigned int timeout)
 Set the timeout to wait for stalled connections (<10KB/s), 0 disables timeout.
 
void setMaxRetries (unsigned int retries)
 Set the number of retries to perform when downloading fails with HTTP response code >=300 (404 Not Found is never retried), 0 disables retry.
 
void setBackoffFactor (unsigned int factor)
 Set the backoff factor for retries in seconds.
 
bool download (const std::string &url, std::ostream &stream, bool silentOnMissing=false)
 Get a URL and save the content to a stream. This function raises exceptions when there are any problems.
 
bool verifyChecksum (std::istream &input, const std::string &checksum)
 check the digest of a stream
 
std::string escapeString (const std::string &text)
 Escape a string to make it safe to be used in web requests.
 
std::string joinWithSlash (const std::string &base, const std::string &second)
 Join two strings and make sure that there is exactly one '/' between them.
 

Static Public Member Functions

static Downloader & getDefaultInstance ()
 Return the default instance.
 

Private Member Functions

void initializeRandomGeneratorSeed ()
 Initialize the seed of the internal random number generator.
 

Static Private Member Functions

static std::string calculateChecksum (std::istream &input)
 calculate the digest/checksum on a given string.
 

Private Attributes

std::unique_ptr< CurlSession > m_session
 curl session handle
 
unsigned int m_connectionTimeout {60}
 Timeout to wait for connections in seconds.
 
unsigned int m_stalledTimeout {60}
 Timeout to wait for stalled connections (<10KB/s)
 
unsigned int m_maxRetries {5}
 Number of retries to perform when downloading fails with HTTP response code >=300.
 
unsigned int m_backoffFactor {3}
 Backoff factor for retries in seconds.
 
std::unique_ptr< std::mt19937 > m_rnd {std::make_unique<std::mt19937>()}
 This is a special exception in basf2 where an instance of gRandom is NOT used: since this class interacts with the Conditions Database, it might alter the state of the random number generator in case of connection troubles, losing our capability to fully reproduce the results.
 
std::unique_ptr< std::uniform_real_distribution< double > > m_rndDistribution {std::make_unique<std::uniform_real_distribution<double>>()}
 A uniform real distribution for extracting random numbers.
 
bool m_rndIsInitialized {false}
 Flag for keeping track if the internal random generator is correctly initialized or not.
 

Static Private Attributes

static bool s_globalInit {false}
 flag to indicate whether curl has been initialized already
 

Detailed Description

Simple class to encapsulate libcurl as used by the ConditionsDatabase.

Definition at line 22 of file Downloader.h.

Constructor & Destructor Documentation

◆ ~Downloader()

~Downloader ( )

Destructor.

Definition at line 140 of file Downloader.cc.

Downloader::~Downloader() { finishSession(); }

Member Function Documentation

◆ calculateChecksum()

std::string calculateChecksum ( std::istream &  input)
static private

Calculate the MD5 digest/checksum of a given input stream.

Parameters
  input: input stream containing the data
Returns
  the hex digest of the checksum

Definition at line 226 of file Downloader.cc.

std::string Downloader::calculateChecksum(std::istream& input)
{
  // rewind stream
  input.clear();
  input.seekg(0, std::ios::beg);
  // and calculate md5 checksum by feeding it blockwise to the TMD5 update
  TMD5 md5;
  char buffer[4096];
  while (input.good()) {
    input.read(buffer, 4096);
    if (input.gcount() == 0) break;
    md5.Update((unsigned char*)buffer, input.gcount());
  }
  // finalize and return output
  md5.Final();
  return md5.AsString();
}
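
The listing above rewinds the stream and feeds it to the hash in 4096-byte blocks, so the payload never has to be held in memory as a whole. A minimal, dependency-free sketch of the same pattern follows. It is not the basf2 implementation: fnv1aHexDigest and verifyFnv1a are hypothetical names, and 64-bit FNV-1a stands in for ROOT's TMD5 purely to keep the example self-contained.

```cpp
#include <cstdint>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for calculateChecksum(): same rewind-and-read-in-blocks
// loop, but using 64-bit FNV-1a instead of MD5 (assumption made for illustration).
std::string fnv1aHexDigest(std::istream& input)
{
  // rewind the stream and reset any eof/fail state left by earlier reads
  input.clear();
  input.seekg(0, std::ios::beg);
  std::uint64_t hash = 0xcbf29ce484222325ULL; // FNV-1a 64-bit offset basis
  char buffer[4096];
  while (input.good()) {
    input.read(buffer, sizeof(buffer));
    const std::streamsize count = input.gcount();
    if (count == 0) break;
    for (std::streamsize i = 0; i < count; ++i) {
      hash ^= static_cast<unsigned char>(buffer[i]);
      hash *= 0x100000001b3ULL; // FNV-1a 64-bit prime
    }
  }
  // format as a fixed-width lowercase hex digest
  char hex[17];
  std::snprintf(hex, sizeof(hex), "%016llx", static_cast<unsigned long long>(hash));
  return hex;
}

// Hypothetical counterpart of verifyChecksum(): compare against an expected digest.
bool verifyFnv1a(std::istream& input, const std::string& expected)
{
  return fnv1aHexDigest(input) == expected;
}
```

Because the function rewinds the stream itself, it can be called repeatedly on the same stream and yields the same digest each time.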

◆ download()

bool download ( const std::string &  url,
std::ostream &  stream,
bool  silentOnMissing = false 
)

Get a URL and save the content to a stream. This function raises exceptions when there are any problems.

Warning
  any contents in the stream will be overwritten
Parameters
  url: the URL to download
  stream: the stream to save the output to
  silentOnMissing: if true do not emit a warning on 404 Not Found but just return false silently. Useful when checking if a file exists on the server
Returns
  true on success; false if the server answers 404 Not Found and silentOnMissing is set. Other failures raise a std::runtime_error.

Definition at line 260 of file Downloader.cc.

bool Downloader::download(const std::string& url, std::ostream& buffer, bool silentOnMissing)
{
  // make sure we have an active curl session ...
  auto session = ensureSession();
  // and initialize the internal random number generator
  initializeRandomGeneratorSeed();
  B2DEBUG(37, "Download started ..." << LogVar("url", url));
  // we might need to try a few times in case of HTTP error >= 300
  for (unsigned int retry{1};; ++retry) {
    // rewind the stream to the beginning
    buffer.clear();
    buffer.seekp(0, std::ios::beg);
    if (!buffer.good()) {
      throw std::runtime_error("cannot write to stream");
    }
    // Set the exception flags to notify us of any problem during writing
    auto oldExceptionMask = buffer.exceptions();
    buffer.exceptions(std::ios::failbit | std::ios::badbit);
    // build the request ...
    CURLcode res{CURLE_FAILED_INIT};
    // and set all the curl options
    curl_easy_setopt(m_session->curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(m_session->curl, CURLOPT_WRITEDATA, &buffer);
    // perform the request ...
    res = curl_easy_perform(m_session->curl);
    // flush output
    buffer.exceptions(oldExceptionMask);
    buffer.flush();
    // and check for errors which occurred during download ...
    if (res != CURLE_OK) {
      size_t len = strlen(m_session->errbuf);
      const std::string error = len ? m_session->errbuf : curl_easy_strerror(res);
      if (m_maxRetries > 0 && res == CURLE_HTTP_RETURNED_ERROR) {
        if (retry <= m_maxRetries) {
          // we treat everything below 300 as permanent error with the request,
          // while if 300 or above we retry
          // 404 corresponds to Not Found and we want to treat it differently
          long responseCode{0};
          curl_easy_getinfo(m_session->curl, CURLINFO_RESPONSE_CODE, &responseCode);
          if (responseCode >= 300 and responseCode != 404) {
            // use exponential backoff but don't restrict to exact slots like
            // Ethernet, just use a random wait time between 1s and maxDelay =
            // (2^retry - 1) * backoffFactor
            double maxDelay = (std::pow(2, retry) - 1) * m_backoffFactor;
            // This is an exception in the whole basf2: instead of relying on gRandom for getting a random number,
            // we rely on a different random number generator, and the reason is:
            // since the request may fail because of several reasons independent from basf2 (bad connection,
            // faulty squid cache, etc.), we might retry a new request altering the internal state of the gRandom
            // instance, spoiling our capability to fully reproduce our results.
            // In this way, relying on a different generator, we are safe.
            m_rndDistribution->param(std::uniform_real_distribution<double>::param_type(1.0, maxDelay));
            double seconds = (*m_rndDistribution)(*m_rnd);
            B2WARNING("Could not download url, retrying ..."
                      << LogVar("url", url) << LogVar("error", error)
                      << LogVar("try", retry) << LogVar("waiting time", seconds));
            std::this_thread::sleep_for(std::chrono::milliseconds((int)(seconds * 1e3)));
            continue;
          }
          // special treatment for 404: if silentOnMissing is true we just return false silently
          // this is useful when checking if a file exists on the server
          if (responseCode == 404 and silentOnMissing) return false;
        }
      }
      throw std::runtime_error(error);
    }
    break;
  }
  // all fine
  B2DEBUG(37, "Download finished successfully." << LogVar("url", url));
  return true;
}
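
The retry branch of download() waits a random time between 1 s and (2^retry - 1) * backoffFactor seconds before trying again. With the default backoff factor of 3, the upper bounds for the first five retries are therefore 3, 9, 21, 45 and 93 seconds. The draw can be sketched in isolation as follows; maxRetryDelay and drawRetryDelay are hypothetical helper names that do not exist in Downloader, and the generator is passed in explicitly instead of being a class member.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical helper: upper bound of the waiting time in seconds before
// retry number `retry` (1-based), i.e. (2^retry - 1) * backoffFactor.
double maxRetryDelay(unsigned int retry, unsigned int backoffFactor)
{
  assert(retry >= 1 && backoffFactor >= 1);
  return (std::pow(2.0, static_cast<double>(retry)) - 1.0) * backoffFactor;
}

// Hypothetical helper: draw the actual waiting time uniformly between 1 s
// and the upper bound, using a generator that is independent of gRandom.
double drawRetryDelay(unsigned int retry, unsigned int backoffFactor, std::mt19937& generator)
{
  std::uniform_real_distribution<double> distribution(1.0, maxRetryDelay(retry, backoffFactor));
  return distribution(generator);
}
```

Randomising the delay instead of sleeping for the full upper bound spreads out clients that failed at the same moment, so they are less likely to hit the server again simultaneously.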

◆ ensureSession()

ScopeGuard ensureSession ( )
inline

Make sure there's an active session and return a ScopeGuard object that closes the session on destruction in case a new session was created.

Definition at line 43 of file Downloader.h.

ScopeGuard ensureSession()
{
  bool started = startSession();
  return ScopeGuard([this, started] {if (started) finishSession();});
}
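
The guard returned by ensureSession() only closes the session if that particular call opened it, which makes nested calls safe: an inner call reuses the session and leaves it open for the outer one. The pattern can be sketched without basf2 as follows. SimpleGuard and Session are hypothetical stand-ins for ScopeGuard and Downloader, written only for this illustration.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for ScopeGuard: runs a callback when it goes out of scope.
class SimpleGuard {
public:
  explicit SimpleGuard(std::function<void()> onExit) : m_onExit(std::move(onExit)) {}
  SimpleGuard(SimpleGuard&& other) noexcept : m_onExit(std::move(other.m_onExit))
  {
    other.m_onExit = nullptr; // the moved-from guard must not fire
  }
  SimpleGuard(const SimpleGuard&) = delete;
  SimpleGuard& operator=(const SimpleGuard&) = delete;
  ~SimpleGuard() { if (m_onExit) m_onExit(); }
private:
  std::function<void()> m_onExit;
};

// Hypothetical resource mirroring startSession()/finishSession()/ensureSession().
class Session {
public:
  // returns true only if this call actually opened the session
  bool start() { if (m_active) return false; m_active = true; return true; }
  void finish() { m_active = false; }
  bool active() const { return m_active; }
  SimpleGuard ensure()
  {
    const bool started = start();
    return SimpleGuard([this, started] { if (started) finish(); });
  }
private:
  bool m_active{false};
};

// Usage: nested guards share one session; only the outermost one closes it.
inline void demoNestedGuards()
{
  Session session;
  {
    auto outer = session.ensure();
    { auto inner = session.ensure(); }
    assert(session.active()); // the inner guard did not open the session, so it stays open
  }
  assert(!session.active());
}
```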

◆ escapeString()

std::string escapeString ( const std::string &  text)

Escape a string to make it safe to be used in web requests.

Definition at line 142 of file Downloader.cc.

std::string Downloader::escapeString(const std::string& text)
{
  // make sure we have an active curl session ...
  auto session = ensureSession(); // cppcheck-suppress unreadVariable
  char* escaped = curl_easy_escape(m_session->curl, text.c_str(), text.size());
  if (!escaped) {
    throw std::runtime_error("Could not escape string");
  }
  std::string escapedStr{escaped};
  curl_free(escaped);
  return escapedStr;
}

◆ finishSession()

void finishSession ( )

Finish an existing curl session if any is active at the moment.

Definition at line 216 of file Downloader.cc.

void Downloader::finishSession()
{
  // if there's a session clean it ...
  if (m_session) {
    curl_easy_cleanup(m_session->curl);
    curl_slist_free_all(m_session->headers);
    m_session.reset();
  }
}

◆ getBackoffFactor()

unsigned int getBackoffFactor ( ) const
inline

Get the backoff factor for retries in seconds.

Definition at line 56 of file Downloader.h.

unsigned int getBackoffFactor() const { return m_backoffFactor; }

◆ getConnectionTimeout()

unsigned int getConnectionTimeout ( ) const
inline

Get the timeout to wait for connections in seconds, 0 means the built in curl default.

Definition at line 50 of file Downloader.h.

unsigned int getConnectionTimeout() const { return m_connectionTimeout; }

◆ getDefaultInstance()

Downloader & getDefaultInstance ( )
static

Return the default instance.

There can be multiple instances without problem but we provide a default one to allow for better pipelining support

Definition at line 134 of file Downloader.cc.

Downloader& Downloader::getDefaultInstance()
{
  static Downloader instance;
  return instance;
}
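
getDefaultInstance() relies on a function-local static, which is constructed on first use and shared by all callers, while the public constructor still allows separate instances with their own state. A minimal sketch of that arrangement, using a hypothetical Counter class in place of Downloader:

```cpp
// Hypothetical class showing a shared default instance next to ordinary instances.
class Counter {
public:
  Counter() = default;
  // The function-local static is created on the first call and reused afterwards.
  static Counter& getDefaultInstance()
  {
    static Counter instance;
    return instance;
  }
  int next() { return ++m_value; }
private:
  int m_value{0};
};
```

Every call to Counter::getDefaultInstance() returns a reference to the same object, whereas a locally constructed Counter keeps its own independent count.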

◆ getMaxRetries()

unsigned int getMaxRetries ( ) const
inline

Get the number of retries to perform when downloading fails with HTTP response code >=300 (404 Not Found is never retried), 0 means no retries.

Definition at line 54 of file Downloader.h.

unsigned int getMaxRetries() const { return m_maxRetries; }

◆ getStalledTimeout()

unsigned int getStalledTimeout ( ) const
inline

Get the timeout to wait for stalled connections (<10KB/s), 0 means no timeout.

Definition at line 52 of file Downloader.h.

unsigned int getStalledTimeout() const { return m_stalledTimeout; }

◆ initializeRandomGeneratorSeed()

void initializeRandomGeneratorSeed ( )
private

Initialize the seed of the internal random number generator.

Do nothing if the seed is already set (e.g. this method has been already called before). The hash of the basf2 seed is used as seed for m_rnd.

Definition at line 332 of file Downloader.cc.

void Downloader::initializeRandomGeneratorSeed()
{
  if (not m_rndIsInitialized) {
    // We need to provide a seed for m_rnd: let's take the basf2Seed and hash it
    auto downloaderSeed = std::hash<std::string> {}(RandomNumbers::getSeed());
    m_rnd->seed(downloaderSeed);
    m_rndIsInitialized = true;
  }
}
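
The seeding step turns a textual seed into an integer with std::hash and uses it to seed a private std::mt19937, so the retry delays never consume numbers from the main generator. A minimal sketch of that step in isolation, where makeGeneratorFromSeed is a hypothetical free function and the seed text is passed in instead of being read from RandomNumbers::getSeed():

```cpp
#include <cstddef>
#include <functional>
#include <random>
#include <string>

// Hypothetical helper: build an independent generator from a textual seed.
std::mt19937 makeGeneratorFromSeed(const std::string& seedText)
{
  // hash the seed text into an integer ...
  const std::size_t hashed = std::hash<std::string> {}(seedText);
  // ... and use it (narrowed to the generator's seed type) to seed a private Mersenne Twister
  std::mt19937 generator;
  generator.seed(static_cast<std::mt19937::result_type>(hashed));
  return generator;
}
```

Two generators built from the same seed text within one program run produce identical sequences. Note that the C++ standard only requires std::hash to be consistent within a single execution of a program.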

◆ joinWithSlash()

std::string joinWithSlash ( const std::string &  base,
const std::string &  second 
)

Join two strings and make sure that there is exactly one '/' between them.

Definition at line 156 of file Downloader.cc.

std::string Downloader::joinWithSlash(const std::string& base, const std::string& rest)
{
  return boost::trim_right_copy_if(base, boost::is_any_of("/")) + "/" +
         boost::trim_left_copy_if(rest, boost::is_any_of("/"));
}
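
The effect of the two trim calls is easiest to see on concrete inputs. The following sketch reproduces the behaviour with the standard library only; joinWithSingleSlash is a hypothetical name chosen for this example and is not part of Downloader.

```cpp
#include <string>

// Hypothetical Boost-free equivalent of joinWithSlash().
std::string joinWithSingleSlash(const std::string& base, const std::string& rest)
{
  // drop all trailing '/' from the first part
  const auto lastKept = base.find_last_not_of('/');
  const std::string left = (lastKept == std::string::npos) ? "" : base.substr(0, lastKept + 1);
  // drop all leading '/' from the second part
  const auto firstKept = rest.find_first_not_of('/');
  const std::string right = (firstKept == std::string::npos) ? "" : rest.substr(firstKept);
  // join with exactly one separator
  return left + "/" + right;
}
```

For example, joining "http://example.org/api/" with "/payloads" gives "http://example.org/api/payloads", and an empty second part still produces a trailing slash.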

◆ setBackoffFactor()

void setBackoffFactor ( unsigned int  factor)
inline

Set the backoff factor for retries in seconds.

Minimum is 1 and 0 will be silently converted to 1

Definition at line 64 of file Downloader.h.

void setBackoffFactor(unsigned int factor) { m_backoffFactor = std::max(1u, factor); }

◆ setConnectionTimeout()

void setConnectionTimeout ( unsigned int  timeout)

Set the timeout to wait for connections in seconds, 0 means built in curl default.

Definition at line 244 of file Downloader.cc.

void Downloader::setConnectionTimeout(unsigned int timeout)
{
  m_connectionTimeout = timeout;
  if (m_session) {
    curl_easy_setopt(m_session->curl, CURLOPT_CONNECTTIMEOUT, m_connectionTimeout);
  }
}

◆ setMaxRetries()

void setMaxRetries ( unsigned int  retries)
inline

Set the number of retries to perform when downloading fails with HTTP response code >=300 (404 Not Found is never retried), 0 disables retry.

Definition at line 62 of file Downloader.h.

void setMaxRetries(unsigned int retries) { m_maxRetries = retries; }

◆ setStalledTimeout()

void setStalledTimeout ( unsigned int  timeout)

Set the timeout to wait for stalled connections (<10KB/s), 0 disables timeout.

Definition at line 252 of file Downloader.cc.

void Downloader::setStalledTimeout(unsigned int timeout)
{
  m_stalledTimeout = timeout;
  if (m_session) {
    curl_easy_setopt(m_session->curl, CURLOPT_LOW_SPEED_TIME, m_stalledTimeout);
  }
}

◆ startSession()

bool startSession ( )

Start a new curl session if none is active at the moment.

Returns
true if a new session was started, false if one was active already

Definition at line 162 of file Downloader.cc.

bool Downloader::startSession()
{
  // start a curl session but if there is already one return false
  if (m_session) return false;
  // make sure curl is initialized correctly
  if (!s_globalInit) {
    curl_global_init(CURL_GLOBAL_ALL);
    s_globalInit = true;
  }
  // create the curl session
  m_session = std::make_unique<CurlSession>();
  m_session->curl = curl_easy_init();
  if (!m_session->curl) {
    B2FATAL("Cannot initialize libcurl");
  }
  m_session->headers = curl_slist_append(nullptr, "Accept: application/json");
  curl_easy_setopt(m_session->curl, CURLOPT_HTTPHEADER, m_session->headers);
  curl_easy_setopt(m_session->curl, CURLOPT_TCP_KEEPALIVE, 1L);
  curl_easy_setopt(m_session->curl, CURLOPT_CONNECTTIMEOUT, m_connectionTimeout);
  curl_easy_setopt(m_session->curl, CURLOPT_LOW_SPEED_LIMIT, 10 * 1024); //10 kB/s
  curl_easy_setopt(m_session->curl, CURLOPT_LOW_SPEED_TIME, m_stalledTimeout);
  curl_easy_setopt(m_session->curl, CURLOPT_WRITEFUNCTION, write_function);
  curl_easy_setopt(m_session->curl, CURLOPT_VERBOSE, 1);
  curl_easy_setopt(m_session->curl, CURLOPT_NOPROGRESS, 0);
  curl_easy_setopt(m_session->curl, CURLOPT_DEBUGFUNCTION, debug_callback);
  curl_easy_setopt(m_session->curl, CURLOPT_XFERINFOFUNCTION, progress_callback);
  curl_easy_setopt(m_session->curl, CURLOPT_XFERINFODATA, m_session.get());
  curl_easy_setopt(m_session->curl, CURLOPT_FAILONERROR, true);
  curl_easy_setopt(m_session->curl, CURLOPT_ERRORBUFFER, m_session->errbuf);
  // enable transparent compression support
  curl_easy_setopt(m_session->curl, CURLOPT_ACCEPT_ENCODING, "");
  // Set proxy if defined
  if (EnvironmentVariables::isSet("BELLE2_CONDB_PROXY")) {
    const std::string proxy = EnvironmentVariables::get("BELLE2_CONDB_PROXY");
    curl_easy_setopt(m_session->curl, CURLOPT_PROXY, proxy.c_str());
  }
  curl_easy_setopt(m_session->curl, CURLOPT_AUTOREFERER, 1L);
  curl_easy_setopt(m_session->curl, CURLOPT_FOLLOWLOCATION, 1L);
  curl_easy_setopt(m_session->curl, CURLOPT_MAXREDIRS, 10L);
  curl_easy_setopt(m_session->curl, CURLOPT_TCP_FASTOPEN, 0L);
  curl_easy_setopt(m_session->curl, CURLOPT_SSL_VERIFYPEER, 0L);
  curl_easy_setopt(m_session->curl, CURLOPT_SSL_VERIFYHOST, 0L);
  curl_easy_setopt(m_session->curl, CURLOPT_SSL_VERIFYSTATUS, 0L);
  curl_easy_setopt(m_session->curl, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_WHATEVER);
  // Don't cache DNS entries, ask the system every time we need to connect ...
  curl_easy_setopt(m_session->curl, CURLOPT_DNS_CACHE_TIMEOUT, 0L);
  // and shuffle the addresses so we try a different node, otherwise we might
  // always get the same address due to system caching and RFC 3484
  curl_easy_setopt(m_session->curl, CURLOPT_DNS_SHUFFLE_ADDRESSES, 1L);
  auto version = getUserAgent();
  curl_easy_setopt(m_session->curl, CURLOPT_USERAGENT, version.c_str());
  return true;
}

◆ verifyChecksum()

bool verifyChecksum ( std::istream &  input,
const std::string &  checksum 
)
inline

Check the digest of a stream.

Parameters
  input: stream to check, make sure the stream is in a valid state pointing to the correct position
  checksum: expected hash digest of the data
Returns
  true if digest matches, false otherwise

Definition at line 81 of file Downloader.h.

bool verifyChecksum(std::istream& input, const std::string& checksum) { return calculateChecksum(input) == checksum; }

Member Data Documentation

◆ m_backoffFactor

unsigned int m_backoffFactor {3}
private

Backoff factor for retries in seconds.

Definition at line 106 of file Downloader.h.

◆ m_connectionTimeout

unsigned int m_connectionTimeout {60}
private

Timeout to wait for connections in seconds.

Definition at line 100 of file Downloader.h.

◆ m_maxRetries

unsigned int m_maxRetries {5}
private

Number of retries to perform when downloading fails with HTTP response code >=300.

Definition at line 104 of file Downloader.h.

◆ m_rnd

std::unique_ptr<std::mt19937> m_rnd {std::make_unique<std::mt19937>()}
private

This is a special exception in basf2 where an instance of gRandom is NOT used: since this class interacts with the Conditions Database, it might alter the state of the random number generator in case of connection troubles, losing our capability to fully reproduce the results.

Definition at line 119 of file Downloader.h.

◆ m_rndDistribution

std::unique_ptr<std::uniform_real_distribution<double> > m_rndDistribution {std::make_unique<std::uniform_real_distribution<double>>()}
private

A uniform real distribution for extracting random numbers.

See the docstring for m_rnd as well.

Definition at line 121 of file Downloader.h.

◆ m_rndIsInitialized

bool m_rndIsInitialized {false}
private

Flag for keeping track if the internal random generator is correctly initialized or not.

Definition at line 123 of file Downloader.h.

◆ m_session

std::unique_ptr<CurlSession> m_session
private

curl session handle

Definition at line 96 of file Downloader.h.

◆ m_stalledTimeout

unsigned int m_stalledTimeout {60}
private

Timeout to wait for stalled connections (<10KB/s)

Definition at line 102 of file Downloader.h.

◆ s_globalInit

bool s_globalInit {false}
static private

flag to indicate whether curl has been initialized already

Definition at line 98 of file Downloader.h.


The documentation for this class was generated from the following files: