Belle II Software development
Downloader Class Reference (final)

Simple class to encapsulate libcurl as used by the ConditionsDatabase.

#include <Downloader.h>

Public Member Functions

 Downloader ()=default
 Create a new payload downloader.
 
 ~Downloader ()
 Destructor.
 
bool startSession ()
 Start a new curl session if none is active at the moment.
 
void finishSession ()
 Finish an existing curl session if any is active at the moment.
 
ScopeGuard ensureSession ()
 Make sure there's an active session and return a ScopeGuard object that closes the session on destruction if a new session was created.
 
unsigned int getConnectionTimeout () const
 Get the timeout to wait for connections in seconds, 0 means the built-in curl default.
 
unsigned int getStalledTimeout () const
 Get the timeout to wait for stalled connections (<10KB/s), 0 means no timeout.
 
unsigned int getMaxRetries () const
 Get the number of retries to perform when downloading fails with HTTP response code >=300 (other than 404), 0 means no retries.
 
unsigned int getBackoffFactor () const
 Get the backoff factor for retries in seconds.
 
void setConnectionTimeout (unsigned int timeout)
 Set the timeout to wait for connections in seconds, 0 means the built-in curl default.
 
void setStalledTimeout (unsigned int timeout)
 Set the timeout to wait for stalled connections (<10KB/s), 0 disables timeout.
 
void setMaxRetries (unsigned int retries)
 Set the number of retries to perform when downloading fails with HTTP response code >=300 (other than 404), 0 disables retry.
 
void setBackoffFactor (unsigned int factor)
 Set the backoff factor for retries in seconds.
 
bool download (const std::string &url, std::ostream &stream, bool silentOnMissing=false)
 Get a URL and save the content to a stream. This function raises exceptions when there are any problems.
 
bool verifyChecksum (std::istream &input, const std::string &checksum)
 check the digest of a stream
 
std::string escapeString (const std::string &text)
 Escape a string to make it safe to be used in web requests.
 
std::string joinWithSlash (const std::string &base, const std::string &second)
 Join two strings and make sure that there is exactly one '/' between them.
 

Static Public Member Functions

static Downloader & getDefaultInstance ()
 Return the default instance.
 

Private Member Functions

void initializeRandomGeneratorSeed ()
 Initialize the seed of the internal random number generator.
 

Static Private Member Functions

static std::string calculateChecksum (std::istream &input)
 Calculate the digest/checksum of a given stream.
 

Private Attributes

std::unique_ptr< CurlSession > m_session
 curl session handle
 
unsigned int m_connectionTimeout {60}
 Timeout to wait for connections in seconds.
 
unsigned int m_stalledTimeout {60}
 Timeout to wait for stalled connections (<10KB/s)
 
unsigned int m_maxRetries {5}
 Number of retries to perform when downloading fails with HTTP response code >=300.
 
unsigned int m_backoffFactor {3}
 Backoff factor for retries in seconds.
 
std::unique_ptr< std::mt19937 > m_rnd {std::make_unique<std::mt19937>()}
 This is a special exception in basf2 where an instance of gRandom is NOT used: this class interacts with the Conditions Database, and drawing retry delays from gRandom during connection troubles would alter its state, so we would lose the ability to fully reproduce the results.
 
std::unique_ptr< std::uniform_real_distribution< double > > m_rndDistribution {std::make_unique<std::uniform_real_distribution<double>>()}
 A uniform real distribution for extracting random numbers.
 
bool m_rndIsInitialized {false}
 Flag for keeping track if the internal random generator is correctly initialized or not.
 

Static Private Attributes

static bool s_globalInit {false}
 flag to indicate whether curl has been initialized already
 

Detailed Description

Simple class to encapsulate libcurl as used by the ConditionsDatabase.

Definition at line 22 of file Downloader.h.

Constructor & Destructor Documentation

◆ ~Downloader()

~Downloader ( )

Destructor.

Definition at line 138 of file Downloader.cc.

{ finishSession(); }

Member Function Documentation

◆ calculateChecksum()

std::string calculateChecksum ( std::istream &  input)
static private

Calculate the digest/checksum of a given stream.

Parameters
input: input stream containing the data
Returns
the hex digest of the checksum

Definition at line 224 of file Downloader.cc.

{
  // rewind stream
  input.clear();
  input.seekg(0, std::ios::beg);
  // and calculate md5 checksum by feeding it blockwise to the TMD5 update
  TMD5 md5;
  char buffer[4096];
  while (input.good()) {
    input.read(buffer, 4096);
    if (input.gcount() == 0) break;
    md5.Update((unsigned char*)buffer, input.gcount());
  }
  // finalize and return output
  md5.Final();
  return md5.AsString();
}
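
The blockwise read loop above can be tried outside of basf2. The following is a minimal sketch, not the Downloader implementation: it swaps TMD5 for a 64-bit FNV-1a hash so that it needs no ROOT, and the name `blockwiseDigest` is hypothetical.

```cpp
#include <cstdint>
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: digest a stream block by block.
// FNV-1a (64 bit) stands in for TMD5 here; it is NOT what Downloader uses.
std::string blockwiseDigest(std::istream& input)
{
  // rewind the stream, clearing any eof/fail state first
  input.clear();
  input.seekg(0, std::ios::beg);
  std::uint64_t hash = 14695981039346656037ULL; // FNV-1a offset basis
  char buffer[4096];
  while (input.good()) {
    input.read(buffer, sizeof(buffer));
    const std::streamsize n = input.gcount();
    if (n == 0) break;
    // feed the bytes actually read into the running hash
    for (std::streamsize i = 0; i < n; ++i) {
      hash ^= static_cast<unsigned char>(buffer[i]);
      hash *= 1099511628211ULL; // FNV-1a prime
    }
  }
  // return the digest as 16 lower-case hex digits
  std::ostringstream out;
  out << std::hex << std::setfill('0') << std::setw(16) << hash;
  return out.str();
}
```

Because the function clears the stream state and seeks to the beginning before reading, calling it twice on the same stream gives the same digest, which is the property verifyChecksum() relies on.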

◆ download()

bool download ( const std::string &  url,
std::ostream &  stream,
bool  silentOnMissing = false 
)

Get a URL and save the content to a stream. This function raises exceptions when there are any problems.

Warning
any contents in the stream will be overwritten
Parameters
url: the URL to download
stream: the stream to save the output to
silentOnMissing: if true, do not emit a warning on 404 Not Found but just return false silently. Useful when checking if a file exists on the server
Returns
true on success; false if the server answered 404 Not Found and silentOnMissing is true. Other failures raise std::runtime_error

Definition at line 258 of file Downloader.cc.

{
  // make sure we have an active curl session ...
  auto session = ensureSession();
  // and initialize the internal random number generator
  initializeRandomGeneratorSeed();
  B2DEBUG(37, "Download started ..." << LogVar("url", url));
  // we might need to try a few times in case of HTTP error >= 300
  for (unsigned int retry{1};; ++retry) {
    //rewind the stream to the beginning
    buffer.clear();
    buffer.seekp(0, std::ios::beg);
    if (!buffer.good()) {
      throw std::runtime_error("cannot write to stream");
    }
    // Set the exception flags to notify us of any problem during writing
    auto oldExceptionMask = buffer.exceptions();
    buffer.exceptions(std::ios::failbit | std::ios::badbit);
    // build the request ...
    CURLcode res{CURLE_FAILED_INIT};
    // and set all the curl options
    curl_easy_setopt(m_session->curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(m_session->curl, CURLOPT_WRITEDATA, &buffer);
    // perform the request ...
    res = curl_easy_perform(m_session->curl);
    // flush output
    buffer.exceptions(oldExceptionMask);
    buffer.flush();
    // and check for errors which occurred during download ...
    if (res != CURLE_OK) {
      size_t len = strlen(m_session->errbuf);
      const std::string error = len ? m_session->errbuf : curl_easy_strerror(res);
      if (m_maxRetries > 0 && res == CURLE_HTTP_RETURNED_ERROR) {
        if (retry <= m_maxRetries) {
          // we treat everything below 300 as permanent error with the request,
          // while if 300 or above we retry
          // 404 corresponds to Not Found and we want to treat it differently
          long responseCode{0};
          curl_easy_getinfo(m_session->curl, CURLINFO_RESPONSE_CODE, &responseCode);
          if (responseCode >= 300 and responseCode != 404) {
            // use exponential backoff but don't restrict to exact slots like
            // Ethernet, just use a random wait time between 1s and
            // maxDelay = (2^retry - 1) * backoffFactor
            double maxDelay = (std::pow(2, retry) - 1) * m_backoffFactor;
            // This is an exception in the whole basf2: instead of relying on gRandom for getting a random number,
            // we rely on a different random number generator, and the reason is:
            // since the request may fail because of several reasons independent from basf2 (bad connection,
            // faulty squid cache, etc.), we might retry a new request altering the internal state of the gRandom
            // instance, spoiling our capability to fully reproduce our results.
            // In this way, relying on a different generator, we are safe.
            m_rndDistribution->param(std::uniform_real_distribution<double>::param_type(1.0, maxDelay));
            double seconds = (*m_rndDistribution)(*m_rnd);
            B2WARNING("Could not download url, retrying ..."
                      << LogVar("url", url) << LogVar("error", error)
                      << LogVar("try", retry) << LogVar("waiting time", seconds));
            std::this_thread::sleep_for(std::chrono::milliseconds((int)(seconds * 1e3)));
            continue;
          }
          // special treatment for 404: if silentOnMissing is true we just return false silently
          // this is useful when checking if a file exists on the server
          if (responseCode == 404 and silentOnMissing) return false;
        }
      }
      throw std::runtime_error(error);
    }
    break;
  }
  // all fine
  B2DEBUG(37, "Download finished successfully." << LogVar("url", url));
  return true;
}
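
The retry decision and the waiting time used in the loop above can be isolated into a small standalone sketch. The helper names `isRetryable`, `maxRetryDelay` and `drawRetryDelay` are hypothetical and exist only for illustration; the expressions mirror the ones in the listing.

```cpp
#include <cmath>
#include <random>

// Hypothetical helper: same condition as in download(),
// retry on HTTP codes >= 300 except 404 Not Found.
bool isRetryable(long responseCode)
{
  return responseCode >= 300 && responseCode != 404;
}

// Hypothetical helper: upper bound of the waiting time in seconds,
// maxDelay = (2^retry - 1) * backoffFactor
double maxRetryDelay(unsigned int retry, unsigned int backoffFactor)
{
  return (std::pow(2, retry) - 1) * backoffFactor;
}

// Draw a waiting time in seconds between 1 and maxDelay from a private
// generator, so that no global random state is touched.
double drawRetryDelay(std::mt19937& rnd, unsigned int retry, unsigned int backoffFactor)
{
  std::uniform_real_distribution<double> dist(1.0, maxRetryDelay(retry, backoffFactor));
  return dist(rnd);
}
```

With the default values shown on this page (m_maxRetries = 5, m_backoffFactor = 3) the upper bounds for the five retries are 3, 9, 21, 45 and 93 seconds.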

◆ ensureSession()

ScopeGuard ensureSession ( )
inline

Make sure there's an active session and return a ScopeGuard object that closes the session on destruction if a new session was created.

Definition at line 43 of file Downloader.h.

{
  bool started = startSession();
  return ScopeGuard([this, started] {if (started) finishSession();});
}
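
The effect of this pattern is that only the outermost caller closes the session. A minimal sketch of the idea follows; `MiniGuard` and `FakeSessionOwner` are hypothetical stand-ins and do not reproduce the basf2 ScopeGuard class or libcurl.

```cpp
#include <functional>
#include <utility>

// Hypothetical stand-in for ScopeGuard: runs a callback on destruction.
class MiniGuard {
public:
  explicit MiniGuard(std::function<void()> onExit) : m_onExit(std::move(onExit)) {}
  // moving transfers the callback and disarms the source
  MiniGuard(MiniGuard&& other) noexcept : m_onExit(std::move(other.m_onExit)) { other.m_onExit = nullptr; }
  MiniGuard(const MiniGuard&) = delete;
  MiniGuard& operator=(const MiniGuard&) = delete;
  ~MiniGuard() { if (m_onExit) m_onExit(); }
private:
  std::function<void()> m_onExit;
};

// Hypothetical owner with the same start/finish/ensure shape as Downloader.
class FakeSessionOwner {
public:
  // returns true only if a new session was started
  bool startSession() { if (m_active) return false; m_active = true; return true; }
  void finishSession() { m_active = false; }
  bool isActive() const { return m_active; }
  MiniGuard ensureSession()
  {
    bool started = startSession();
    // close the session on destruction only if this call opened it
    return MiniGuard([this, started] { if (started) finishSession(); });
  }
private:
  bool m_active{false};
};
```

A nested call to ensureSession() sees an already active session, so its guard does nothing when it goes out of scope and the session stays open for the outer caller.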

◆ escapeString()

std::string escapeString ( const std::string &  text)

Escape a string to make it safe to be used in web requests.

Definition at line 140 of file Downloader.cc.

{
  //make sure we have an active curl session ...
  auto session = ensureSession(); // cppcheck-suppress unreadVariable
  char* escaped = curl_easy_escape(m_session->curl, text.c_str(), text.size());
  if (!escaped) {
    throw std::runtime_error("Could not escape string");
  }
  std::string escapedStr{escaped};
  curl_free(escaped);
  return escapedStr;
}
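
curl_easy_escape() percent-encodes every byte except the unreserved characters A-Z, a-z, 0-9, '-', '.', '_' and '~'. The following sketch applies that rule in plain C++ to show what kind of output to expect; `percentEncode` is a hypothetical helper, not part of Downloader, and the use of upper-case hex digits is an assumption of the sketch.

```cpp
#include <string>

// Hypothetical helper: percent-encode everything except unreserved characters.
std::string percentEncode(const std::string& text)
{
  static const char* hex = "0123456789ABCDEF";
  std::string out;
  out.reserve(text.size());
  for (unsigned char c : text) {
    const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                            (c >= '0' && c <= '9') ||
                            c == '-' || c == '.' || c == '_' || c == '~';
    if (unreserved) {
      out += static_cast<char>(c);
    } else {
      // emit %XX with the two hex digits of the byte value
      out += '%';
      out += hex[c >> 4];
      out += hex[c & 0x0F];
    }
  }
  return out;
}
```

Note that '/' is escaped as well, so a full path should be escaped piece by piece rather than as one string.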

◆ finishSession()

void finishSession ( )

Finish an existing curl session if any is active at the moment.

Definition at line 214 of file Downloader.cc.

{
  // if there's a session clean it ...
  if (m_session) {
    curl_easy_cleanup(m_session->curl);
    curl_slist_free_all(m_session->headers);
    m_session.reset();
  }
}

◆ getBackoffFactor()

unsigned int getBackoffFactor ( ) const
inline

Get the backoff factor for retries in seconds.

Definition at line 56 of file Downloader.h.

{ return m_backoffFactor; }

◆ getConnectionTimeout()

unsigned int getConnectionTimeout ( ) const
inline

Get the timeout to wait for connections in seconds, 0 means the built-in curl default.

Definition at line 50 of file Downloader.h.

{ return m_connectionTimeout; }

◆ getDefaultInstance()

Downloader & getDefaultInstance ( )
static

Return the default instance.

There can be multiple instances without problem, but we provide a default one to allow for better pipelining support.

Definition at line 132 of file Downloader.cc.

{
  static Downloader instance;
  return instance;
}
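
The function-local static gives one shared default object while still allowing independent instances. A minimal sketch with a hypothetical class `DefaultCounter` (not part of basf2) shows both uses side by side:

```cpp
// Hypothetical class using the same default-instance pattern.
class DefaultCounter {
public:
  DefaultCounter() = default;
  static DefaultCounter& getDefaultInstance()
  {
    // constructed on first call; initialization is thread-safe since C++11
    static DefaultCounter instance;
    return instance;
  }
  int next() { return ++m_value; }
private:
  int m_value{0};
};
```

Every call to getDefaultInstance() returns a reference to the same object, whereas a locally constructed DefaultCounter keeps its own state.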

◆ getMaxRetries()

unsigned int getMaxRetries ( ) const
inline

Get the number of retries to perform when downloading fails with HTTP response code >=300 (other than 404), 0 means no retries.

Definition at line 54 of file Downloader.h.

{ return m_maxRetries; }

◆ getStalledTimeout()

unsigned int getStalledTimeout ( ) const
inline

Get the timeout to wait for stalled connections (<10KB/s), 0 means no timeout.

Definition at line 52 of file Downloader.h.

{ return m_stalledTimeout; }

◆ initializeRandomGeneratorSeed()

void initializeRandomGeneratorSeed ( )
private

Initialize the seed of the internal random number generator.

Do nothing if the seed is already set (i.e. this method has already been called). The hash of the basf2 seed is used as seed for m_rnd.

Definition at line 330 of file Downloader.cc.

{
  if (not m_rndIsInitialized) {
    // We need to provide a seed for m_rnd: let's take the basf2Seed and hash it
    auto downloaderSeed = std::hash<std::string> {}(RandomNumbers::getSeed());
    m_rnd->seed(downloaderSeed);
    m_rndIsInitialized = true;
  }
}
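
Hashing a string seed to initialize a private std::mt19937 can be sketched as follows. The helper `makeGeneratorFromSeed` is hypothetical; in the listing above the string comes from RandomNumbers::getSeed().

```cpp
#include <functional>
#include <random>
#include <string>

// Hypothetical helper: build a private generator from a string seed.
std::mt19937 makeGeneratorFromSeed(const std::string& seed)
{
  // std::hash gives a size_t; the cast makes the truncation to the
  // 32-bit seed type of std::mt19937 explicit
  const auto hashed = std::hash<std::string>{}(seed);
  std::mt19937 rnd;
  rnd.seed(static_cast<std::mt19937::result_type>(hashed));
  return rnd;
}
```

Two generators built from the same string produce the same sequence within one program, and neither touches any other random number generator.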

◆ joinWithSlash()

std::string joinWithSlash ( const std::string &  base,
const std::string &  second 
)

Join two strings and make sure that there is exactly one '/' between them.

Definition at line 154 of file Downloader.cc.

{
  return boost::trim_right_copy_if(base, boost::is_any_of("/")) + "/" +
         boost::trim_left_copy_if(rest, boost::is_any_of("/"));
}
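
The same joining rule can be written without Boost, which makes the edge cases easy to check. This is a sketch only; `joinWithSingleSlash` is a hypothetical name and not the Downloader method.

```cpp
#include <string>

// Hypothetical helper: strip trailing '/' from base and leading '/' from rest,
// then join the two with exactly one '/'.
std::string joinWithSingleSlash(const std::string& base, const std::string& rest)
{
  const auto lastKept = base.find_last_not_of('/');
  const std::string left =
    (lastKept == std::string::npos) ? std::string() : base.substr(0, lastKept + 1);
  const auto firstKept = rest.find_first_not_of('/');
  const std::string right =
    (firstKept == std::string::npos) ? std::string() : rest.substr(firstKept);
  return left + "/" + right;
}
```

Slashes inside either argument, such as the "//" of a URL scheme, are left untouched; only the ends that meet are trimmed. Two empty arguments give a single "/".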

◆ setBackoffFactor()

void setBackoffFactor ( unsigned int  factor)
inline

Set the backoff factor for retries in seconds.

The minimum is 1; a value of 0 is silently converted to 1.

Definition at line 64 of file Downloader.h.

{ m_backoffFactor = std::max(1u, factor); }

◆ setConnectionTimeout()

void setConnectionTimeout ( unsigned int  timeout)

Set the timeout to wait for connections in seconds, 0 means the built-in curl default.

Definition at line 242 of file Downloader.cc.

{
  m_connectionTimeout = timeout;
  if (m_session) {
    curl_easy_setopt(m_session->curl, CURLOPT_CONNECTTIMEOUT, m_connectionTimeout);
  }
}

◆ setMaxRetries()

void setMaxRetries ( unsigned int  retries)
inline

Set the number of retries to perform when downloading fails with HTTP response code >=300 (other than 404), 0 disables retry.

Definition at line 62 of file Downloader.h.

{ m_maxRetries = retries; }

◆ setStalledTimeout()

void setStalledTimeout ( unsigned int  timeout)

Set the timeout to wait for stalled connections (<10KB/s), 0 disables timeout.

Definition at line 250 of file Downloader.cc.

{
  m_stalledTimeout = timeout;
  if (m_session) {
    curl_easy_setopt(m_session->curl, CURLOPT_LOW_SPEED_TIME, m_stalledTimeout);
  }
}

◆ startSession()

bool startSession ( )

Start a new curl session if none is active at the moment.

Returns
true if a new session was started, false if one was active already

Definition at line 160 of file Downloader.cc.

{
  // start a curl session but if there is already one return false
  if (m_session) return false;
  // make sure curl is initialized correctly
  if (!s_globalInit) {
    curl_global_init(CURL_GLOBAL_ALL);
    s_globalInit = true;
  }
  // create the curl session
  m_session = std::make_unique<CurlSession>();
  m_session->curl = curl_easy_init();
  if (!m_session->curl) {
    B2FATAL("Cannot initialize libcurl");
  }
  m_session->headers = curl_slist_append(nullptr, "Accept: application/json");
  curl_easy_setopt(m_session->curl, CURLOPT_HTTPHEADER, m_session->headers);
  curl_easy_setopt(m_session->curl, CURLOPT_TCP_KEEPALIVE, 1L);
  curl_easy_setopt(m_session->curl, CURLOPT_CONNECTTIMEOUT, m_connectionTimeout);
  curl_easy_setopt(m_session->curl, CURLOPT_LOW_SPEED_LIMIT, 10 * 1024); //10 kB/s
  curl_easy_setopt(m_session->curl, CURLOPT_LOW_SPEED_TIME, m_stalledTimeout);
  curl_easy_setopt(m_session->curl, CURLOPT_WRITEFUNCTION, write_function);
  curl_easy_setopt(m_session->curl, CURLOPT_VERBOSE, 1);
  curl_easy_setopt(m_session->curl, CURLOPT_NOPROGRESS, 0);
  curl_easy_setopt(m_session->curl, CURLOPT_DEBUGFUNCTION, debug_callback);
  curl_easy_setopt(m_session->curl, CURLOPT_XFERINFOFUNCTION, progress_callback);
  curl_easy_setopt(m_session->curl, CURLOPT_XFERINFODATA, m_session.get());
  curl_easy_setopt(m_session->curl, CURLOPT_FAILONERROR, true);
  curl_easy_setopt(m_session->curl, CURLOPT_ERRORBUFFER, m_session->errbuf);
  // enable transparent compression support
  curl_easy_setopt(m_session->curl, CURLOPT_ACCEPT_ENCODING, "");
  // Set proxy if defined
  if (EnvironmentVariables::isSet("BELLE2_CONDB_PROXY")) {
    const std::string proxy = EnvironmentVariables::get("BELLE2_CONDB_PROXY");
    curl_easy_setopt(m_session->curl, CURLOPT_PROXY, proxy.c_str());
  }
  curl_easy_setopt(m_session->curl, CURLOPT_AUTOREFERER, 1L);
  curl_easy_setopt(m_session->curl, CURLOPT_FOLLOWLOCATION, 1L);
  curl_easy_setopt(m_session->curl, CURLOPT_MAXREDIRS, 10L);
  curl_easy_setopt(m_session->curl, CURLOPT_TCP_FASTOPEN, 0L);
  curl_easy_setopt(m_session->curl, CURLOPT_SSL_VERIFYPEER, 0L);
  curl_easy_setopt(m_session->curl, CURLOPT_SSL_VERIFYHOST, 0L);
  curl_easy_setopt(m_session->curl, CURLOPT_SSL_VERIFYSTATUS, 0L);
  curl_easy_setopt(m_session->curl, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_WHATEVER);
  // Don't cache DNS entries, ask the system every time we need to connect ...
  curl_easy_setopt(m_session->curl, CURLOPT_DNS_CACHE_TIMEOUT, 0L);
  // and shuffle the addresses so we try a different node, otherwise we might
  // always get the same address due to system caching and RFC 3484
  curl_easy_setopt(m_session->curl, CURLOPT_DNS_SHUFFLE_ADDRESSES, 1L);
  auto version = getUserAgent();
  curl_easy_setopt(m_session->curl, CURLOPT_USERAGENT, version.c_str());
  return true;
}

◆ verifyChecksum()

bool verifyChecksum ( std::istream &  input,
const std::string &  checksum 
)
inline

check the digest of a stream

Parameters
input: stream to check; make sure the stream is in a valid state pointing to the correct position
checksum: expected hash digest of the data
Returns
true if digest matches, false otherwise

Definition at line 81 of file Downloader.h.

{ return calculateChecksum(input) == checksum; }

Member Data Documentation

◆ m_backoffFactor

unsigned int m_backoffFactor {3}
private

Backoff factor for retries in seconds.

Definition at line 106 of file Downloader.h.

◆ m_connectionTimeout

unsigned int m_connectionTimeout {60}
private

Timeout to wait for connections in seconds.

Definition at line 100 of file Downloader.h.

◆ m_maxRetries

unsigned int m_maxRetries {5}
private

Number of retries to perform when downloading fails with HTTP response code >=300.

Definition at line 104 of file Downloader.h.

◆ m_rnd

std::unique_ptr<std::mt19937> m_rnd {std::make_unique<std::mt19937>()}
private

This is a special exception in basf2 where an instance of gRandom is NOT used: this class interacts with the Conditions Database, and drawing retry delays from gRandom during connection troubles would alter its state, so we would lose the ability to fully reproduce the results.

Definition at line 119 of file Downloader.h.

◆ m_rndDistribution

std::unique_ptr<std::uniform_real_distribution<double> > m_rndDistribution {std::make_unique<std::uniform_real_distribution<double>>()}
private

A uniform real distribution for extracting random numbers.

See the docstring for m_rnd as well.

Definition at line 121 of file Downloader.h.

◆ m_rndIsInitialized

bool m_rndIsInitialized {false}
private

Flag for keeping track if the internal random generator is correctly initialized or not.

Definition at line 123 of file Downloader.h.

◆ m_session

std::unique_ptr<CurlSession> m_session
private

curl session handle

Definition at line 96 of file Downloader.h.

◆ m_stalledTimeout

unsigned int m_stalledTimeout {60}
private

Timeout to wait for stalled connections (<10KB/s)

Definition at line 102 of file Downloader.h.

◆ s_globalInit

bool s_globalInit {false}
static private

flag to indicate whether curl has been initialized already

Definition at line 98 of file Downloader.h.


The documentation for this class was generated from the following files: