Full Workflow Execution


We execute the dependency tree of a target task by calling b2luigi.process(Task(parameters=...), workers=nworkers). b2luigi then runs up to <nworkers> tasks in parallel, as far as the dependencies between them allow.
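To make the role of workers concrete, here is a minimal, stdlib-only sketch of dependency-aware execution with a bounded number of parallel workers. It is not b2luigi's or Luigi's actual scheduler. The function name run_dag, the input format (a dict mapping each task name to the names it requires) and the toy task names are hypothetical, and Python 3.9+ is assumed for graphlib.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, Hashable, Iterable, List


def run_dag(
    deps: Dict[Hashable, Iterable[Hashable]],
    run: Callable[[Hashable], None],
    nworkers: int,
) -> List[Hashable]:
    """Hypothetical sketch: run every task after its requirements, at most nworkers at once.

    deps maps each task to the tasks it requires; run executes one task.
    Returns the tasks in the order in which they finished.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    finished: List[Hashable] = []
    running = {}  # future -> task
    # The pool size caps how many tasks execute concurrently.
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        while sorter.is_active():
            # Submit every task whose requirements have all finished.
            for task in sorter.get_ready():
                running[pool.submit(run, task)] = task
            # Block until at least one running task finishes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                task = running.pop(future)
                future.result()  # propagate exceptions raised by the task
                sorter.done(task)  # may unlock tasks that depend on it
                finished.append(task)
    return finished


if __name__ == "__main__":
    # Toy tree (hypothetical names): Plot needs two reconstructions,
    # each of which needs one simulation.
    tree = {
        "Plot": {"Reco1", "Reco2"},
        "Reco1": {"Sim1"},
        "Reco2": {"Sim2"},
    }
    print(run_dag(tree, lambda task: print("running", task), nworkers=2))
```

Using a thread pool keeps the sketch short; the point is only that a task becomes eligible once all of its requirements are done, and the pool size plays the role of workers.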

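It is best practice to place this call inside an if __name__ == "__main__": block of the steering script, so that it only runs when the script is executed directly and not when it is imported: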

Listing 3.6 main.py
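import b2luigi as luigi
from offlineanalysis import Plot  # target task of the workflow

if __name__ == "__main__":
    # Base directory for all task outputs; adjust <user>
    output_directory = "/group/belle/users/<user>"
    luigi.set_setting("result_dir", output_directory)
    # Build the dependency tree of Plot and run up to 100 tasks in parallel
    luigi.process(Plot(), workers=100)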

Calling python3 main.py --batch on KEKcc will trigger the full workflow execution. b2luigi builds the dependency tree of the Plot task and executes only those required tasks whose output files do not yet exist in the given output directory. Do not forget to adjust output_directory and to set up basf2 beforehand; for the recommended release, use b2setup $(b2help-releases). Remember that the reconstruction task is the only task not marked as local, so it is the only one submitted to the KEKcc batch system.
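The selection rule described above can be illustrated with a small stdlib-only sketch. It models Luigi's default completeness check (a task counts as complete when all of its output files exist) and skips the requirements of a task that is already complete. The SketchTask class, the tasks_to_run function and the file names are hypothetical stand-ins, not part of the b2luigi API, and batch submission is left out entirely.

```python
import os
import tempfile
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchTask:
    """Hypothetical stand-in for a b2luigi task: a name, its output files and its requirements."""

    name: str
    outputs: List[str]
    requires: List["SketchTask"] = field(default_factory=list)

    def complete(self) -> bool:
        # Luigi-style default check: complete if every output file already exists.
        return all(os.path.exists(path) for path in self.outputs)


def tasks_to_run(target: SketchTask) -> List[str]:
    """Return the names of the tasks that still need to run, requirements first."""
    order: List[str] = []
    seen = set()

    def visit(task: SketchTask) -> None:
        if task.name in seen:  # each task is considered only once
            return
        seen.add(task.name)
        if task.complete():
            # Outputs exist: skip this task and do not look at its requirements.
            return
        for requirement in task.requires:
            visit(requirement)
        order.append(task.name)

    visit(target)
    return order


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as result_dir:
        sim = SketchTask("Sim", [os.path.join(result_dir, "sim.root")])
        reco = SketchTask("Reco", [os.path.join(result_dir, "reco.root")], [sim])
        plot = SketchTask("Plot", [os.path.join(result_dir, "plot.pdf")], [reco])
        print(tasks_to_run(plot))  # nothing exists yet: all three tasks
        open(os.path.join(result_dir, "sim.root"), "w").close()
        print(tasks_to_run(plot))  # Sim output exists: only Reco and Plot
```

This is also why deleting an output file in the result directory is enough to make the corresponding task (and everything that depends on it) run again.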

You can perform a dry run of a b2luigi workflow with python3 main.py --dry-run to check which tasks would be run, without actually executing them.

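Luigi builds a dynamic directed acyclic graph (DAG) of the tasks, which can be viewed on the Luigi Task Status web page. To access it, start the Luigi scheduler luigid in a tmux session on KEKcc and pass its host and port to the workflow execution. Here <ssh port> stands for a port that your SSH connection forwards from KEKcc to your local machine, so that the page can be opened locally: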

tmux #open a new tmux session
source /cvmfs/belle.cern.ch/tools/b2setup <release> #set up basf2
~/.local/bin/luigid --port <ssh port> #start the Luigi scheduler
Ctrl+b, then d #detach from the tmux session

source /cvmfs/belle.cern.ch/tools/b2setup <release> #set up basf2
python3 main.py --batch --scheduler-host localhost --scheduler-port <ssh port> #start the workflow
firefox localhost:<ssh port> #view the scheduler on your local machine
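If the status page stays empty or the workflow cannot connect to the scheduler, a quick first check is whether anything is listening on the chosen port on KEKcc. Below is a minimal stdlib-only sketch of such a pre-flight check; the helper name scheduler_is_listening and the placeholder port value are hypothetical, and the check only confirms that a TCP connection can be opened, not that the process behind the port is really luigid.

```python
import socket


def scheduler_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Hypothetical helper: True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (including timeouts) if nothing answers.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    port = 8082  # placeholder: use the <ssh port> passed to luigid --port
    if scheduler_is_listening("localhost", port):
        print(f"Scheduler reachable on localhost:{port}")
    else:
        print(f"Nothing listening on localhost:{port}; is luigid running in tmux?")
```

Run it on KEKcc in the same shell from which the workflow is started, since that is where localhost refers to the machine running luigid.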