Building programs usually takes a lot of memory and computation power. Thus, the huge task is usually split in small ones, and the chunks are assembled together during the link phase. Now for computers that have several processing units or for compilation farms (networks of machines), it is desirable to run several tasks at once to balance the load and obtain faster builds.
Like for Scons, waf uses a producer-consumer scheme: several threads are waiting for new tasks to launch. The scheduling strategy is also similar, tasks that may run first are actually run first.
One problem that arises when many tasks are available is that the dependency tree becomes quickly complicated. The dependency tree is not entirely known when starting (some tasks may or may not need to be run, depending on the tasks they depend on), so it is not possible to sort it before actually building the project. For example, the build tool should know that link tasks must be run after all objects are built, but it cannot infer this easily from the tasks and the tree alone. Another example is the case of a compiler which compiled and used afterwards for compiling other objects.
Another problem is that some tasks may require too much resources and need to run alone. In particular, when link tasks are run in parallel, they may eat all the ram available, and once the processes reach the swap the overall build times are greatly reduced.
To reduce the complexity of the dependency tree, waf groups tasks by priorities: for example the compiler needs to be compiled before any file can be transformed. Tasks with a lower priority id will run first, and before any other task with a higher priority id. The tasks that have an odd priority id are run one by one, while tasks with an even priority id are run in parallel. To give a concrete example, tasks that produce .cpp files will have their own group and no .cpp compilation will occur until all the tasks if this group are all done.
The following pictures show how many process are running at a given moment for link tasks run in parallel and link tasks run in sequence (first and second picture) on a small project compiled with 5 threads. The test project has a few documentation and translation files, about 50 cpp files, produces 10 libraries and requires 131 tasks to complete (kdissert).
The curves show three distinct zones which correspond to the processing of three different groups of tasks. The first group has documentation and translation tasks (must run first), the second group has compilation tasks while the third one has only link tasks (must run after all compilation tasks are finished).
The swap was not used (too few processes, single cpu system with a lot of ram), so the parallelization have had no influence over the build times in these tests. The parallelization system makes it possible to check for these bottlenecks though.