Parallel_groups feature


Hi,

Use Case

I often run into the scenario where I have a set of N jobs, of which only X should run in parallel.

Currently I can achieve this kind of parallelism with the serial_groups feature.
I create X serial_groups labels and distribute them equally among the N jobs, so that each label covers about N/X jobs that run one after another.

But the resulting parallelism is not very flexible.

  1. I can’t easily change the number of jobs that run in parallel without editing the serial_groups labels.
  2. The jobs that share the serial_groups label foo may finish sooner than the jobs that share the label bar. In that case I am not using my resources efficiently, because the remaining bar jobs still run one at a time while the capacity freed by foo sits idle.
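
To make the second point concrete, here is a small back-of-the-envelope simulation in Python. It is only a sketch: the job durations are made up, the function names are hypothetical, and it ignores scheduler overhead. It compares the total run time of two fixed serial groups against a single shared limit of two concurrent jobs.

```python
import heapq


def makespan_serial_groups(groups):
    """Each group runs its jobs one after another; groups run side by side."""
    return max(sum(durations) for durations in groups.values())


def makespan_shared_limit(durations, max_in_flight):
    """Jobs start in order as soon as any of max_in_flight slots is free."""
    # Each heap entry is the time at which one slot becomes free again.
    slots = [0] * max_in_flight
    heapq.heapify(slots)
    for duration in durations:
        start = heapq.heappop(slots)  # earliest free slot
        heapq.heappush(slots, start + duration)
    return max(slots)


# Made-up durations: the "foo" jobs are quick, the "bar" jobs are slow.
groups = {"foo": [1, 1, 1, 1], "bar": [5, 5, 5, 5]}
all_jobs = groups["foo"] + groups["bar"]

static = makespan_serial_groups(groups)
shared = makespan_shared_limit(all_jobs, max_in_flight=2)
print(static, shared)  # 20 12
```

With these numbers the foo group is done after 4 time units, but the bar group needs 20 on its own. With a shared limit of two, the slot freed by the quick jobs picks up slow jobs as well and everything finishes after 12.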

Suggested solution

My idea is to add another label type, parallel_groups, which is similar to serial_groups but does not constrain parallelism by forcing jobs to run sequentially.

Each parallel_groups label would define an integer value max_in_flight_jobs. All jobs that carry the same parallel_groups label run in parallel, constrained only by the maximum number of concurrent jobs defined by max_in_flight_jobs.

This would guarantee that all jobs in a parallel_groups bracket are executed as efficiently as possible while respecting the max_in_flight_jobs constraint.
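
The semantics I have in mind could be sketched roughly like this in Python. This is not how the scheduler works today and not a proposed implementation; the ParallelGroup class and its fields are hypothetical and only illustrate the idea that a group is a counting semaphore with max_in_flight_jobs slots.

```python
import threading
import time


class ParallelGroup:
    """Hypothetical model of one parallel_groups label."""

    def __init__(self, name, max_in_flight_jobs):
        self.name = name
        # One semaphore slot per job that is allowed to run at the same time.
        self._slots = threading.BoundedSemaphore(max_in_flight_jobs)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak_in_flight = 0

    def run(self, job):
        with self._slots:  # blocks while max_in_flight_jobs are already running
            with self._lock:
                self.in_flight += 1
                self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            try:
                return job()
            finally:
                with self._lock:
                    self.in_flight -= 1


def run_all(group, jobs):
    """Trigger every job at once and let the group throttle them."""
    threads = [threading.Thread(target=group.run, args=(job,)) for job in jobs]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


finished = []


def make_job(index):
    def job():
        time.sleep(0.01)  # stand-in for real work
        finished.append(index)
    return job


group = ParallelGroup("deploy", max_in_flight_jobs=3)
run_all(group, [make_job(i) for i in range(10)])
print(len(finished), group.peak_in_flight)
```

All ten jobs complete, and no more than three of them are ever running at once. Changing the degree of parallelism then means changing a single number instead of relabeling jobs.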

If there is interest, I would like to contribute a PR.

best D