Hello, we deployed a pre-release of Concourse 5 to prod, with least-build-containers scheduling.
Our prod is quite huge, 30 beefy workers on AWS, compiling C++ code.
The build and test tasks were able to kill any kind of worker, to the point that we decided to deploy a pre-release just to get least-build-containers.
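For readers unfamiliar with the strategy, here is a rough sketch of the idea as we understand it: among the candidate workers, pick the one currently running the fewest build containers, ignoring check containers. The `Worker` type, its fields and `pick_worker` are made up for illustration; this is not Concourse's actual code.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    # Hypothetical model of a worker, for illustration only.
    name: str
    build_containers: int  # containers running build steps (tasks etc.)
    check_containers: int  # resource check containers, assumed not counted


def pick_worker(workers):
    """Return the worker with the fewest build containers (first one wins ties)."""
    if not workers:
        raise ValueError("no workers available")
    return min(workers, key=lambda w: w.build_containers)


workers = [
    Worker("w1", build_containers=12, check_containers=40),
    Worker("w2", build_containers=3, check_containers=350),
    Worker("w3", build_containers=7, check_containers=10),
]
print(pick_worker(workers).name)  # w2
```

Note that in this sketch a worker full of check containers can still be chosen for a build, which matches what we see on our workers.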
First we got bitten by ‘insufficient subnets remaining in the pool’ https://github.com/concourse/concourse/issues/847
We then allowed for more garden subnets as explained in that ticket.
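To give an idea of why the pool runs out, here is a back-of-the-envelope calculation. It assumes Garden hands each container its own /30 subnet out of the pool, and that the default pool is 10.254.0.0/22; both are our reading of the Garden defaults, so check them against your version. The /16 pool is just an example value, and `max_container_subnets` is a helper name we made up.

```python
import ipaddress


def max_container_subnets(pool_cidr, per_container_prefix=30):
    """Count how many per-container subnets fit in the given pool."""
    pool = ipaddress.ip_network(pool_cidr)
    if pool.prefixlen > per_container_prefix:
        return 0  # pool is smaller than a single container subnet
    return 2 ** (per_container_prefix - pool.prefixlen)


print(max_container_subnets("10.254.0.0/22"))  # 256
print(max_container_subnets("10.254.0.0/16"))  # 16384
```

Under these assumptions a /22 pool tops out at 256 containers per worker, which is less than the 350 check containers we have seen on a single worker.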
The number of check containers on the workers now fluctuates seemingly at random, reaching 350 containers, but the beefy workers handle that.
Net result: we have had the 5.0 pre-release in prod for 24h and we observe GREAT IMPROVEMENTS in the overall load thanks to least-build-containers.
Thanks to the Concourse team!
[we will update this post if something newsworthy comes out]
We are using RC54.
We opened a ticket describing the current problem, runaway check containers: https://github.com/concourse/concourse/issues/3251
We moved to RC74 and enabled global resources.
We think that the runaway containers happen only if you have lots of pipelines, each with lots of resources. Evidence: we paused a subset of pipelines, the ones with lots of resources (60!), and it seems we managed to stabilize Concourse. The bug is still there, but a “half workaround” is to pause pipelines.
The put inputs feature does wonders in reducing useless streaming, flakiness and wasted time! See