We have recently been seeing a disturbing issue with one of our Concourse installations.
- Concourse runs in a k8s EKS cluster. using
- We have one web node Deployment and a StatefulSet with workers.
- Once in a while (and these intervals tend to get shorter) the web node gets recreated and the number of containers doubles, as I can see workers attached to both web nodes at the same time.
- Sometimes GC cleans up that mess, but sometimes it does not, and we hit the limit (250 containers) and pipelines fail with
max containers reached
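To make the container growth easier to quantify, here is a minimal sketch of how per-worker container counts could be tallied and compared against the 250 limit. It assumes the limit applies per worker and that container records are available as a list of dicts, for example parsed from the JSON output of `fly containers --json`. The `worker_name` field name, the helper names, and the 80% warning threshold are assumptions for illustration, not something confirmed on this installation.

```python
from collections import Counter

# Limit quoted in this report; assumed here to apply per worker.
MAX_CONTAINERS = 250


def containers_per_worker(records, worker_key="worker_name"):
    """Count container records per worker.

    `records` is a list of dicts; `worker_key` is the (assumed) field
    holding the worker name. Records without that field are skipped.
    """
    return Counter(r[worker_key] for r in records if r.get(worker_key))


def workers_near_limit(counts, limit=MAX_CONTAINERS, threshold=0.8):
    """Return sorted worker names whose count is at least threshold * limit."""
    return sorted(w for w, n in counts.items() if n >= limit * threshold)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the affected cluster.
    sample = [{"worker_name": "worker-0"}] * 210 + [{"worker_name": "worker-1"}] * 40
    counts = containers_per_worker(sample)
    print(dict(counts))
    print(workers_near_limit(counts))
```

Running something like this before and after a web node restart would show whether the doubling is spread evenly across workers or concentrated on a few of them.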
Here is a chart for the last 4 hours.
The dramatic drop in the number of containers is me rescaling the StatefulSet in k8s.
Please advise what info you need to help debug this issue.
Any help appreciated.
P.S. Another Concourse installation in k8s, which has 2 web nodes, has the same issue, but it occurs more rarely.