We have a bunch of pipelines that were migrated from another CI system where everything was required to run via docker-compose, so we ended up with a lot of dcind jobs.
For the most part this is fine, until it isn't: after the job is marked as successful/failed in the UI, the container is sometimes still lingering around on the worker. This happens up to 20-30 times a day for us.
Currently we are running the following on all our workers every 30 minutes:
```shell
for PROCESS in $(pgrep garden-init); do
  AGE_IN_SECONDS=$(expr $(date +%s) - $(stat --format=%Y /proc/$PROCESS))
  # All our pipelines set a 1h timeout on every job; check for containers that
  # have lived > 65 min (3900 s) to give Concourse some time to do its magic.
  if (( AGE_IN_SECONDS > 3900 )); then
    # We don't care about check/get/put containers, only task containers.
    if [[ $(sudo strings /proc/$PROCESS/environ) != *"ATC_EXTERNAL_URL"* ]]; then
      echo "$PROCESS has been alive for $AGE_IN_SECONDS seconds"
    fi
  fi
done
```
Inspecting one of these $PROCESS entries, we sometimes see the following:
```
root 2892829 0.0 0.0 1120    0     ? Ss  Nov20  0:00 /tmp/garden-init
root 2893142 0.2 0.0 1256120 24292 ? Sl  Nov20 11:50  \_ dockerd --data-root /scratch/docker
root 2893154 0.1 0.0 884244  8508  ? Ssl Nov20  8:26  \_ containerd --config /var/run/docker/containerd/containerd.toml --log-level info
```

(with or without some random processes under the containerd)
This is most likely something wrong on our end with our dcind container, which, besides spinning up Docker, does other nasty stuff such as mounting an NFS volume for shared caching across workers. We are currently trying to catch and fix all errors as we find them.
But what is the correct way to get rid of these old, shabby containers as they appear? We sometimes find (especially when people are running MongoDB, MarkLogic, or Graphite (yep, that is unfortunately a thing… O_o)) that these orphaned containers consume a lot of resources, and in extreme cases they cause the workers to go into a sad, sad state.
kill -9 $PROCESS obviously works, but it leaves the entry behind in the containers table in the db. We don't know whether or not that is an issue?
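A possibly gentler alternative to kill -9 that we could try would be asking Garden itself to destroy the container over its HTTP API on the worker, so Garden's own bookkeeping stays consistent. This is only a sketch under two assumptions (neither confirmed above): that the worker's Garden API listens on its default address 127.0.0.1:7777, and that the container handle can be recovered from the garden-init process's cgroup path — both may differ in your deployment.

```shell
# Sketch: destroy a lingering container via the Garden API instead of kill -9.
# ASSUMPTIONS: Garden listens on 127.0.0.1:7777 (the usual Concourse worker
# default), and the container handle is the last path component of a cgroup
# line mentioning "garden". handle_from_cgroup is a hypothetical helper.
handle_from_cgroup() {
  awk -F/ '/garden/ {print $NF; exit}'
}

# On a worker you would then do something like (not executed here):
#   HANDLE=$(handle_from_cgroup < /proc/$PROCESS/cgroup)
#   curl -s http://127.0.0.1:7777/containers                      # handles Garden knows about
#   curl -s -X DELETE "http://127.0.0.1:7777/containers/$HANDLE"  # ask Garden to destroy it

# Example of the handle parsing on a fake cgroup line:
echo '12:memory:/garden/2f9b1a6e-task-handle' | handle_from_cgroup
# → 2f9b1a6e-task-handle
```

If the handle does not show up in the cgroup path on your setup, listing all handles via GET /containers and matching them up manually is the fallback.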