We have 20+ workers running 50+ pipelines that use a variety of scripts and Docker images. Everything is deployed via BOSH on-premises.
Our workers' disks fill up constantly. As a workaround we run a ‘bosh recreate’ job every few hours, but this is far from ideal.
It appears that baggageclaim is not destroying btrfs subvolumes even after they have expired. Whenever we look at a volume's <guid>/ttl.json file, the expires_at timestamp is in the past, yet the volume still exists (otherwise there would be no ttl.json for us to inspect).
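To gauge how many volumes are in this state, a small diagnostic helper can list every volume whose ttl.json already reports an expiry in the past. This is a hypothetical sketch, not part of Concourse or baggageclaim. It assumes that each volume is a directory named by its GUID containing ttl.json, that expires_at is stored as an RFC 3339 string, and that GNU date is available (as on Ubuntu-based stemcells).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the GUID of every volume whose ttl.json expires_at
# is earlier than "now". Assumptions: <dir>/<guid>/ttl.json layout, expires_at
# is an RFC 3339 string, and GNU `date -d` is available to parse it.
list_expired_volumes() {
  local dir="$1"
  local now="${2:-$(date +%s)}"   # current Unix time unless a value is passed in
  local f guid ts exp
  for f in "$dir"/*/ttl.json; do
    [ -e "$f" ] || continue       # the glob matched nothing
    guid="$(basename "$(dirname "$f")")"
    # Extract the expires_at string from the JSON without requiring jq.
    ts="$(sed -n 's/.*"expires_at"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$f")"
    [ -n "$ts" ] || continue
    # Skip files whose timestamp GNU date cannot parse.
    exp="$(date -d "$ts" +%s 2>/dev/null)" || continue
    if [ "$exp" -lt "$now" ]; then
      echo "$guid"
    fi
  done
}
```

For example, `list_expired_volumes /var/vcap/data/baggageclaim/volumes/live | wc -l` would count the expired-but-present volumes on a worker, assuming ttl.json lives directly under each GUID directory there.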
How can we fix this?
According to baggageclaim's stdout.log, some volumes are being destroyed, but not nearly aggressively enough to keep up.
We also see what appear to be the same volumes created multiple times on the same worker node:
worker/105e57ad-8eef-48b9-817d-58660bf14150:/var/vcap/data/baggageclaim/volumes/live# du -sk * | sort -n | tail -n 15
Shouldn’t Concourse reuse these volumes (i.e., the ones that are 197076KB)?
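To spot repeated volumes like these without scanning the `du` listing by eye, its output can be grouped by size. The function below is a hypothetical sketch: it reads `du -sk` lines (size in KB, a tab, then the name) on stdin and prints each size shared by two or more volumes, with the count and the volume names. Equal size only marks candidates for duplication; it does not prove the volumes have identical contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `du -sk` output ("<size-KB>\t<name>") on stdin and
# print "<size-KB> <count> <names...>" for every size seen more than once.
duplicate_sizes() {
  awk -F '\t' '
    { count[$1]++; names[$1] = names[$1] " " $2 }   # tally names per size
    END { for (s in count) if (count[s] > 1) print s, count[s] names[s] }
  ' | sort -n
}
```

On a worker this could be run as `cd /var/vcap/data/baggageclaim/volumes/live && du -sk * | duplicate_sizes`.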