Garbage collection not working consistently?



We run 20+ workers serving 50+ pipelines that use a variety of scripts and Docker images. Everything is deployed on-premises via BOSH.

Our workers fill up constantly. As a workaround we run a ‘bosh recreate’ job every few hours, which is far from ideal.

It appears that baggageclaim is not destroying btrfs subvolumes even after they have expired. Whenever we look at a <guid>/ttl.json file, the expires_at timestamp is in the past, yet the volume still exists (we could not read its ttl.json otherwise).
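To put a number on this, a small script along the following lines could list every volume whose ttl.json has already lapsed. This is only a sketch: find_expired_volumes is a hypothetical helper, and it assumes expires_at is stored as Unix epoch seconds, with 0 or a missing value meaning "never expires". Check one of your own ttl.json files before relying on it.

```python
import json
import time
from pathlib import Path


def find_expired_volumes(live_dir, now=None):
    """Return (volume_name, seconds_overdue) for volumes whose TTL has lapsed."""
    now = time.time() if now is None else now
    expired = []
    for ttl_file in sorted(Path(live_dir).glob("*/ttl.json")):
        try:
            data = json.loads(ttl_file.read_text())
        except (OSError, ValueError):
            continue  # unreadable or malformed ttl.json; skip this volume
        expires_at = data.get("expires_at") if isinstance(data, dict) else None
        # ASSUMPTION: expires_at is Unix epoch seconds; 0 or missing means no expiry.
        if isinstance(expires_at, (int, float)) and 0 < expires_at < now:
            expired.append((ttl_file.parent.name, int(now - expires_at)))
    return expired
```

On a worker it would be called as find_expired_volumes("/var/vcap/data/baggageclaim/volumes/live"), and each returned pair gives the volume directory name and how many seconds it is past its expiry.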

How can we fix this?

According to baggageclaim’s stdout.log, some volumes are being destroyed, but not nearly aggressively enough.

Additionally, we see what look like the same volumes present multiple times on the same worker node:
worker/105e57ad-8eef-48b9-817d-58660bf14150:/var/vcap/data/baggageclaim/volumes/live# du -sk * | sort -n | tail -n 15
280 7ab5593c-8d8e-450b-775a-536f39864388
360 53ecf4f8-e57a-485a-6b49-89c07c9e670b
360 8a6bdd3d-b7da-4426-60e2-206b053281bd
388 981a142a-6c57-4780-41a7-01721fb88a99
65848 9ab137de-2b09-4e22-5a93-efc76ae0965c
66296 6864ab1a-659b-40e3-6823-8bf66423a56b
67244 45781f53-6984-4ad9-5041-50a1fe4468be
159928 14a08525-7e85-4c82-5d24-4d623a15ba1b
190076 3b683929-b479-49c5-6c8e-30547f55d7f0
197076 a390e256-03c1-4c1b-4489-2292b2842aea
197076 d5aabf5c-608a-4799-4627-c6baf82e7f3f
197108 655a8483-f66a-4d98-6e65-fde4156ff63b
234800 47d939cd-7cde-4608-48d8-b5f51e208362
9023124 92a266ce-3408-485d-4301-ad9706e99b25
103118404 22a37603-ae8b-4c33-41d8-e978744213e7

Shouldn’t Concourse reuse these volumes (i.e., the ones that are 197076 KB)?
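For anyone wanting to check their own workers for the same pattern, the du listing above can be grouped by size with a few lines of Python. This is a rough sketch: group_by_size is a hypothetical helper, and an identical size only suggests that two volumes hold the same content, it does not prove it.

```python
from collections import defaultdict


def group_by_size(du_output):
    """Map size in KB -> volume names, keeping only sizes shared by 2+ volumes."""
    by_size = defaultdict(list)
    for line in du_output.splitlines():
        parts = line.split()
        # Expect "<size_kb> <volume_guid>"; ignore anything else (prompts, blanks).
        if len(parts) == 2 and parts[0].isdigit():
            by_size[int(parts[0])].append(parts[1])
    return {size: names for size, names in by_size.items() if len(names) > 1}


# A few lines copied from the listing in this post.
listing = """\
360 53ecf4f8-e57a-485a-6b49-89c07c9e670b
360 8a6bdd3d-b7da-4426-60e2-206b053281bd
197076 a390e256-03c1-4c1b-4489-2292b2842aea
197076 d5aabf5c-608a-4799-4627-c6baf82e7f3f
197108 655a8483-f66a-4d98-6e65-fde4156ff63b
"""

for size_kb, names in group_by_size(listing).items():
    print(size_kb, names)
```

Piping the full `du -sk *` output into this would show how many volumes share each size, which gives a first hint at how much space possible duplicates are taking.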