K8s worker - no space left on the device

I have a Concourse installation in Kubernetes using Helm.
One of the workers has /concourse-work-dir full. The space is consumed by volumes in /concourse-work-dir/volumes/live.
I have read the docs about how volume garbage collection works, but I couldn’t find any command to execute. Is there any way to force-delete volumes or to trigger garbage collection of volumes?
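In case it helps, this is roughly how I checked the usage (the pod name and namespace are from my test install, adjust to yours):

  # overall usage of the work dir
  kubectl -n concourse exec -it concourse-worker-0 -- df -h /concourse-work-dir
  # where the space actually goes
  kubectl -n concourse exec -it concourse-worker-0 -- du -sh /concourse-work-dir/volumes/live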

There is no force option, as garbage collection happens routinely.
If you are running out of disk space:

  • can you increase the allocated disk size? (see the sketch after this list)
  • can you decrease the size of the artifacts you are downloading?
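If you do increase it, it is just a values change on the chart. This is only a sketch: the release name and chart reference are placeholders, and I’m assuming the worker volume size lives under persistence.worker.size, so check the values.yaml of your chart version:

  # bump the worker's persistent volume size (value path assumed, verify in values.yaml)
  helm upgrade my-release stable/concourse --reuse-values --set persistence.worker.size=50Gi

Note that if the workers run as a StatefulSet, existing PVCs may not be resized in place; you may need to delete and recreate them (or the whole worker) for the new size to take effect.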

This Concourse installation sat idle all weekend without executing any pipelines, yet the filesystem remained full, so garbage collection didn’t work.
I have seen the value in the Helm chart to create a bigger persistent disk, but I think the problem will arise again even if I increase the disk space.

I’m worried because I’m only testing Concourse with one Node.js pipeline and a few resources. I think the space is consumed by the different Node.js images I used for testing, and that garbage collection is not working properly.

I only have the following error messages, and I think all of them are related to running out of space rather than to a problem with garbage collection itself.

2019-09-16T07:17:53.698Z,concourse,devops,baggageclaim.api.volume-server.create-volume-async.create-volume.failed-to-materialize-strategy
2019-09-16T07:17:53.698Z,concourse,devops,baggageclaim.api.volume-server.create-volume-async.failed-to-create
2019-09-16T07:17:53.793Z,concourse,devops,atc.pipelines.radar.scan-resource.interval-runner.tick.find-or-create-cow-volume-for-container.failed-to-create-volume-in-baggageclaim
2019-09-16T07:17:53.797Z,concourse,devops,atc.pipelines.radar.scan-resource.interval-runner.tick.failed-to-fetch-image-for-container
2019-09-16T07:17:53.797Z,concourse,devops,atc.pipelines.radar.scan-resource.interval-runner.tick.failed-to-create-or-find-container
2019-09-16T07:17:53.799Z,concourse,devops,atc.pipelines.radar.failed-to-run-scan-resource

I’m reading about Concourse internals, but things like caching and retention are not documented yet.

I have no problem destroying and redeploying with Helm again, but I would like to know what is happening and how to solve it, because it could happen again when this environment goes into production.

Is there any way (with fly commands or something that happens “automatically”) to destroy a worker with its associated persistent storage and create a new one in a Kubernetes installation?
I have tried to land a worker and then prune it, but it automatically appears again as an available worker without even restarting the worker container. So I don’t know how to fully delete a worker with fly commands and create a new one.
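
For reference, this is what I tried on the fly side, plus what I assume the Kubernetes side would look like (the worker, PVC and namespace names are from my test install, so check yours with kubectl get pvc; I haven’t confirmed this is the right approach):

  # land and prune the worker in Concourse
  fly -t my-target land-worker -w concourse-worker-0
  fly -t my-target prune-worker -w concourse-worker-0
  # let the StatefulSet recreate the pod with fresh storage (PVC name assumed)
  kubectl -n concourse delete pvc concourse-work-dir-concourse-worker-0
  kubectl -n concourse delete pod concourse-worker-0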

Did you ever find a solution without having to redeploy?

No, I didn’t, and I stopped testing it. I don’t feel comfortable about using Concourse in a production environment running on Kubernetes.