Resetting a worker to a "clean" state

#1

Given a binary install, I’m wondering what the steps should be to essentially “reset” a worker. Currently these are the steps we’re following (running Concourse as a systemd service):

  1. Retire the worker (and wait until it disappears from the workers listed by fly -t <target> ws)
  2. Stop the systemd service
  3. Remove everything in the worker dir (this includes running umount on all volumes before deleting the files)
  4. Remove all concourse generated network interfaces
  5. Remove all runc files under /var/run/runc/
  6. Start systemd service
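For discussion, the steps above can be sketched as a script. The worker dir (/opt/concourse/worker) and the unit name (concourse-worker) are assumptions, so adjust for your install; DRY_RUN=1 (the default here) prints each command instead of executing it, which is also a cheap way to sanity-check the order:

```shell
#!/usr/bin/env bash
# Sketch of a worker "reset", not a definitive procedure. Paths and the
# service name are assumptions; DRY_RUN=1 echoes commands instead of
# running them.
set -euo pipefail

WORK_DIR="${WORK_DIR:-/opt/concourse/worker}"
SERVICE="${SERVICE:-concourse-worker}"
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# 1. Retire the worker first (the exact command depends on your setup),
#    then wait until it disappears from `fly -t <target> ws`.

# 2. Stop the systemd service.
run systemctl stop "$SERVICE"

# 3. Unmount anything still mounted under the worker dir (deepest paths
#    first, hence the reverse sort), then remove the dir's contents.
for m in $(awk -v d="$WORK_DIR" '$2 ~ "^"d {print $2}' /proc/mounts | sort -r); do
  run umount "$m"
done
run rm -rf "$WORK_DIR"/*

# 4. Remove Concourse-generated network interfaces; naming varies by
#    Garden version, so inspect `ip link` output before deleting anything.

# 5. Remove leftover runc state.
run rm -rf /var/run/runc/*

# 6. Start the service again.
run systemctl start "$SERVICE"
```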

This seems to work fairly well, however there is one small issue: the space occupied by the volumes isn’t freed until the machine reboots. df shows almost no difference before and after the volumes are unmounted and deleted, while du on the affected partition (where the worker dir is) correctly shows that the space is no longer being used. After a machine reboot, df shows the “correct” used space again (matching what du reports).

I’ve tried, with no success, to use lsof to figure out what is keeping the file handles open (which I assume must be the cause). Does anyone have any insight into either the “reset” process, or into why the space isn’t freed until a reboot happens?

Running a 3.10.0 binary install with btrfs over a loopback device, on Debian 8.10 with the 4.9.0-0.bpo.3-amd64 kernel.

#2

Interesting. I wonder if the space is actually reserved/occupied or whether the kernel would allow it to be reclaimed if something actually needed it? (Basically whether it’s just a reporting anomaly.)

Are there any containers left running before you clear out the work dir? Maybe they’re holding handles open? You may want to kill those on shutdown.

To be honest, manual cleanup is one of the least understood areas of deploying Concourse at the moment. Right now we basically assume everything’s a VM and that people can just nuke it from orbit and make a new one. (It’s the only way to be sure.)

#3

Are there any containers left running before you clear out the work dir? Maybe they’re holding handles open?

AFAIK there aren’t any containers running, since they are all sub-processes of the systemd service and the service stops correctly (thus terminating all child processes), but I’ll check at the next opportunity and see what is left running before restarting the Concourse worker.

FYI, I’ve tried lsof | grep deleted and lsof +L1 (after the worker service has stopped and the work dir has been deleted), but both show nothing.
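For what it’s worth, lsof can miss handles held inside other mount namespaces, so walking /proc directly is another way to look. A sketch (the default worker dir path is an assumption):

```shell
# Sketch: find processes still holding handles to deleted files under a
# given directory by scanning /proc/<pid>/fd symlinks directly (lsof can
# miss handles held in other mount namespaces).
find_deleted_handles() {
  local workdir="$1" fd target pid
  for fd in /proc/[0-9]*/fd/*; do
    # The fd may vanish between globbing and reading; skip quietly.
    target=$(readlink "$fd" 2>/dev/null) || continue
    case "$target" in
      "$workdir"/*" (deleted)")
        pid=${fd#/proc/}; pid=${pid%%/*}
        echo "pid $pid still holds: $target"
        ;;
    esac
  done
  return 0
}

# Example call; /opt/concourse/worker is an assumed path.
find_deleted_handles /opt/concourse/worker
```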

#4

I’ve been having the exact same problem since updating to Concourse 3.10!

I’m running Concourse using docker-compose; the host OS is Ubuntu 16.04, running on a physical machine.

I haven’t yet tried ignoring the output of df to see whether I could keep writing to disk even while df reports 100% usage. I also couldn’t try unmounting and remounting the disk, because I have a single volume mounted on /.

As a workaround, I’ve set up a new machine where /var/lib/docker is mounted as a btrfs volume instead of ext4, in the hope that it will cause fewer issues. I’ll post some updates in the coming days.

#5

Hmm, this PR may be on to something: https://github.com/concourse/docker-image-resource/pull/179

I wonder if that docker system prune would be just as important for btrfs as it is for zfs.
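If pruning does turn out to matter for btrfs too, one low-effort option while investigating is to run it on a schedule. A sketch crontab entry; the schedule, binary path, and log path are assumptions, and note that --volumes also deletes unused volumes:

```shell
# Hypothetical crontab entry: prune unused Docker data nightly at 03:00.
# `--volumes` also removes unused volumes, so think twice before enabling
# it on a busy Concourse worker host.
0 3 * * * /usr/bin/docker system prune --force --volumes >> /var/log/docker-prune.log 2>&1
```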

#6

I can confirm that even when df reports 100% disk usage I can keep creating new files and Concourse keeps working, so there is indeed a reporting anomaly, probably due to nesting btrfs inside ext4. It’s annoying because there is apparently no way to track real disk usage without rebooting the machine.

On the new machine, where I am using btrfs nested inside btrfs, things look better, but there’s a different problem: I cannot use docker volume prune to remove old volumes created by a Concourse worker, because the btrfs subvolumes they contain cannot be deleted unless you run btrfs subvolume delete.

rm -rf /var/lib/docker/volumes/${ID} also fails; I had to use some bash wizardry:

# Iterate with a glob instead of parsing `ls`, and quote the paths
for v in /var/lib/docker/volumes/"${ID}"/_data/volumes/live/*; do
  btrfs subvolume delete "$v/volume"
done

#7

In case anyone stumbles on the same issue, this bash script is actually more effective (assuming /var/lib/docker is a btrfs mountpoint):

# `btrfs subvolume list` prints the subvolume path in the 9th column
for v in $(btrfs subvolume list /var/lib/docker | awk '{print $9}'); do
  btrfs subvolume delete "/var/lib/docker/$v";
done

#8

Hey, I used your approach to clean up subvolumes on a worker node where /var/lib/concourse was at 100%. That looks good, and usage went back down to 2%. But when I re-run the jobs there is a new error message, “failed to create volume”. Is that because I only removed the subvolumes, not the whole folders?