Clean worker shutdown with systemd

I could not find any examples of this out there, so here is what I’ve come up with for running the concourse worker binary with systemd.

In my scenario, I’m using an Azure Scale Set configured to auto-scale, and I kept ending up with workers in a stalled state. With this unit file, the workers clean up after themselves before they exit.

The key parts are:

# Signal only the main worker process, not every process in the unit's control group.
KillMode=process

# On system shutdown (or `systemctl stop`) retire the worker.
KillSignal=SIGUSR2

# On restart just land the worker.
RestartKillSignal=SIGUSR1

# Wait up to 5 minutes (an arbitrary choice) for the worker to drain before systemd falls back to SIGKILL.
TimeoutStopSec=300
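
If the worker already has its own unit, the same settings could be layered on as a drop-in instead of editing the unit itself. A minimal sketch, assuming the unit is called concourse-worker.service (a hypothetical name, adjust to yours) and the file is saved as /etc/systemd/system/concourse-worker.service.d/shutdown.conf:

```ini
# Drop-in sketch; the unit name and file name are assumptions.
[Service]
# Signal only the main process on stop.
KillMode=process
# stop / system shutdown: retire the worker.
KillSignal=SIGUSR2
# restart: land the worker instead. As far as I know this option
# needs systemd 244 or later.
RestartKillSignal=SIGUSR1
# Give the worker up to 5 minutes before systemd falls back to SIGKILL.
TimeoutStopSec=300
```

Run systemctl daemon-reload afterwards so systemd picks up the drop-in.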

The full unit I’m using at the moment (tips/adjustments welcome):

[Unit]
Description=Concourse Worker

[Service]
ExecStart=/usr/local/concourse/bin/concourse worker
User=root
Group=root
Type=simple
WorkingDirectory=/opt/concourse/worker

# These settings were taken from the Concourse test setup in some GitHub
# repo; they may be wrong for production.
LimitNPROC=infinity
LimitNOFILE=infinity
TasksMax=infinity
MemoryLimit=infinity
Delegate=yes

# The key bits
RestartSec=5
Restart=on-failure
RestartKillSignal=SIGUSR1
KillMode=process
KillSignal=SIGUSR2
TimeoutStopSec=300

# ... other config omitted

[Install]
WantedBy=multi-user.target

I’ve never tried managing workers via systemd, but I’d assume it would still need to adhere to the worker lifecycle. You probably need to call concourse retire-worker as part of the shutdown; otherwise the worker won’t drain or inform the ATC that it’s going away.

My understanding is that the worker will land itself if it receives SIGUSR1 and retire itself if it receives SIGUSR2. That said, it probably makes more sense to skip the signals and configure it to run those commands:

KillMode=none
ExecStop=/usr/local/concourse/bin/concourse retire-worker
ExecReload=/usr/local/concourse/bin/concourse land-worker

Learning things here. From the systemd.service docs on ExecStop:

Note that it is usually not sufficient to specify a command for this setting that only asks the service to terminate (for example, by sending some form of termination signal to it), but does not wait for it to do so.

The KillMode=process plus KillSignal= approach does wait for it to stop: systemd sends the signal and then waits up to TimeoutStopSec= for the process to exit.
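
To illustrate what the docs are getting at: an ExecStop= that only asks the worker to go away returns immediately, and systemd then carries on with its normal kill handling. A rough sketch of an ExecStop= that does wait, assuming /bin/sh is available and relying on the $MAINPID variable that systemd sets for stop commands (untested against a real worker):

```ini
[Service]
# Sketch only: send SIGUSR2 to retire, then block until the main process is gone.
ExecStop=/bin/sh -c 'kill -s USR2 "$MAINPID"; while kill -0 "$MAINPID" 2>/dev/null; do sleep 1; done'
# The wait above is still bounded by the stop timeout.
TimeoutStopSec=300
```

This ends up doing by hand what KillSignal= and TimeoutStopSec= already do, which is why the signal settings look like the simpler option.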
