Nginx load balancer - workers not picking up builds

Hello,

I have been pulling my hair out trying to get this to work :wink:
I am running Concourse 4.2.1, and my goal is to have:

  • load balancer (LB) in front of two web nodes
  • with the workers interfacing through the LB
  • dedicated database node

The worker registers through the LB, and `fly workers` shows it as running.

When I kick off a build (a simple hello-world pipeline), the build goes yellow (started) and stays there.

On the web node I see:

`{"timestamp":"1568829776.851881981","source":"tsa","message":"tsa.connection.channel.sweep-containers-worker.finished-writing-sweeper-containers","log_level":1,"data":{"bytes-written":3,"err":null,"remote":"192.168.0.1:44750","session":"58.1.1"}}
{"timestamp":"1568829798.563172340","source":"atc","message":"atc.create-job-build.trigger-immediately.try-start-next-pending-build.image.find-or-create-volume-for-base-resource-type.create-volume.retry-round-tripper.retrying","log_level":1,"data":{"build-id":1,"build-name":"1","container":"2f993b4a-9aa5-4da9-56f7-d6907f845ae4","error":"dial tcp 127.0.0.1:41337: connect: connection refused","failed-attempts":58,"job_name":"job-hello-world","ran-for":"14m23.19565515s","session":"145.1.1.2.2.2.1","volume":"514475ad-64a3-49bd-4f42-2ab4cf87b52e"}}`

The environment is:

  • LB, 192.168.0.1

  • web01 192.168.0.2

  • web02 192.168.0.3

  • db 192.168.0.4

  • worker 192.168.0.5

  • The LB redirects 80 -> 443 and also load-balances TCP port 2222, which nginx forwards directly to the upstream servers. Once a client connects over 443, the request is forwarded to http://$upstream, i.e. to either web01 or web02.

  • The worker is started with --tsa-host=<FQDN_of_LB>:2222
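
For readers trying to picture the HTTP side of the LB described above, it might look roughly like the sketch below. This is only an illustration, not my actual config: the upstream name `concourse_web`, the `lb.example.com` server name (standing in for <FQDN_of_LB>), and the certificate paths are all made-up placeholders. The upstream port 80 follows from the "80:8080" port mapping on the web nodes.

```nginx
# Hypothetical sketch of the HTTP part of the LB (all names and paths are placeholders)
http {
    upstream concourse_web {
        server 192.168.0.2:80;   # web01, host port 80 maps to container port 8080
        server 192.168.0.3:80;   # web02
    }

    # Redirect plain HTTP to HTTPS
    server {
        listen 80;
        server_name lb.example.com;               # placeholder for <FQDN_of_LB>
        return 301 https://$host$request_uri;
    }

    # Terminate TLS on the LB and forward to the web nodes over plain HTTP
    server {
        listen 443 ssl;
        server_name lb.example.com;               # placeholder for <FQDN_of_LB>
        ssl_certificate     /etc/nginx/ssl/lb.crt;   # hypothetical path
        ssl_certificate_key /etc/nginx/ssl/lb.key;   # hypothetical path

        location / {
            proxy_pass http://concourse_web;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }
}
```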

Worker config:

export CONCOURSE_BAGGAGECLAIM_DRIVER=btrfs
concourse worker \
  --work-dir /opt/concourse/worker \
  --name "${1}" \
  --tsa-host=<FQDN_of_LB>:2222 \
  --tsa-public-key /etc/concourse/keys/worker/tsa_host_key.pub \
  --tsa-worker-private-key /etc/concourse/keys/worker/worker_key

For each web node:

web01 uses these vars of interest

CONCOURSE_EXTERNAL_URL: https://<FQDN_of_LB>

web02 uses these vars of interest

CONCOURSE_EXTERNAL_URL: https://<FQDN_of_LB>

Both web docker-compose files have this stanza for ports:
ports: ["80:8080", "2222:2222"]

One thing of interest: as a test, I kicked off 5 builds, which all went to started (yellow) and sat there. About 20 minutes later, once I had finished typing this up, I brought down web02, and within seconds they all came to life and started to run. All of them are now green.

Any thoughts on whether I may have this configured incorrectly?

I have my TCP balancing for port 2222 configured in nginx like so:

stream {
    # The two Concourse web nodes; each exposes the TSA on port 2222
    upstream web_server {
        server 192.168.0.2:2222;
        server 192.168.0.3:2222;
    }

    # Plain TCP pass-through for worker registration
    server {
        listen 2222;
        proxy_pass web_server;
    }
}

Thanks for any help you can provide
Gus