Workers stall on execute, cannot access docker registry (maybe DNS?)


#1

I’ve deployed Concourse via a Terraform module we wrote, with a single web/worker node for testing. You can find it here: https://ci.7fdev.io.

It’s Concourse 3.14.1, and everything appears to work until I execute a build with fly, at which point I hit the dreaded “Unable to connect to the docker registry” issue. I suspect it might be a networking problem in AWS, but I need some help confirming. I’m running web/worker like this:

sudo docker run -d --network host --name concourse_worker \
  --privileged=true --restart=unless-stopped \
  -v /etc/concourse/keys/:/concourse-keys \
  -v /tmp/:/concourse-tmp \
  concourse/concourse:3.14.1 worker \
  --peer-ip 172.0.2.40 \
  --bind-ip 0.0.0.0 \
  --garden-dns-server 8.8.8.8 \
  --baggageclaim-bind-ip 0.0.0.0 \
  --garden-bind-ip 0.0.0.0 \
  --tsa-host conc-lb-us-east-1-1472385510.us-east-1.elb.amazonaws.com:2222 \
  --work-dir /concourse-tmp

sudo docker run -d --network host --name concourse_web \
  --restart=unless-stopped \
  -v /etc/concourse/keys/:/concourse-keys \
  concourse/concourse:3.14.1 web \
  --peer-url http://172.0.0.20:8080 \
  --postgres-data-source nope \
  --external-url https://ci.7fdev.io \
  --github-auth-client-id nope \
  --github-auth-client-secret nope \
  --github-auth-organization 7Factor

Error:

resource script '/opt/resource/check []' failed: exit status 1

stderr:
failed to ping registry: 2 error(s) occurred:

* ping https: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
* ping http: Get http://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

errored

The web and worker nodes live in a subnet with access to a NAT gateway, as indicated in the picture below. Anyone have any ideas? I can curl whatever I want from the host and from inside the container, so I’m pretty baffled. Either I don’t understand Docker networking, or I’m missing something small.
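One way to narrow this down (a sketch; the target name `ci` and the pipeline/resource names are placeholders, and it assumes curl/nslookup are present in the images) is to compare connectivity from the host, from the worker’s Docker container, and from the actual Garden check container, since Garden containers get their own network namespace and resolver config:

```shell
# On the worker host: confirm the registry resolves and responds
curl -v https://registry-1.docker.io/v2/

# Inside the worker's Docker container
sudo docker exec -it concourse_worker curl -v https://registry-1.docker.io/v2/

# Inside the Garden check container itself, via fly
# (replace my-pipeline/my-image-resource with a real resource)
fly -t ci hijack -c my-pipeline/my-image-resource -- nslookup registry-1.docker.io
```

If the first two succeed but the hijacked check container can’t resolve the registry, the problem is DNS inside Garden rather than VPC routing.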


#2

In case anyone happens upon this: the issue is fixed by the new 4.1.0 flag, --garden-dns-proxy-enable. With that flag everything wires up correctly; my AWS VPC networking setup was completely fine.
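For anyone applying the fix, here’s a sketch of the worker invocation from the first post updated for 4.1.0. I’ve dropped --garden-dns-server since the proxy handles resolution through the worker itself; whether you still want an explicit upstream resolver is up to your setup:

```shell
sudo docker run -d --network host --name concourse_worker \
  --privileged=true --restart=unless-stopped \
  -v /etc/concourse/keys/:/concourse-keys \
  -v /tmp/:/concourse-tmp \
  concourse/concourse:4.1.0 worker \
  --peer-ip 172.0.2.40 \
  --bind-ip 0.0.0.0 \
  --garden-dns-proxy-enable \
  --baggageclaim-bind-ip 0.0.0.0 \
  --garden-bind-ip 0.0.0.0 \
  --tsa-host conc-lb-us-east-1-1472385510.us-east-1.elb.amazonaws.com:2222 \
  --work-dir /concourse-tmp
```

With the proxy enabled, DNS requests from Garden containers are forwarded via the worker, so the containers no longer need direct access to an external resolver.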