DNS resolution failures on 5.4.1

Hey guys,

Thought I’d post here in case anyone has seen similar behaviour happening to them.

Since upgrading to concourse version 5.4.1. we have been seeing regular dns lookup timeouts appearing on our build stages trying to reach our dns server from our web nodes. These errors seem to be intermittent and have only began to occur since upgrading from 5.2.0 to 5.4.1

The stages error with an internal error like:

dial tcp: lookup concoursedb.xx.xxxx.com on 10.216.0.2:53: read udp 10.216.0.59:45368->10.216.0.2:53: i/o timeout

our /etc/resolv.conf contains the dns server address: nameserver 10.216.0.2
and we are running the standard concourse binary 5.4.1 on Fedora28 aws ec2.

Running dns resolution tests from the server itself to the our db all run as expected so the problem appears to lie within the internal concourse dns resolutions.

Anyone seen similar?
Thanks in advance,

Ash

1 Like

Is it possible there is resource contention causing slow downs enough to bottleneck DNS lookups? I’ve seen this locally when overloading my machine.

Where is your concourse worker & DNS server?
If they are on the same machine, you may need to allow guardian/garden to access your host’s network.
Otherwise, you need to point guardian/garden at the external server’s IP.
See Troubleshooting and fixing DNS resolution for more information