All attempts to fetch resources fail

I’ve run the worker both inside a Docker container and directly on the host, and either way I get the following error:

resource script '/opt/resource/check []' failed: exit status 128

stderr:
Cloning into '/tmp/git-resource-repo-cache'...
fatal: unable to access 'https://github.com/starkandwayne/concourse-tutorial.git/': Failed to connect to github.com port 443: Operation timed out

This is pretty odd because that call works on the machine no problem. Settings are:

CONCOURSE_WORKER_WORK_DIR="/data/concourse/work"
CONCOURSE_WORK_DIR="/data/concourse/work"
CONCOURSE_TSA_HOST="localhost:2222"
CONCOURSE_TSA_PUBLIC_KEY="/data/concourse/keys/tsa_host_key_pub"
CONCOURSE_TSA_WORKER_PRIVATE_KEY="/data/concourse/keys/worker_key_1"
CONCOURSE_GARDEN_DNS_PROXY_ENABLE=true
CONCOURSE_WORKER_GARDEN_DNS_PROXY_ENABLE=true
CONCOURSE_GARDEN_BIND_IP=0.0.0.0
CONCOURSE_GARDEN_DESTROY_CONTAINERS_ON_STARTUP=
CONCOURSE_GARDEN_LOG_LEVEL=error
CONCOURSE_BAGGAGECLAIM_LOG_LEVEL=error
CONCOURSE_BAGGAGECLAIM_BIND_IP=0.0.0.0
CONCOURSE_BAGGAGECLAIM_DRIVER=btrfs

The systemd unit file to run this outside of Docker looks like this:

[Unit]
Description=Concourse Worker
After=network.target

[Service]
EnvironmentFile=/data/concourse/worker.env
ExecStart=/data/concourse/bin/concourse worker
KillMode=process
LimitNPROC=infinity
LimitNOFILE=infinity
TasksMax=infinity
Restart=on-failure
RestartSec=3
ExecStop=/bin/kill -USR2 $MAINPID ; /usr/bin/tail --pid $MAINPID -f /dev/null
TimeoutStopSec=300
Delegate=yes

[Install]
WantedBy=multi-user.target

I ran tcpdump on the one worker node and never saw any attempt to contact github.com on port 443, so I am guessing something is wrong with Garden or something else in the system.

Any help is much appreciated, I’ve spent a few hours banging my head on this.

Attempting to check the resource from fly looks like:

$ fly cr -r test/resource-tutorial
error: Unexpected Response
Status: 504 Gateway Time-out
Body:
<html>
<head><title>504 Gateway Time-out</title></head>
<body bgcolor="white">
<center><h1>504 Gateway Time-out</h1></center>
</body>
</html>

fly and Concourse are both version 5.3.0.

After a TON of head banging and debugging, I found that this is a problem with Garden containers not getting any traffic forwarded. I am not sure of the reason, but it may have something to do with Docker and Garden running on the same machine. I’m still debugging how to make this work properly, but this rule makes it work (not production ready!):

sudo iptables -I FORWARD 1 -s 0.0.0.0/0 -d 0.0.0.0/0 -j ACCEPT
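
A narrower variant of that rule, as a minimal sketch: it accepts forwarded traffic only for the Garden container subnet instead of every source and destination. The subnet 10.254.0.0/22 is an assumption (it is believed to be Garden's default network pool; check the addresses the containers actually receive), and the function name and the IPTABLES variable are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Sketch only: limit forwarding to the Garden container subnet instead of 0.0.0.0/0.
# ASSUMPTION: containers get addresses from 10.254.0.0/22; override GARDEN_SUBNET if not.
GARDEN_SUBNET="${GARDEN_SUBNET:-10.254.0.0/22}"
# Command used to apply the rules; a hypothetical knob so it can be swapped for a dry run.
IPTABLES="${IPTABLES:-iptables}"

allow_garden_forwarding() {
  # Outbound: packets that originate from Garden containers.
  $IPTABLES -I FORWARD 1 -s "$GARDEN_SUBNET" -j ACCEPT
  # Inbound: only replies to connections the containers opened.
  $IPTABLES -I FORWARD 1 -d "$GARDEN_SUBNET" \
    -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
}
```

Calling allow_garden_forwarding as root applies both rules; setting IPTABLES="echo iptables" first prints them instead. Like the rule above, they are not persisted across reboots.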

Hi @relistan.

What deployment method are you using? Are you using BOSH? If so, can you post the part of the manifest that contains the garden properties (REDACT ALL SECRETS)? Also, what IaaS?

It does seem like garden’s network settings are misconfigured. For reference, many CloudFoundry/Pivotal teams use this repo to manage how Concourse is configured.

Thanks for the response! We’re deploying it with Ansible, following the settings in the documentation, plus the example systemd unit files. This is AWS on Ubuntu 16.04 instances.

I have already been through the BOSH deployment and the release. I don’t see anything obvious that I’m missing in there, except perhaps this setting:

CONCOURSE_GARDEN_ALLOW_HOST_ACCESS

Going to give that a shot. I don’t see anything in there about configuring iptables, so it must be some other setting, I guess.

Hmm, no. CONCOURSE_GARDEN_ALLOW_HOST_ACCESS seems to have no effect on this issue.

I’m not familiar with the Ansible deployment method. It does seem like there is an issue with the containers that Garden is creating. As you mentioned, the containers aren’t able to access the internet. I think the network traffic should go as follows:

  1. connection initiated from within the container
  2. packets reach the virtual ethernet interface on the container
  3. the virtual ethernet is bridged to the ethernet interface on the host
  4. packets sent out of the host’s gateway

Looks like we’re not getting to step (3), based on the fact that tcpdump doesn’t show any traffic on the host. You can use fly intercept to open a bash session in the check container. If you run ifconfig and ip route from within the container and paste the command output here, that might help us find the root cause. If you run ifconfig on the host, you should see the matching virtual ethernet ID there (drop the “-” part). This will confirm that networking between the container and host is set up correctly.
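
One thing worth checking in that ip route output is whether the container has a default route at all, since without one nothing can leave the container. A small sketch that reads ip route output on stdin and prints the default route's gateway and device; the function name is hypothetical, and this only inspects the text it is given.

```shell
#!/bin/sh
# Sketch: read `ip route` output on stdin and print "<gateway> <device>" for the
# default route. Prints nothing and returns non-zero if no default route is listed.
default_route() {
  awk '
    $1 == "default" {
      gw = ""; dev = ""
      # Walk the remaining fields looking for the "via" and "dev" keywords.
      for (i = 2; i < NF; i++) {
        if ($i == "via") gw = $(i + 1)
        if ($i == "dev") dev = $(i + 1)
      }
      print gw, dev
      found = 1
    }
    END { exit !found }
  '
}
```

Save the container's ip route output to a file and feed it in on stdin, or pipe ip route straight into the function where it is defined.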

Hi Kevin, it really looks like an issue where Docker’s iptables rules and Garden’s don’t work well together. Both are installed on the host. The packets from the Garden containers are not getting forwarded from the virtual interface to the host interface (i.e. between 3 and 4 in your list). The rule that I inserted seems to fix it, but obviously it allows forwarding where we don’t want it. I am not able to find recommendations about running Docker and Garden on the same host. If you know of any resources that I missed, that might help!
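
One concrete thing to look at for this kind of conflict: Docker (since 1.13) sets the default policy of the FORWARD chain to DROP, so forwarded packets from bridges it does not manage are dropped unless some other rule accepts them. A sketch that pulls the policy out of iptables -S FORWARD output; whether this is what is happening on this particular host is an assumption, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: read `iptables -S FORWARD` output on stdin and print the chain's
# default policy (for example DROP or ACCEPT), taken from the "-P FORWARD ..." line.
forward_policy() {
  awk '$1 == "-P" && $2 == "FORWARD" { print $3 }'
}
```

On the worker host this would be used as: sudo iptables -S FORWARD | forward_policy. A result of DROP with no ACCEPT rule covering the Garden subnet would be consistent with the symptoms described here.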

Good find with the Docker/Garden conflict! Is it possible for you to run Concourse on a dedicated instance (or instances) so that Garden and Docker don’t have that conflict? I believe Concourse used to depend on Docker, but it doesn’t anymore.