Intermittent "connection refused" errors when talking to web


#1

Every so often, tasks in our CI fail with error messages like these:

dial tcp 10.12.2.37:34301: connect: connection refused
Put /volumes/879541e5-f5ad-421e-5be7-ef55508a74b3/stream-out?path=.: dial tcp 10.12.2.37:42481: connect: connection refused
dial tcp 10.12.2.37:40663: connect: connection refused
dial tcp 10.12.2.37:44287: connect: connection refused
dial tcp 10.12.2.37:39389: connect: connection refused
dial tcp 10.12.2.37:43941: connect: connection refused

The 10.12.2.37 IP is the cluster IP of our web Pod. We deployed using the Helm chart on GKE:

Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.5-gke.2", GitCommit:"f199298d18103a59db32d97a92072fbb17b1175a", GitTreeState:"clean", BuildDate:"2018-07-10T02:37:32Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}

Running Concourse v3.14.1.

The worker Pods crash periodically; the web node has not crashed. I’m not sure what to look for in the logs to find out why this is happening. I’m also not sure why the workers would be initiating connections to seemingly random ports on the web Pod.
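
In case it helps anyone reproduce the analysis, here is a rough sketch of how the error lines can be grouped by destination IP and port, to check whether the same ports are refused repeatedly or a different one each time. The function name refused_ports and the sample list are just illustrative (they are not part of Concourse), and the regex assumes the error text looks exactly like the lines pasted above.

```python
import re
from collections import defaultdict

# Matches the error text as it appears in the failed task output above.
# Assumption: IPv4 destinations only, in the same "dial tcp IP:PORT" form.
REFUSED = re.compile(
    r"dial tcp (\d+\.\d+\.\d+\.\d+):(\d+): connect: connection refused"
)


def refused_ports(lines):
    """Return {destination IP: sorted list of distinct refused ports}."""
    ports = defaultdict(set)
    for line in lines:
        for ip, port in REFUSED.findall(line):
            ports[ip].add(int(port))
    return {ip: sorted(found) for ip, found in ports.items()}


# A few of the lines from the failures above, used as sample input.
sample = [
    "dial tcp 10.12.2.37:34301: connect: connection refused",
    "Put /volumes/879541e5-f5ad-421e-5be7-ef55508a74b3/stream-out?path=.: "
    "dial tcp 10.12.2.37:42481: connect: connection refused",
    "dial tcp 10.12.2.37:40663: connect: connection refused",
]

if __name__ == "__main__":
    for ip, found in refused_ports(sample).items():
        print(ip, found)
```

Feeding it the full task output (one string per line) should show at a glance whether every failure hits a new port on the same IP.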

Any help on this would be appreciated!


#2

We gave up and reverted to a BOSH-deployed Concourse.