Put /volumes/**/privileged: net/http: timeout awaiting response headers

#1

Environment: Concourse 4.2.3 running within Kubernetes 1.13.2

Receiving this error when using a custom image from ECR:

Put /volumes/**/privileged: net/http: timeout awaiting response headers

The image is 3.49 GB. Pulling it is very slow (28 minutes), and eventually it just hangs. If the build were allowed to keep going, I don’t think it would ever time out on its own.
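For a rough sense of scale, the 3.49 GB and 28 minute figures above work out to an average pull rate of about 2 MB/s. A minimal sketch of that arithmetic, assuming decimal gigabytes (1 GB = 1000 MB); the function name is only illustrative and is not part of Concourse or Docker:

```python
def pull_throughput_mb_per_s(image_size_gb: float, pull_minutes: float) -> float:
    """Average pull rate in MB/s, treating 1 GB as 1000 MB (an assumption)."""
    if pull_minutes <= 0:
        raise ValueError("pull_minutes must be positive")
    return image_size_gb * 1000 / (pull_minutes * 60)


# Figures taken from the report above: 3.49 GB pulled in 28 minutes.
rate = pull_throughput_mb_per_s(3.49, 28)
print(f"{rate:.2f} MB/s")  # prints "2.08 MB/s"
```

This is only an average over the whole pull, so it cannot say whether the slowness comes from the network, the registry, or the untar step on the worker.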

I do have another ECR custom image working most of the time, though that will sometimes return:

resource script '/opt/resource/check []' failed: exit status 1

stderr:
failed to get ECR credentials: credentials not found in native keychain

I’ve taken a look at the logs on the web node and the workers, and nothing stands out to me. I’ve watched the worker do the docker pull and then hang on the final layer with docker untar. Killing those processes just starts the retries, and sometimes it says the second retry pulled successfully, then hangs some more. I tried to figure out why it hangs after the pull reports success, but I haven’t been able to correlate a process on the worker with the actual job.

I don’t see resource exhaustion happening: there is plenty of disk space, memory, and CPU. It has happened on multiple workers, with the same results.

Any help or pointers on what to troubleshoot or look for would be appreciated.

#2

I reduced the image to 1.99 GB and it ran, so there seems to be a limit on how large a custom image can be with Concourse.

Possibly Concourse workers need to be a specific size, or have a specific amount of resources, to handle larger images?