Concourse without Docker Hub

Hi concourse community,

We’re currently starting to use Concourse to deploy our pipelines for a company in the banking sector. As you can imagine, security in the banking sector is quite strict, and we are not allowed to access Docker Hub images directly in any way.

Our setup: we run Kubernetes (EKS) on AWS and install Concourse with the Helm chart. All necessary (resource) images are mirrored to ECR. The Concourse worker pods have an AWS worker role that allows access to ECR (roles are distributed through kube2iam).

Our idea for running pipelines on Concourse without Docker Hub access was to mirror the Docker images of all resource types and define them locally in the Concourse pipeline YAML files. Below is an example pipeline file:

resource_types:
  - name: docker-image-resource
    type: docker-image
    privileged: true
    source:
      repository: some_account_id.dkr.ecr.some_region.amazonaws.com/docker-image-resource
      tag: 1.3.1
      aws_access_key_id: ((build-role-credentials.AccessKeyId))
      aws_secret_access_key: ((build-role-credentials.SecretAccessKey))
      aws_session_token: ((build-role-credentials.SessionToken))

jobs:
  - name: job
    plan:
      - task: simple-task
        config:
          platform: linux
          image_resource:
            type: docker-image-resource
            source:
              repository: some_account_id.dkr.ecr.some_region.amazonaws.com/cicd-image
              tag: 0.0.1
              aws_access_key_id: ((build-role-credentials.AccessKeyId))
              aws_secret_access_key: ((build-role-credentials.SecretAccessKey))
              aws_session_token: ((build-role-credentials.SessionToken))
          run:
            path: echo
            args: ["Hello, world!"]

What we see is the following error in the simple-task logs:

resource script '/opt/resource/check []' failed: exit status 1

stderr:
failed to get ECR credentials: credentials not found in native keychain

I find it very confusing that the resource type definition depends on a resource type that probably requires a Docker Hub image to start from. It feels like a chicken-and-egg problem, and I am wondering how this works in practice. Unfortunately, Concourse offers very limited debugging possibilities (to my knowledge) for finding out what the issue might be. Are Concourse’s default resource types part of the worker images, or are they pulled from Docker Hub?

Can someone with deep knowledge of how this works help me set up a Concourse environment without Docker Hub access, or any internet access whatsoever? That would be greatly appreciated. We have been struggling with this for some time, and if we can’t find a solution we will probably have to migrate to another build tool such as Jenkins (God forbid!). Maybe we are doing something awfully wrong here, but we fail to pinpoint where.

Kind regards,
-Kristof

The default set of core resource types, including docker-image and registry-image, is distributed with the workers. We’ll be changing this soon: we want to reduce the distribution size by taking most of them out, but we’ll always ship at least one resource type that is able to fetch images; otherwise it’s a bit of a chicken-and-egg problem, as you said.
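A sketch, not taken from the thread: since the core types ship with the workers, the custom `docker-image-resource` entry under `resource_types` should not be needed at all, and the task can reference the bundled `docker-image` type directly. The repository, tag and `((build-role-credentials.*))` variables are the placeholders from the pipeline in the question. This only removes the extra resource type; on its own it does not fix the ECR credential error.

```yaml
# Sketch: no resource_types section; the bundled docker-image type is used.
jobs:
  - name: job
    plan:
      - task: simple-task
        config:
          platform: linux
          image_resource:
            # Core type shipped with the worker, so nothing is fetched from
            # Docker Hub to run the resource itself; only the task image is
            # pulled, from the ECR mirror.
            type: docker-image
            source:
              repository: some_account_id.dkr.ecr.some_region.amazonaws.com/cicd-image
              tag: 0.0.1
              aws_access_key_id: ((build-role-credentials.AccessKeyId))
              aws_secret_access_key: ((build-role-credentials.SecretAccessKey))
              aws_session_token: ((build-role-credentials.SessionToken))
          run:
            path: echo
            args: ["Hello, world!"]
```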

I’m not sure what’s causing the error you’re seeing, since I don’t use the ECR auth functionality, but I have a guess: make sure it is your worker nodes, not the web nodes, that are configured with access to the ECR credentials; all resource actions run on the workers.
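A minimal sketch of a Helm values override that gives the IAM role to the worker pods only. It assumes the Concourse chart exposes `worker.annotations` and `web.annotations` (check the `values.yaml` of your chart version) and that kube2iam picks up the standard `iam.amazonaws.com/role` pod annotation. The role name is hypothetical.

```yaml
# values.yaml override for the Concourse Helm chart (sketch).
worker:
  annotations:
    # kube2iam assigns this IAM role to the worker pods, where resource
    # scripts such as /opt/resource/check actually run.
    iam.amazonaws.com/role: concourse-worker-ecr-pull  # hypothetical role name
web:
  # No ECR role here: resource actions run on the workers, not on web nodes.
  annotations: {}
```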

Hi Vito,
Thanks for your reply. We’ve tracked down the issue: the docker-image resource uses the AWS ECR credential helper, which makes AWS API calls for which there are no VPC endpoints. We are obliged to use VPC endpoints in our environment and have no internet access whatsoever (not even for internal AWS calls), and this is where the issue comes from.

EKS with Kubernetes 1.13 recently gained support for ECR VPC endpoints. I expect more users to start depending on this functionality, and they may run into the same issue as long as the AWS ECR credential helper keeps making calls for which there is no endpoint. If there is no way to update the default docker-image resource shipped with Concourse, this might make Concourse virtually unusable in environments with limited internet access that use EKS/ECR, since they depend heavily on pulling resources from ECR. Keep in mind that with your proposed change it will no longer be as straightforward to use Concourse in an environment with no or limited internet access.
Our workaround was NOT to provide the AWS credentials, but to use the registry-image resource and give it Docker registry credentials instead. So be wary of using the ECR credential helper in an environment with ECR VPC endpoints.
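A sketch of that workaround, assuming ECR’s standard registry login (user name `AWS`, password obtained with `aws ecr get-login-password`) and a credential-manager variable named `ecr-docker-password`, which is hypothetical here. ECR tokens expire after 12 hours, so something outside the pipeline has to refresh the stored value.

```yaml
jobs:
  - name: job
    plan:
      - task: simple-task
        config:
          platform: linux
          image_resource:
            type: registry-image
            source:
              repository: some_account_id.dkr.ecr.some_region.amazonaws.com/cicd-image
              tag: 0.0.1
              # Plain registry credentials instead of AWS keys.
              username: AWS
              password: ((ecr-docker-password))  # hypothetical variable name
          run:
            path: echo
            args: ["Hello, world!"]
```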

Hi @martensk,

Not sure about your setup, but it works for us. I even managed to get it working through a Squid proxy.

We fixed it in the end by requesting broader access to the AWS APIs through the company firewall. The problem is that these resource images use the AWS ECR Docker credential helper library, which makes an ECR call that does not go through our ECR VPC endpoints, so the company firewall blocked it. Works like a charm now.