How to use common base image when utilizing `registry-image-resource`


#1

When using the new registry-image-resource (repo), is there a way to pre-fetch a base image that is used by multiple jobs? I want to prevent each job from pulling the same image over and over again.

There is a very good answer on Stack Overflow that explains how to do that in two different ways for the docker-image-resource. But registry-image-resource supports neither load_base nor cache & cache_tag. (Please correct me if this is wrong!)
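
For context, this is roughly what that looks like with the docker-image-resource (the resource and directory names here are just placeholders, going from the resource’s documented put params):

- put: my-image        # a docker-image resource
  params:
    build: source-code
    cache: true        # pull the previously pushed image first and use it as a layer cache
    cache_tag: latest
    # alternatively: load_base: base-image   # "docker load" an already-fetched base image before building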


#2

One possibility might be to create a repository that is local to your network and push/pull from that. After that, use an “image_resource” for the normal jobs and pull from the local repository.

IIRC there is a parameter/env var which sets the default repository address(es) … but I always used an “image_resource” for local repos.
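
Something along these lines (the registry address and image name are just placeholders):

image_resource:
  type: registry-image
  source:
    repository: registry.internal:5000/base-image   # your local registry’s address
    tag: latest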


#3

By “repository”, do you mean a version control repository, like Git? How would I push/pull a (Docker) image to/from there? Sorry if these are dumb questions, but CI/CD is very new to me and I’m still learning.

Maybe I should explain better what I have in mind:

resource_types:
# Github Pull Request
- name: gh-pull-request
  type: docker-image
  source:
    repository: teliaoss/github-pr-resource

resources:
# Github Pull Request
- name: pull-request
  type: gh-pull-request
  webhook_token: ((github-webhook-token))
  source:
    repository: ((github-repo))
    access_token: ((github-access-token))

jobs:
- name: perform-checks

  on_failure:
    put: pull-request
    params:
      path: pull-request
      status: failure

  on_success:
    put: pull-request
    params:
      path: pull-request
      status: success

  plan:
  # Fetch files from GitHub PR
  - get: pull-request
    trigger: true
    version: every

  # Set PR status to "Pending"
  - put: pull-request
    params:
      path: pull-request
      status: pending

  - task: task-a
    config:
      platform: linux
      image_resource:
        type: docker-image
        source:
          repository: ((test-base-image))
          username: ((docker-hub-username))
          password: ((docker-hub-password))
      inputs:
        - name: pull-request
      run:
        path: test.sh
        args: ["project-a"]

Now I would like to add a second task, task-b, which is identical to task-a in every way except for the arguments passed to run (sketched below). What I noticed is that each job pulls the same image, which takes quite some time, and I thought this could be sped up by fetching and caching that common image somehow.
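
For illustration, task-b would be something like this (“project-b” is just a placeholder for the second set of arguments):

  - task: task-b
    config:
      platform: linux
      image_resource:
        type: docker-image
        source:
          repository: ((test-base-image))
          username: ((docker-hub-username))
          password: ((docker-hub-password))
      inputs:
        - name: pull-request
      run:
        path: test.sh
        args: ["project-b"]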


#4

Note that the download of the image happens only the first time the task runs on a given worker. If the same task runs on the same worker a second time, the image will already be cached.


#5

Yes, I have noticed that. But I have a very simple setup and (currently) only one worker. That worker fetches the common image for each job over and over again (on the first run); the second run uses the cache. But I thought maybe there was room for improvement.


#6

I am confused here. If

  • you have a single worker
  • and that worker stays connected to the ATC
  • and you have 2 tasks that refer to the same image,

then that image will be downloaded only once and then cached.

If you see multiple downloads, then I can think of only two possible explanations:

  • the image actually changed (your configuration pulls the latest tag - see the snippet after the example below for pinning a fixed tag instead). You can verify this in the output; at a certain point it mentions the image hash.
  • enough time passed that the image cache was evicted.

EDIT: example of image hash in the UI output:

Digest: sha256:4341a58b43d18a0159068b5b88f451ed6de19bc62b9251372fce7a48aeb64342
Status: Downloaded newer image for docker.internal:5000/pushprox-builder@sha256:4341a58b43d18a0159068b5b88f451ed6de19bc62b9251372fce7a48aeb64342
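
If the hash keeps changing because latest moved, one option is to pin a fixed tag in the image_resource (the tag value here is just a placeholder):

      image_resource:
        type: docker-image
        source:
          repository: ((test-base-image))
          tag: "1.2.3"   # a fixed, immutable tag instead of the default "latest"
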

#7

I will try to confirm the described behaviour. Maybe it was indeed some mistake I made during my first steps.


#8

Hi,

No room for improvement :sunglasses:
If the workers are caching the image, which they should, then it’s good.

You are pulling images from Docker Hub (I presume; it’s the default, IIRC). Where I worked we pulled/pushed images from a local registry behind a firewall, a Docker Trusted Registry, and there are a few alternatives. That was faster since it was on the same network … but large images still took a little time to download - the solution to that is to use smaller images! Anyway, that’s another topic.

It’s also possible to persist/save a Docker image to a file, put it on a drive, and then reload it later. I only did that to work around some problem, the cause of which I no longer recall - but if you have a really big image and it’s taking minutes to download, that could be a “hack” to try. Personally I would rather install a local Docker registry (my NAS has one built in, actually, so it’s not that hard to do).


#9

I’m using a private registry.

Can you please explain (maybe with a code example) how to persist a docker image to a file and reload it later?


#10

OK, this is what I did, but it was a workaround for something much more complex (a Docker-in-Docker related problem, IIRC). So normally, don’t do this … lots of pain.

Better to just install a local Docker registry, push and pull from that, and then, if you want to make an image publicly available, set up a simple pipeline to pull_local and push_public the Docker image.

Anyway, simple Linux commands do the trick.

  - name: foo
    plan:
      - aggregate:
        - get: pypi_foo
          trigger: true
        - get: git_foo
          trigger: true

      - task: Foo Build Test
        privileged: true
        params:
          IMAGE_NAME: foo
        config:
          platform: linux
          image_resource:
            type: docker-image
            source:
              repository: local_docker_repo:443/foo/foo_alpine_dind
          inputs:
            - name: git_foo
          outputs:
            - name: docker-images
          run:
            path: /bin/bash
            args:
              - -exc
              - |
                cd git_foo
                make build
                make start
                sleep 10
                make list
                make test
                # persist the freshly built image as a tarball in the task output
                docker save foo:latest > ../docker-images/foo.latest.tar

      # docker-image-resource will "docker load" the tarball and push it to the registry
      - put: local_docker-foo
        params:
          load_repository: foo
          load_file: docker-images/foo.latest.tar
          tag_as_latest: true
        get_params:
          skip_download: true
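
For completeness, local_docker-foo is just a docker-image resource pointing at the local registry; the put above does a “docker load” of the saved tarball and then pushes it. Its definition would look something like this (the repository path and credentials are assumptions):

resources:
- name: local_docker-foo
  type: docker-image
  source:
    repository: local_docker_repo:443/foo/foo   # the same local registry as above
    username: ((registry-username))             # placeholder credentials
    password: ((registry-password))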