Switching to Concourse from Jenkins


#1

We’re running an open source build system (Caffe2 and PyTorch) and are looking at switching from Jenkins to another CI system, as we’re having too many issues with Jenkins.

Each GitHub PR triggers a large number of builds, e.g. one for each Python version, one for each CUDA version, one for each NVIDIA driver version, one each for Linux, macOS and Windows, etc., so the total number of combinations is huge. On top of that we run performance regression tests, and each time a PR is merged, we trigger new builds for all pending PRs.

We have configured EC2 auto-scaling groups for the different worker types (GPU & CPU), so we can handle peak build load in a reasonable time.

We’d like to retain this functionality without having to deal with all the Jenkins issues :wink: Does this sound like a good fit for Concourse?

And how does Concourse deal with macOS & Windows builds?


#2

You have a complicated setup to migrate :slight_smile:. I cannot give a full answer but I can point you to some aspects to consider:

  • First of all, do not try to migrate that setup directly to Concourse. The learning curve will be too steep and you risk giving up before it works. Start small, with no matrix builds, and get a feel for how Concourse works and whether you like it.
  • The first big choice will be: do I handle everything myself (using what is called the Concourse binary) or do I use BOSH to deploy and orchestrate? You will see BOSH mentioned on the downloads page. We went with “we handle everything ourselves” on AWS with autoscaling, Terraform, Packer and SaltStack, because when we started, we had a look at BOSH and it seemed it would take too much time to learn. Now we have everything working, but it cost a lot of time to get there (not the fault of Concourse, more the complexity of AWS; in a way we rebuilt our own BOSH). So if I could go back in time, I would either pick BOSH directly or at least dedicate more time to really evaluating BOSH before making a decision.
  • Different worker types (GPU and CPU) can be handled with tags (equivalent to Jenkins labels), see https://concourse-ci.org/tags-step-modifier.html EDIT: see also https://github.com/concourse/concourse/issues/2353 “Using Concourse with GPU enabled containers”.
  • The type of build you are doing is a classic matrix build (if I understand correctly). Concourse doesn’t have direct support for that; you have to implement it yourself, indirectly, with pipelines. There is an upcoming feature called “Spaces” that will also provide matrix builds, but it is not available yet, and my personal impression is that the added complexity in pipeline syntax and conceptual load may not be worth it compared to just having multiple pipelines. I might be wrong in this assessment. The way to emulate matrix builds with simple pipelines is to embrace the ((param)) parameter substitution facility and understand how to use it with fly for configuration and with a secrets manager like Vault or AWS SSM for secrets.
  • In Concourse, everything is done with “resources”. Concourse ships with some built-in resources (like git or S3), and there are many community-maintained additional resources, such as the one that triggers builds on GitHub PRs; you would have to validate whether it does what you want. Consider also the security implications of running foreign code on a PR: foreign because the pipeline and build scripts can be modified at will in the PR. Pipeline code can also be modified to reveal secrets such as SSH keys, so be sure to consider this. For example, if the repository is public (your case), do not use an SSH key (not even a read-only one); just use the HTTP URL for the repo. There are also ways to filter on the path, so, for example, if your CI code is below the directory ci/, not triggering PR builds when they touch that path might be good enough (never tried).
  • And now to platform support. The Concourse story is stellar for Linux, since every task runs in its own container: zero snowflakes, total reproducibility. On a Linux worker, the only software you have to add is the concourse binary itself! All the build dependencies come in Docker images. On the other hand, the story becomes “old style” on Mac and Windows workers: you have to install all the build dependencies on the workers themselves, there is no isolation, and so snowflakes are possible. We mitigate this by keeping all our Windows workers in a time-based ASG, so that every 24 hours we wipe all accumulated state by simply switching to new workers. Mac is like Windows, but since AWS doesn’t support Mac, you will have to manage the Mac workers either yourself or in another cloud provider. Concourse supports this setup directly: there is a way to choose the direction of the connection so that you can pass through NATs.
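
A minimal sketch of the matrix emulation mentioned in the list above, assuming a single parameterised pipeline file that refers to ((python)) and ((cuda)). The axis names, the version values, the target name `ci` and the file name `pipeline.yml` are all hypothetical; only `fly set-pipeline` with `-p`, `-c` and `-v` is actual fly usage. The script only prints one command per combination; it does not run anything:

```python
import itertools
import shlex

# Hypothetical matrix axes. The values are placeholders, not real project versions.
MATRIX = {
    "python": ["2.7", "3.6"],
    "cuda": ["8.0", "9.0"],
}


def matrix_combinations(matrix):
    """Return one dict per combination of axis values (cartesian product)."""
    axes = sorted(matrix)
    return [
        dict(zip(axes, values))
        for values in itertools.product(*(matrix[axis] for axis in axes))
    ]


def fly_command(target, config_path, combo):
    """Build the `fly set-pipeline` argument list for one combination.

    Each axis becomes a -v var, which fills the matching ((axis)) placeholder
    in the pipeline file.
    """
    # Derive a pipeline name from the combination; dots become underscores.
    name = "build-" + "-".join(
        f"{axis}{combo[axis]}" for axis in sorted(combo)
    ).replace(".", "_")
    command = ["fly", "-t", target, "set-pipeline", "-p", name, "-c", config_path]
    for axis in sorted(combo):
        command += ["-v", f"{axis}={combo[axis]}"]
    return command


if __name__ == "__main__":
    for combo in matrix_combinations(MATRIX):
        # Print a shell-safe command line instead of executing it.
        print(" ".join(shlex.quote(arg) for arg in fly_command("ci", "pipeline.yml", combo)))
```

Each printed line sets one pipeline per combination from the same pipeline file, which is the “multiple simple pipelines” approach rather than real matrix support. Secrets would still come from the secrets manager, not from `-v`.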

I have Concourse in production and, although it has its wrinkles, I am super satisfied with it and would recommend it to anybody willing to spend the time to properly learn a new tool. So by all means go ahead! Just start small. The best documentation is:

  1. https://concoursetutorial.com/
  2. the https://concourse-ci.org site itself

If you want to get a feel for how it works without needing to set up an S3 bucket or a secrets manager like Vault, you might want to check out my “Concourse in a box” VM, which contains an S3-compatible blob store and Vault: https://github.com/marco-m/concourse-ci-formula. Otherwise, there are Docker-based solutions on the download page, but you have to provide S3 and Vault yourself. Note that S3 is not optional: you need S3 or equivalent out-of-band storage to pass build artifacts from one job to another.

Good luck!
marco


#3

Thanks for the very detailed answer Marco!

I’ll get started on a small setup using Vault (as we already use that) and will see where it takes us…

And using BOSH is a no-brainer, as I’m one of the original BOSH engineers :grin:

/M


#4

Hi Martin,

Could you please share your experience with Concourse? We are also in a similar situation and are planning to move our complex landscape from Jenkins to Concourse. Your input on this would be very helpful. Thanks in advance.


#5

Hi!

Unfortunately I’ve not had time to do anything more than play with it :frowning: