Is Concourse's aim to eliminate snowflaking just a myth?


#1

Concourse sells the idea of eliminating snowflakes by not allowing you to change build parameters from the UI, requiring them instead to come from a config file.

Theoretically one can then store that config file in source control to keep it ‘up to date’.

However, unlike Travis, Circle, and many other CI systems, Concourse does not actually use the config file from source control. Instead, it holds the config ‘in memory’, if you will: I must take it from a file on disk and explicitly send it to the cluster.
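For example, the only way a pipeline config ever reaches the cluster is an explicit push of whatever file happens to be on my machine (target and names here are illustrative):

fly -t my-target set-pipeline \
  --pipeline my-app \
  --config pipeline.yml

Nothing in that flow records which commit, if any, pipeline.yml came from.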

This means I could be working off a file on my disk that is not in source control, or someone could come along, fetch the pipeline config, change it, and apply it without ever touching source control. I consider this as much ‘snowflaking’ as UI changes.

It also means that I can’t point Concourse at a repo and say ‘build using that repo’s pipeline file’, so the link between a source control version and the version of the pipeline that builds it is missing.

The ‘concourse pipeline resource’ feels very much like a bolt-on hack, and strange issues crop up when, for example, trying to automatically update a pipeline on every commit.

Will Concourse ever solve this holistically? It seems like a big gap, and with changes like more configuration in the UI on the way, I’m worried.


#2

I’d be really interested to hear if anyone is successfully using concourse like this. Manually setting pipelines is a constant source of confusion for our users, for example:

  • updating a pipeline before pushing code it relies on
  • pushing code but forgetting to update the pipeline
  • not committing changes to the pipeline config
  • updating a pipeline does not trigger a new build
  • updating a pipeline with a new job will trigger a new build with an old commit

Because the repo and the pipeline are not versioned as one, it is tricky to update a pipeline without breaking it at least temporarily while the two get back in sync.

We have been experimenting with an initial job that updates the pipeline by reading the pipeline config from the commit, but it is not going well. Has anyone got experience with this pattern?
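For reference, a minimal sketch of that ‘initial job’ pattern, assuming a Concourse version with the native set_pipeline step (resource names and paths are illustrative):

resources:
- name: repo              # holds both the app source and ci/pipeline.yml
  type: git
  source:
    uri: https://github.com/example/app.git
    branch: master

jobs:
- name: update-pipeline   # re-sets this pipeline from the triggering commit
  plan:
  - get: repo
    trigger: true
  - set_pipeline: self
    file: repo/ci/pipeline.yml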


#3

I experimented with a pattern to solve this actually.

Our Concourse was configured with a separate team for these kinds of pipelines. There was an initial “generator” pipeline set on that team. Its job was to check for “new things”. When it found a new thing, it would fetch it and instantiate a pipeline found in this “thing”. Basically I was dynamically generating pipelines from the “thing”, and each of those pipelines knew how to ask to be destroyed. So a pipeline would instantiate, do its job, post its results to a “place”, and then destroy itself.

This is obviously not ideal for keeping build history around, as the pipelines would usually only live for 1-5 builds and then destroy themselves.

To make the above clearer, a little context: I was trying to solve the problem of parallel pull-request builds and developer visibility. You could have one pipeline that handles all pull requests to a repository, but then each build could be linked to a different incoming branch, and it’s very difficult to tell a developer, “go look through the build tabs, find the one that’s yours”. At the time I was trying to solve this, there was no good solution. So I created a “pull-request” team in our Concourse and set up a generator pipeline with a custom resource I created that watched for incoming PRs to specific branches. This resource would (a sketch follows the list):

  1. Notice a new PR
  2. Fetch its source branch
  3. Instantiate a pipeline from the one on that branch. This allowed developers to dynamically update the way their code in that repository was evaluated. If they added a new service to our monorepo, the PR pipeline on their branch would be updated and know how to “do things” to this new element.
  4. The pipeline was instantiated with a git-resource that was anchored on the source branch of the PR (through some clever pipeline parameters). This meant that when that pipeline came online it would run against the correct stuff.
  5. This pipeline existed for as long as the PR existed. Once the PR was closed, our version control system (bitbucket) would send a webhook out that would trigger the destroy job in that pipeline, which would clean up any artifacts created by the pipeline and then remove the pipeline.
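Roughly, steps 3–5 reduce to fly calls like these in the generator’s task and the pipeline’s destroy job (target, team, and paths are assumptions, not the actual setup):

# instantiate a pipeline from the config on the PR's source branch,
# anchoring it on that branch via a pipeline var
fly -t pull-requests set-pipeline --non-interactive \
  --pipeline "pr-${PR_NUMBER}" \
  --config pr-source/ci/pipeline.yml \
  --var "branch=${PR_BRANCH}"

# the destroy job (fired by the PR-closed webhook) cleans up
# artifacts and then removes the pipeline itself
fly -t pull-requests destroy-pipeline --non-interactive \
  --pipeline "pr-${PR_NUMBER}"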


#4

Yeah, this is a bit of a gap right now. The current flow is easier to bootstrap and use for introductory material, but I would still like to improve this and deprecate the concourse pipelines resource in favor of first-class support for auto-configuring pipelines.

We opened an issue a long while back to collect feedback and ideas for a solution: https://github.com/concourse/concourse/issues/1200

Travis and Circle work by being pointed at a repo. We could support this in Concourse by pointing the pipeline at a resource. Something like this:

fly set-pipeline \
  --resource-type git \
  --resource-source '{"uri":"https://github.com/concourse/pipelines"}' \
  --resource-path ci.concourse-ci.org/concourse.yml

We would probably never remove the current behavior, because it’s still crucial for experimentation and introductory material, but we could strongly encourage this new method of configuring pipelines for real world use cases.

Note that this only makes it easier to “do the right thing”; it doesn’t make it impossible to snowflake: there’s nothing preventing someone from just running set-pipeline again and pointing it at some other repo, or using the current form of set-pipeline. That could be solved with RBAC though: https://github.com/concourse/concourse/issues/1317


#5

Edit: @vito has a great response above

The original spirit of “eliminating snowflakes” was specifically around the idea of pushing pipeline configuration into code, and not leaving it hidden in the UI. @eedwards-sk I think you did a really good job articulating a new snowflake in the workflow people are getting accustomed to in CI/CD systems.

I think there’s a world where we can improve Concourse to help with that flow. Maybe it’s a configuration parameter in fly that we send along with set-pipeline to keep monitoring the pipeline file, along with some UI on the frontend to indicate that it’s actively updating?


#6

Thanks for your responses! I’m very passionate about Concourse, so I’m glad y’all seem to perceive the same gap I do.

With the work on spaces and resources 2.0, it seems like pipelines becoming a first-class citizen would fit well into the upcoming ecosystem.

Right now Concourse does make pipeline development very easy…

If I store the pipeline alongside its source code, run the pipeline against a commit (using that commit’s version of the pipeline), and the pipeline fails halfway through, it’s often advantageous to re-run from the point of failure if the only change needed was to the pipeline config.

(You can imagine scenarios where a build and test cycle takes 30 minutes but the packaging process failed due to a pipeline config issue, so you want to re-run the packaging step by continuing from where you left off after the build/test)

However, at that point the earlier part of the build ran with a different pipeline config commit than the later part, so we can no longer say that the resulting artifact was generated entirely by a specific pipeline config (since it changed partway through).

So this is great for pipeline development, but not necessarily for master builds where you want to ensure that an artifact can be built by a specific pipeline commit.

Ways to solve it? Well, Circle, Travis, and others seem to force you to start over from the beginning, since every build comes from the pipeline file in that commit.

For Concourse, I see two sorts of pathways, along the lines of what vito was saying about experimentation:

  • one case where we want to iterate quickly on the pipeline config and re-run steps of the pipeline from later stages
  • another case where we want to ensure that the source commit for the pipeline config is the same version that builds it

However, with the latter, how do we ensure that the pipeline doesn’t start building a new artifact before it updates itself from the new version of the pipeline config?

For example, given:

  • I have a repo
  • it contains a pipeline and app source code
  • the pipeline has its own repo as a resource, and triggers builds off it (sketched below)
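A minimal sketch of that setup, with illustrative names:

resources:
- name: repo          # holds both ci/pipeline.yml and the app source
  type: git
  source:
    uri: https://github.com/example/app.git

jobs:
- name: build
  plan:
  - get: repo
    trigger: true     # new commits build under whatever config is currently set
  - task: build
    file: repo/ci/build.yml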

With today’s behavior, if I commit and push that pipeline file change, then the old version of the pipeline will attempt to build the new commit, unless I remember to ‘set-pipeline’ before doing it.

However, if I ‘set-pipeline’ and then DON’T push the updated commit, there’s a chance that pipeline will run again (off a different trigger source or manually triggered) and build the old source using the new pipeline config.

So I wonder if the behavior we need is the ability to say: “this pipeline comes from this source control resource, and do NOT build any new commits of that resource until you’ve first checked the new commit for a new version of the pipeline config and applied any changes”…

Does that make sense? I feel like this is obvious to anyone who has used Concourse, but I’m struggling to describe it succinctly.
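For what it’s worth, that ‘update first, then build’ ordering can be roughly approximated today, assuming a Concourse version with the set_pipeline step: an initial self-update job, plus a passed constraint gating the build job, so no commit is built until it has first re-set the pipeline. A sketch, with illustrative names:

jobs:
- name: update-pipeline         # runs first for every new commit
  plan:
  - get: repo
    trigger: true
  - set_pipeline: self
    file: repo/ci/pipeline.yml

- name: build
  plan:
  - get: repo
    trigger: true
    passed: [update-pipeline]   # only commits that already updated the pipeline get built
  - task: build
    file: repo/ci/build.yml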