Passing a resource through multiple jobs in parallel


#1

Hello,

I’ve come across an issue which is either a bug in Concourse or a misunderstanding on my side of how resources are passed between jobs. Essentially, I have a pipeline structure like this:

resources:
- name: sync
  ...

jobs:
- name: job-01
  plan:
  - get: sync
  - task: ...(produce sync/file-01)...
  - put: sync

- name: job-11
  plan:
  - get: sync
    passed: [ job-01 ]
    trigger: true
  - task: ...(produce sync/file-11)...
  - put: sync

- name: job-12
  plan:
  - get: sync
    passed: [ job-01 ]
    trigger: true
  - task: ...(produce sync/file-12)...
  - put: sync

- name: job-21
  plan:
  - get: sync
    passed: [ job-11, job-11 ]
    trigger: true
  - task: ...(produce sync/file-21)...
  - put: sync

In plain English: I have a collection of jobs (job-01, job-11, job-12 and job-21) which all work on a resource named sync. Some of the jobs, namely job-11 and job-12, can and should be executed in parallel. The sync resource is an rsync directory (type “rsync” from mrsixw/concourse-rsync-resource). Each of the jobs produces a uniquely named file inside sync (file-01 through file-21). As a real-world example, this pipeline serves as the build system for a piece of C++ software with many dependencies; the sync resource is the output folder, gradually getting filled with all of its library dependencies.

What I would expect is that at the end of the pipeline, sync would contain 4 files: file-01, file-11, file-12 and file-21.

What happens instead is that my final result only contains file-01 and file-21, completely bypassing the job-1x series of jobs. It turns out this isn’t an accident: Concourse explicitly shows me that jobs 11, 12 and 21 all receive the same version of sync, namely the one produced by job-01. In other words, despite the passed: [ job-11, job-12 ] requirement, the version of sync handed to job-21 never actually passed through job-11 or job-12.

I’m using vanilla Concourse, version 4.2.2.

Does anyone have any ideas why? Is this a bug, or do I misunderstand something?

If this is intended behavior, can anyone help me understand, and possibly give a hint about how to implement the desired behavior?

Thanks & Cheers,
codedump.


#2

I don’t think using a single resource concurrently like this is a good idea. Basically, as far as I know:

  • Concourse allocates a resource container on a worker to check for new versions (say #0)
  • Then job-01 is started on a worker (which may or may not be the same worker)
  • job-01 gets the resource at the desired version (say #0)
  • job-01 puts a new version of the resource (say #1)
  • job-11 and job-12 get the resource at the latest version handled by job-01 (say #1)
  • Each one puts a new version of the resource, possibly concurrently (what is the behaviour of rsync when run concurrently?) (say #1-A and #1-B)
  • job-21 gets the resource at the latest version handled by both job-11 and job-12 (say #1)
  • job-21 puts a new version of the resource (say #1-C)

So the behaviour seems OK to me.

Note: in your example, job-21 waits for job-11 twice. Not sure if that’s just a copy-paste mistake…

Regarding your needs, it seems what you want is:

  • job-01 consumes resource r-A (source code?)
  • job-01 produces resource r-AO (.o files?)
  • job-11 consumes resource r-AO (and optionally the matching r-A)
  • job-11 produces resource r-BO (.o files?)
  • job-12 consumes resource r-AO (and optionally the matching r-A)
  • job-12 produces resource r-CO (.o files?)
  • job-21 consumes resources r-BO and r-CO (and optionally the matching r-A and r-AO)
  • job-21 produces resource r-D (.exe?)

I suggest using a binary repository with version support (e.g. Nexus, Artifactory, S3, …).
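As a rough sketch of the first half of that layout, assuming an S3 bucket as the versioned store (resource names map to r-A/r-AO above; the bucket name, regexp and task placeholders are invented for illustration):

resources:
- name: r-A                       # source code
  type: git
  source:
    uri: https://example.com/project.git

- name: r-AO                      # artifacts produced by job-01
  type: s3
  source:
    bucket: build-artifacts       # plus credentials as needed
    regexp: common/r-AO-(.*).tgz  # the capture group is the version

jobs:
- name: job-01
  plan:
  - get: r-A
    trigger: true
  - task: ...(compile common, producing build-output/r-AO-<version>.tgz)...
  - put: r-AO
    params:
      file: build-output/r-AO-*.tgz

- name: job-11
  plan:
  - get: r-A
    passed: [ job-01 ]
    trigger: true
  - get: r-AO
    passed: [ job-01 ]
    trigger: true
  - task: ...(compile part 11)...
  # put a distinct resource r-BO here, analogous to r-AO

job-12 and job-21 follow the same shape, each putting its own resource (r-CO, r-D).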


#3

Thanks for the detailed explanation. (Yes, the job-21 wait condition was a typo.)

I’m a little confused about why the behavior would be OK as you describe it in the first part of your post. I mean in particular the 2nd-to-last point:

  • job-21 gets the resource at the latest version handled by both job-11 and job-12 (say #1)

TBH I did figure out that #1 is somehow the branching point before the resource splits between job-11 and job-12, and that this has something to do with the observed behavior. But this is exactly the point: #1 was in fact not handled by either job-11 or job-12. It was only handled by job-01. So why was it passed on to job-21 in the first place? If anything, I’d at least expect Concourse to stop processing rather than silently pass on a version that never went through the specified jobs but probably will at some point in the future.

I do agree, however, that the passed: [ job-11, job-12 ] requirement can probably never be fulfilled, since there is no single version that passes through both jobs. Hence I also agree that my use of the resource is probably flawed, conceptually. (That realization came slowly over the past days, but your post reassures me of this :slight_smile: ).

Regarding rsync: it would be a mess if the concurrent jobs wrote conflicting files, but it should work fine in my particular case since I’m only generating new files that don’t share common names.

… and I’m still mad at Concourse for silently passing on the resource in the wrong version even though I explicitly requested otherwise. This is what I consider a bug. If not the code, then at least the documentation should be fixed to explain more clearly what “passed through” means in the context of a resource and a job. The way it stands, it seems terribly un-chronological, as in: “if at any point this resource has passed [in the past] or is going to pass [in the future] through the specified job”, then passed is considered fulfilled o_O… ?!


#4

I hope the following timeline will help you understand what happens (note that job-11 and job-12 can be swapped in the example; it doesn’t matter).

| job | action   | handled (last come first)                                  |
|-----|----------|------------------------------------------------------------|
| 01  | get v01  | 01: [v01]                                                  |
| 01  | put v02  | 01: [v02,v01]                                              |
| 11  | get v02  | 01: [v02,v01]  11: [v02]                                   |
| 12  | get v02  | 01: [v02,v01]  11: [v02]      12: [v02]                    |
| 11  | put v03  | 01: [v02,v01]  11: [v03,v02]  12: [v02]                    |
| 12  | put v04  | 01: [v02,v01]  11: [v03,v02]  12: [v04,v02]                |
| 21  | get v02  | 01: [v02,v01]  11: [v03,v02]  12: [v04,v02]  21: [v02]     |
| 21  | put v05  | 01: [v02,v01]  11: [v03,v02]  12: [v04,v02]  21: [v05,v02] |

#5

Thank you for the elaborate table. I think I understand what happens chronologically, that’s the reason I’m more and more confused: I don’t see how this fits together with what is supposed to happen, according to the documentation.

Let me try to explain what I think the documentation says. I’ll be using your table to illustrate.

I’m guessing that, when you write X: [vA,vB] in the 3rd column “handled (last come first)”, you mean something along the lines of:

  • “job X has seen versions A and B of the resource”

But you don’t mean this:

  • “job X has produced (a.k.a. introduced changes now available in) versions A, respectively B”

Is that right? (…you cannot possibly mean anything like that, because that would be false; job-01 never actually recorded any modifications as v01 – it was just an input.)

This last part is crucial to my misunderstanding, because the 3rd column (“handled”) of row 7 then reads like this, in plain words:

  • job 01 handled the resource in versions v01 [r], then v02 [w]
  • job 11 handled the resource in versions v02 [r], then v03 [w]
  • job 12 handled the resource in versions v02 [r], then v04 [w]
  • job 21 handled the resource in version v02 [r]

Here, [r] and [w] indicate if the job has [r]ead or [w]ritten that particular version of the resource.

Now comes the point where I get confused: my job-21 has a parameter passed: [ job-11, job-12 ]. What I’m hoping this means: “get the latest version that made it through (i.e. came out as output of) jobs 11 and 12”. But, again, v02 never did get through job-11 and job-12. It was merely an input to those jobs. It therefore has no business being passed on to job-21 as “passed”. There is in fact no single version that made it through both job-11 and job-12.

So, to attempt some closure: is this the correct way to read a statement like:

    get X: { passed: [ Y ] } 

as: “the latest version of a resource X that had any contact, however fleeting, and possibly just read-only in the role of an input, with the specified job Y”?

Either this, or it’s a bug. Right?

Thanks & Cheers,
codedump


#6

I think the following two links can help:


#7

I’m guessing that, when you write X: [vA,vB] in the 3rd column “handled (last come first)”, you mean something along the lines of “job X has seen versions A and B of the resource”

Yes it is.

Is that right? (…you cannot possibly mean anything like that, because that would be false; job-01 never actually recorded any modifications as v01 – it was just an input.)

I’m a little bit confused by your wording (but English is not my mother tongue). Just to be sure: a version can’t be modified. A resource can be modified (which generates a new version). In fact, a resource implementation may provide two different contents for a single version, but Concourse may serve one content or the other based on its caching strategy.
Conceptually those are two versions; such a resource is just lying.

Now comes the point where I get confused: my job-21 has a parameter passed: [ job-11, job-12 ]. What I’m hoping this means: “get the latest version that made it through (i.e. came out as output of) jobs 11 and 12”. But, again, v02 never did get through job-11 and job-12. It was merely an input to those jobs. It therefore has no business being passed on to job-21 as “passed”. There is in fact no single version that made it through both job-11 and job-12.

Asking for a resource that has passed one (or many) job(s) gives you the latest common version. In your example, job-21 will never get v03 or v04, as neither of them passed both job-11 and job-12.

To better understand “passed”, you can consider two kinds of resource usage: validation (get) and generation (put). When you get source code and compile it, you are just validating that it compiles. Then, you put the generated binaries into a repository.
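A tiny job sketch of that split (names invented, task abbreviated in the style of your example):

- name: compile
  plan:
  - get: source-code        # validation: this job proves the fetched version builds
    trigger: true
  - task: ...(compile)...
  - put: binaries           # generation: publishes a new version of a different resource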

as: “the latest version of a resource X that had any contact, however fleeting, and possibly just read-only in the role of an input, with the specified job Y”?

To be more exact, here is the full declaration:

- get: X                      # name of the fetched input in the build plan
  resource: X                 # which pipeline resource to fetch (defaults to the get name)
  version: latest             # which candidate version to pick (latest is the default)
  passed: [ do-X, make-X ]    # only versions that went through both of these jobs

Now just read it: get, as input X, resource X at the latest version that passed both do-X and make-X. As seen previously (see also below), “passed” means “read or written by”. And the version is picked from the common subset of versions across all the listed jobs.



I think your problem is your usage of a single “physical”/“implementation”/“technical” resource for what are conceptually many resources.

I hope this new story will help you understand Concourse’s behaviour:

  • job “build-common” get “source code”
  • job “build-common” put “common”

  • job “build-frontend” get “source code” passed by “build-common”
  • job “build-frontend” get “common” passed by “build-common”
  • job “build-frontend” put “frontend”

  • job “build-backend” get “source code” passed by “build-common”
  • job “build-backend” get “common” passed by “build-common”
  • job “build-backend” put “backend”

  • job “test” get “source code” passed by “build-frontend” and “build-backend”
  • job “test” get “backend” passed by “build-backend”
  • job “test” get “frontend” passed by “build-frontend”
  • job “test” put “test-result”

As you can see, it seems natural that “passed” covers both reads and writes; otherwise you wouldn’t be able to connect jobs by their outputs or by their inputs (I’m not sure which of those cases is confusing you).

You’re also assuming that getting a resource necessarily happens at the beginning of a job and putting at the end. So the idea of a complete pass-through doesn’t really hold in general.
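For reference, the story above as a minimal pipeline sketch (resource names taken from the story; tasks abbreviated like in your first post, and put params omitted since they depend on the resource type):

jobs:
- name: build-common
  plan:
  - get: source-code
    trigger: true
  - task: ...(build common)...
  - put: common

- name: build-frontend
  plan:
  - get: source-code
    passed: [ build-common ]
    trigger: true
  - get: common
    passed: [ build-common ]
    trigger: true
  - task: ...(build frontend)...
  - put: frontend

- name: build-backend
  plan:
  - get: source-code
    passed: [ build-common ]
    trigger: true
  - get: common
    passed: [ build-common ]
    trigger: true
  - task: ...(build backend)...
  - put: backend

- name: test
  plan:
  - get: source-code
    passed: [ build-frontend, build-backend ]
    trigger: true
  - get: frontend
    passed: [ build-frontend ]
    trigger: true
  - get: backend
    passed: [ build-backend ]
    trigger: true
  - task: ...(run tests)...
  - put: test-result

Here the passed constraints on source-code are satisfied even though build-frontend and build-backend only get it; that is exactly the read-or-write semantics described above.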


#8

Thanks, the story actually did it for me, in particular this part:

  • job “build-frontend” get “source-code” passed by “build-common”

The source of my misunderstanding was that I viewed a resource as something like a data object flowing along a stream – something that gets passed around between jobs. In my understanding, “versions” were just different states of that object (the “resource”) that chronologically follow each other. In particular, I had in mind a version X before entering a job (get), and a later version Y after leaving it (put).

To me, “passed X” thus had to mean “did enter job X and came out on the other side”, pretty much like the graphical representation of a concourse pipeline suggests.

(To give some closure to the thread: I did already solve my problem over the past few days in two different ways. One is as you suggest, by connecting jobs using the source input; but I didn’t understand the concept of why that had to be done that way, and that bothered me. The other is by putting my build steps into a single job, with individual packages being tasks, using aggregate to clump together those that were independent of one another.)
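For reference, the single-job variant looks roughly like this (task placeholders as in my first post):

jobs:
- name: build-all
  plan:
  - get: sync
  - task: ...(produce sync/file-01)...
  - aggregate:                  # file-11 and file-12 are independent, so build them in parallel
    - task: ...(produce sync/file-11)...
    - task: ...(produce sync/file-12)...
  - task: ...(produce sync/file-21)...
  - put: sync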

Thanks for the patient explanation! :slight_smile:

Cheers,
codedump.


#9

One is as you suggest, by connecting jobs using the source input; but I didn’t understand the concept of why that had to be done that way, and that bothered me.

In general, I connect by input to make sure all the resources I handle are linked to the same source. For example, to be sure the backend and frontend binaries come from the same source-code revision. But I’m not sure how that could have helped you.

The other is by putting my build steps into a single job, with individual packages being tasks, using aggregate to clump together those that were independent of one another.

That’s also a solution I had in mind, but you lose the fan-in/fan-out facility, such as being able to check produced resources individually as they arrive and run integration checks once both are there. Here is a pattern that I often use:

                   +------+                 
                   |source|                 
                   +---+--+                 
                       |
            +----------+---------+            
            |                    |            
            |                    |            
        +---v---+            +---v---+        
        |Build A|            |Build B|        
        +---+---+            +---+---+        
            |                    | 
    +-------+-------+    +-------+-------+   
    |               |    |               |   
    |               |    |               |   
+---v--+            |    |            +--v---+
|Test A|            |    |            |Test B|
+------+          +-v----v-+          +------+
   |              |Test A+B|             |   
   |              +--------+             |   
   |                  |                  |   
   |                  |                  |   
   |                  |                  |   
   |                  |                  |   
   +------------------+------------------+   
                      |                     
                  +---v---+                 
                  |Release|                 
                  +-------+                 
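Expressed as jobs with passed constraints (resource and job names invented for illustration, tasks abbreviated):

jobs:
- name: build-a
  plan:
  - get: source
    trigger: true
  - task: ...(build A)...
  - put: artifact-a

- name: build-b
  plan:
  - get: source
    trigger: true
  - task: ...(build B)...
  - put: artifact-b

- name: test-a
  plan:
  - get: artifact-a
    passed: [ build-a ]
    trigger: true
  - task: ...(test A)...

- name: test-b
  plan:
  - get: artifact-b
    passed: [ build-b ]
    trigger: true
  - task: ...(test B)...

- name: test-a-b
  plan:
  - get: artifact-a
    passed: [ build-a ]
    trigger: true
  - get: artifact-b
    passed: [ build-b ]
    trigger: true
  - task: ...(test A+B)...

- name: release
  plan:
  - get: artifact-a
    passed: [ test-a, test-a-b ]
    trigger: true
  - get: artifact-b
    passed: [ test-b, test-a-b ]
    trigger: true
  - task: ...(release)...

Note that release only works because “passed” also counts the gets done by the test jobs, which is exactly the point discussed earlier in this thread.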

Thanks for the patient explanation!

No problem; most of the time it just means we don’t understand each other. As I often “speak” English without being fluent, I always doubt whether I have read and written things correctly ^^


#10

It does help, because my repo setup is similar to the one Concourse itself has, i.e. I have various smaller repositories with all the packages, and one large “Meta-Repo” which essentially just groups together all the package versions as submodules. Every commit in the “Meta-Repo” represents a collection of packages that are supposed to work well together. So I can easily connect by input to the Meta-Repo; it’s practically predestined for correlating packages.

Yes, and you also lose nice visual feedback about what exactly is currently failing :slight_smile:

I guess the final version will have to be some kind of mix of jobs and tasks (e.g. using jobs for “large” groups of packages, like dependencies, libs, applications, tests; then using tasks within those jobs for the individual packages).

Ultimately, this is all supposed to move towards a more automated deployment system; compiling stuff is just the first step. Piece by piece, as I slowly get there, I’ll probably become interested in a different granularity of visual feedback, so I’ll likely be switching between jobs and tasks quite a bit for a while.

But for that, I need(ed) to understand the concepts properly first. When standard problems end up needing non-standard solutions, it’s usually an indicator that I didn’t quite understand the concepts behind the tool… This is the reason why I didn’t go with existing solutions like https://github.com/concourse/concourse/issues/2324 or the gate resource (@marco-m: I had studied the posts you linked for most of last week, even before I asked for help; thanks for sharing, though, I really do appreciate the help!).

Cheers,
codedump.