Optimizing Concourse resource streaming in a hybrid cloud (cloud and on-premises)



We have a Concourse deployment in AWS with Linux and Windows workers. We are now adding Mac workers on-premises since AWS doesn’t offer Mac virtualization.

Here is the problem. Concourse runs resources only on workers that support containers (that is, Linux), so each time a Mac worker needs a git repo, the repo is cloned by a git resource on a Linux worker in AWS, streamed to the ATC in AWS, and then streamed to the Mac worker on-premises. Because our repos are big, we expect an impact on bandwidth and on the reverse proxy (we are still working on this, so we have not deployed yet). The same applies to other frequently used resources such as s3-resource.

So the questions: what are our options? Would it help to have an on-premises ATC peering with the cloud ATC, plus some on-premises Linux workers? Is there anything simpler?



Have you considered using the tag feature on resources to specify a tagged worker to perform the resource check on (ref: https://concourse-ci.org/resources.html#resource-tags)?


My understanding is that tagging a worker would still create the same flow: from the tagged worker (either in the cloud or on-premises) to the ATC in the cloud to the Mac worker on-premises. Or am I missing something?


Actually, after talking to @vito I realized I was wrong: you can’t tag Darwin workers anyway, because they don’t support containerization.

If I understand the mechanics of Concourse correctly, the data from a resource version doesn’t actually flow through the ATC itself. In the situation of a tagged worker, the ATC schedules the resource check container on the tagged worker and the worker with the check container is responsible for streaming the volumes in.
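
To make the difference between the two readings above concrete, here is a minimal Python sketch that counts how often a resource's data would cross the cloud/on-premises link under each assumed path. It does not call Concourse at all; the `Hop` type, the component names, and the site labels are illustrative assumptions, and the on-premises Linux worker is the hypothetical one from the original question. Which path Concourse really takes depends on your version and configuration, so treat this as a way to reason about the options rather than a description of its internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    name: str  # component that handles the volume (illustrative label)
    site: str  # "cloud" or "onprem"


def wan_crossings(path):
    """Count how many consecutive hops sit on different sites."""
    return sum(1 for a, b in zip(path, path[1:]) if a.site != b.site)


# Hypothetical components, named after the roles discussed in this thread.
linux_cloud = Hop("linux-worker", "cloud")
linux_onprem = Hop("tagged-linux-worker", "onprem")
atc_cloud = Hop("atc", "cloud")
mac_onprem = Hop("mac-worker", "onprem")

# Assumed data paths; none of these is confirmed Concourse behavior.
flows = {
    "cloud worker, via ATC": [linux_cloud, atc_cloud, mac_onprem],
    "on-prem tagged worker, via ATC": [linux_onprem, atc_cloud, mac_onprem],
    "on-prem tagged worker, worker to worker": [linux_onprem, mac_onprem],
}

for label, path in flows.items():
    print(f"{label}: {wan_crossings(path)} WAN crossing(s)")
```

Under these assumptions, routing through a cloud ATC from an on-premises tagged worker is the worst case (the data leaves the site and comes back), while a direct worker-to-worker transfer on-premises never touches the link. That is why it matters which of the two descriptions of the flow is the correct one.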