Configuration Recommendations

Our company started a new project about six months ago. At that time we had some goals, including setting up CI. I was tasked with this, and landed on Concourse (no pun intended) because it was dead simple to grasp and get running with, and fit with all the other tech in our stack. Also, I hate Jenkins.

Now we’re starting to see the CI used quite a bit, and unfortunately I’m finding that the setup that initially worked great for us - AWS t3.small, deploying web & worker to the same node via docker-compose - is not scaling at all.

Specifically, we’ve introduced some backend components that are written in Rust, and compilation is really eating up resources on the server. We can’t really run more than one job at a time without things crashing and getting hosed up. So… it’s time to upgrade I think, and get things set up correctly.

Except, I’m not really sure what “correctly” is. I’ve looked around the forum and online, and I’m aware that we could set up a cluster, which ultimately I think we’ll do, but what’s the right number of nodes, and how many should be web, how many worker, and what type of nodes (t3.small ok, or need something with more cores/memory)? Those are the questions I’m kind of stuck on.

Anyone care to give suggestions / recommendations based on your experience?


At a high level, I have production Concourse running in AWS using:

ALBs / NLBs in front of web nodes

ACM for SSL

Web Nodes in ASG

Worker Nodes in ASG

Lambda for Worker Draining

I run them on Ubuntu 18.04 with custom config scripts and systemd unit files.

How do you do worker draining with Lambda?

Here you go: https://gist.github.com/eedwards-sk/c91dc8e5d88b34d0a8eb1e77cbe867ac

I don’t think I’ve updated it since I made the gist, so buyer beware :slight_smile:
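For readers who want the general shape of Lambda-driven draining without opening the gist, here is a minimal sketch. It is not taken from the gist and may differ from it substantially. It assumes an Auto Scaling terminate lifecycle hook delivered through EventBridge, a worker running under a systemd unit called `concourse-worker` (a hypothetical name), SSM Run Command access to the instance, and an instance role allowed to complete the lifecycle action. It relies on the documented Concourse behaviour that sending SIGUSR2 to the worker process retires it.

```python
import shlex

# Hypothetical systemd unit name; adjust to match your own unit file.
WORKER_UNIT = "concourse-worker"


def build_drain_commands(hook_name, asg_name, token, instance_id, region):
    """Shell commands to run on the worker: retire, wait, release the hook."""
    q = shlex.quote
    return [
        # SIGUSR2 asks the Concourse worker to retire (finish work, deregister).
        f"systemctl kill --kill-who=main --signal=SIGUSR2 {q(WORKER_UNIT)}",
        # Assumption: the unit goes inactive once the worker process exits.
        f"while systemctl is-active --quiet {q(WORKER_UNIT)}; do sleep 5; done",
        # Tell the ASG it may now terminate the instance.
        "aws autoscaling complete-lifecycle-action"
        f" --lifecycle-hook-name {q(hook_name)}"
        f" --auto-scaling-group-name {q(asg_name)}"
        f" --lifecycle-action-token {q(token)}"
        f" --instance-id {q(instance_id)}"
        " --lifecycle-action-result CONTINUE"
        f" --region {q(region)}",
    ]


def handle_terminate_hook(event, ssm):
    """Send the drain commands to the instance named in the lifecycle event."""
    detail = event["detail"]
    instance_id = detail["EC2InstanceId"]
    commands = build_drain_commands(
        hook_name=detail["LifecycleHookName"],
        asg_name=detail["AutoScalingGroupName"],
        token=detail["LifecycleActionToken"],
        instance_id=instance_id,
        region=event["region"],
    )
    ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": commands},
    )
    return instance_id


def lambda_handler(event, context):
    # boto3 is imported lazily so the helpers above can be tested without it.
    import boto3

    return handle_terminate_hook(event, boto3.client("ssm"))
```

In this sketch the instance itself completes the lifecycle action, so the Lambda returns immediately and the hook's heartbeat timeout has to be longer than your slowest build.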

Also, here’s an example of how I set up the load balancing (at a high level), in the forum post “Tip: Using Concourse behind AWS ALB and NLB with SSL”.

We run our cluster of about 42 nodes in total on DigitalOcean Droplets. We have two web nodes balanced by an HAProxy node, and a primary database replicated to a hot standby server; the rest are all worker nodes. It’s all managed via Terraform & Chef. We are doing about 340k builds per month.
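To put those numbers in per-worker terms, here is a rough back-of-the-envelope calculation. It assumes exactly 42 nodes (the post says "about"), a 30-day month, and builds spread evenly across workers, none of which is stated in the post.

```python
# Figures quoted in the post above.
TOTAL_NODES = 42           # "about 42 total"
WEB_NODES = 2
LOAD_BALANCERS = 1         # the HAProxy node
DB_NODES = 2               # primary plus hot standby
BUILDS_PER_MONTH = 340_000

# Assumption: a 30-day month with load spread evenly.
DAYS_PER_MONTH = 30

workers = TOTAL_NODES - WEB_NODES - LOAD_BALANCERS - DB_NODES
builds_per_day = BUILDS_PER_MONTH / DAYS_PER_MONTH
per_worker_per_day = builds_per_day / workers
per_worker_per_hour = per_worker_per_day / 24

print(f"workers: {workers}")
print(f"builds per worker per day: {per_worker_per_day:.0f}")
print(f"builds per worker per hour: {per_worker_per_hour:.1f}")
```

Under those assumptions that leaves 37 workers, each handling roughly 300 builds a day. Real load will be burstier than an even average suggests.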

Awesome :slight_smile: Thanks

Do you know how the droplets are configured?

I do! For the droplet resource sizes, at a high level: our LB and web nodes are 8 GB RAM droplets, and the DB and the workers are 20 CPU / 64 GB RAM droplets. Workers have 600 GB of disk space, and they generally hover between 30% and 60% used.
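For comparison with the original poster's setup, the sketch below puts these worker droplets next to a t3.small. The t3.small figures (2 vCPU, 2 GiB) come from AWS's published instance specs rather than from this thread, and GB and GiB are treated as interchangeable for a rough ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    vcpus: int
    ram_gb: int

    def ram_per_vcpu(self) -> float:
        return self.ram_gb / self.vcpus


# Assumption: t3.small as published by AWS (2 vCPU, 2 GiB).
T3_SMALL = Node("t3.small", vcpus=2, ram_gb=2)
# From the post above: 20 CPU / 64 GB worker droplets.
WORKER = Node("worker droplet", vcpus=20, ram_gb=64)

WORKER_DISK_GB = 600
# "30-60% used", kept in integer arithmetic.
disk_used_low_gb = WORKER_DISK_GB * 30 // 100
disk_used_high_gb = WORKER_DISK_GB * 60 // 100

cpu_ratio = WORKER.vcpus / T3_SMALL.vcpus
ram_ratio = WORKER.ram_gb / T3_SMALL.ram_gb

print(f"{WORKER.name}: {WORKER.ram_per_vcpu():.1f} GB RAM per CPU")
print(f"{T3_SMALL.name}: {T3_SMALL.ram_per_vcpu():.1f} GB RAM per vCPU")
print(f"worker vs t3.small: {cpu_ratio:.0f}x CPU, {ram_ratio:.0f}x RAM")
print(f"worker disk in use: {disk_used_low_gb}-{disk_used_high_gb} GB")
```

So each worker here has about ten times the cores and thirty-two times the memory of a t3.small, with 180 to 360 GB of disk typically in use.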