CentOS 8 workers keep stalling

We switched our worker nodes from Ubuntu 20.04 to CentOS 8 (three m5a.xlarge EC2 instances on AWS), and the workers keep stalling, typically within an hour or two.

The documentation is, as expected, pretty hopeless when it comes to troubleshooting. I'm going to see if I can contribute to that.

As you can see, there are no suggestions for diagnosing the cause of stalling or for mitigating it: https://concourse-ci.org/concourse-worker.html#worker-heartbeating-and-stalling

Is anyone else running CentOS 8 on their worker nodes? If so, do you have stalling problems, or do you do anything special to prevent them?

Back story: we mainly use CentOS because our main development needs to be RHEL-compatible. We had been using Ubuntu 20.04 for our worker nodes because its kernel supported runc out of the box. Since an official CentOS 8 AMI for AWS came out recently, it seemed like the right time to switch and run everything on CentOS, which makes our Puppet modules simpler.

The docs are built with Booklit, which requires Go, and the build also needs Yarn, but none of these dependencies are explained… That's another rabbit hole I'm not going down, so sorry, I won't be writing any docs for this after all.

We've pretty much decided to abandon Concourse altogether. When things go wrong, there just isn't enough troubleshooting documentation, and although people here are helpful when they do respond, there just isn't enough support.